[p2p-hackers] Re: scalability
Ronald Wertlen
rrrw at neofonie.de
Thu Dec 1 20:48:45 UTC 2005
Hi Adam,
Perhaps you did not understand my message because you did not notice
that its focus was on "precision and recall" (i.e. search), not the old
distributed DB vs. own DB debate. You have also pigeon-holed my email
with the DHT crowd (*grin*); it couldn't be further from it!
I was arguing in the other direction - which coderman thankfully picked
up. Gnutella doesn't structure enough, that's all. Sure Gnutella beats
DHTs on search - I base that observation on a project I finished last
year - a public prototype that used JXTA and was honed for search using
super-peers [DFN S2S http://s2s.neofonie.de/ (German site) - we've
moved on some since then ;) ].
Gnutella 0.6 (is there a 0.7 protocol? I can't find it) allows
practically anyone to elevate to super-peer, which results in a random
(power-law distribution) network. Past a certain size, such a network is
not going to perform very well as far as recall and precision are
concerned. I would be interested to calculate that exact point (but I
doubt I'll get to it any time soon :-/).
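
A rough way to look for that point, as a sketch only: grow a random
power-law-ish overlay by preferential attachment, flood a query with a
fixed TTL, and measure what fraction of the matching peers is reached.
Everything here is an assumption for illustration (the growth model, the
parameters m, ttl and copies, and all function names); it models recall
only, not precision, and it is not the Gnutella 0.6 protocol.

```python
import random
from collections import deque


def build_preferential_graph(n, m, rng):
    """Grow a graph; each new node links to m existing nodes picked
    with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    endpoints = []  # every edge endpoint, so choice() is degree-weighted
    # Seed with a small clique of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


def flood_reach(adj, source, ttl):
    """Return the set of nodes a TTL-limited flood from source reaches."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == ttl:
            continue  # TTL exhausted, do not forward further
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen


def estimate_recall(n, m=2, ttl=3, copies=20, trials=50, seed=0):
    """Average fraction of the peers holding a match that a flood reaches."""
    rng = random.Random(seed)
    adj = build_preferential_graph(n, m, rng)
    total = 0.0
    for _ in range(trials):
        holders = set(rng.sample(range(n), copies))  # peers with a match
        source = rng.randrange(n)
        reached = flood_reach(adj, source, ttl)
        total += len(holders & reached) / copies
    return total / trials


if __name__ == "__main__":
    for size in (100, 1000, 10000):
        print(size, round(estimate_recall(size), 3))
```

Running it for growing sizes with the TTL held fixed shows where recall
starts to drop in this toy model; real numbers would depend on the actual
topology and on how super-peers are elected.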
HTH.
Best regards, Ron
PS. It seems this thread has driven the original author to reformulate
his statement... :-)
PPS.
In fact, the network is not going to be completely random - it will
follow the contours of the internet (distribution of servers, broadband
connections, users, etc. is not random). I am not sure if that destroys
or supports my argument. Back to the drawing board!
We actually need a better internet. [oops there I go getting unspecific
again, sorry!! ;-) ]
> Message: 4
> Date: Wed, 30 Nov 2005 16:42:39 -0500
> From: Adam Fisk <afisk at speedymail.org>
> Subject: Re: [p2p-hackers] Re: scalability
> To: "Peer-to-peer development." <p2p-hackers at zgp.org>
> Message-ID: <438E1CCF.4010907 at speedymail.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I don't understand your post. When you say "critical", I assume you're
> talking about life and death situations? Are you talking about anything
> specifically? DHTs have failure rates. Ad hoc and mesh networks can
> become useful in emergency situations where conventional infrastructures
> break down, but the centralized/p2p/structured/unstructured questions
> here are far from obvious.
>
> On the "obsessive science types" issue, this completely misses the
> point. It's a very non "obsessive science type" statement. There are
> strong reasons for using the massive indexing/random walk approach above
> DHTs -- reasons that have nothing to do with scalability. In
> particular, DHTs are, well, hash tables. Hash tables don't work well
> for metadata queries. They do fine for keywords (hotspots are a
> problem, but they can be solved), but they aren't as nice a fit for
> metadata. RDF and DHTs are tough to squeeze together, for example. The
> massive indexing (mutual index caching to use Serguei's term)/random
> walk approach can get around these issues more easily. They are also
> not nearly as brittle as DHTs. Sure, DHTs repair themselves after node
> joins and leaves, but node transience generally has a much greater
> effect on DHTs than it does on massive indexing networks.
>
> I also think you're underestimating the efficiency of massive indexing
> and random walks. Sure, these networks don't scale logarithmically, but
> they do pretty darn well.
>
> I encourage everyone to stay specific with their posts.
>
> All the Best,
>
> Adam
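
To make the quoted hash-table point concrete, here is a minimal sketch of
why exact-key lookups and metadata queries behave so differently. ToyDHT,
its methods and the sample records are all hypothetical; a real DHT routes
a lookup in several hops and may layer secondary indexes on top, so the
counter below only counts which storage nodes must be consulted.

```python
import hashlib


class ToyDHT:
    """Hypothetical stand-in for a DHT: each key hashes to one node."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]
        self.contacts = 0  # storage nodes consulted so far

    def _node_for(self, key):
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self.nodes)

    def put(self, key, record):
        self.nodes[self._node_for(key)][key] = record

    def get(self, key):
        """Exact-key lookup: the hash names the single responsible node."""
        self.contacts += 1
        return self.nodes[self._node_for(key)].get(key)

    def query(self, predicate):
        """Metadata query: the hash says nothing about attribute values,
        so without an extra index every node has to be asked."""
        results = []
        for store in self.nodes:
            self.contacts += 1
            results.extend(r for r in store.values() if predicate(r))
        return results


if __name__ == "__main__":
    dht = ToyDHT(num_nodes=16)
    dht.put("song-a.ogg", {"artist": "X", "year": 2004})
    dht.put("song-b.ogg", {"artist": "Y", "year": 2005})
    dht.get("song-a.ogg")
    print("contacts after get:", dht.contacts)
    dht.query(lambda r: r["year"] >= 2005)
    print("contacts after query:", dht.contacts)
```

The exact-key get consults one node, while the attribute query in this
sketch has to visit all of them, which is the gap the massive
indexing/random walk approach tries to cover differently.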