[p2p-hackers] Re: scalability

Ronald Wertlen rrrw at neofonie.de
Thu Dec 1 20:48:45 UTC 2005


Hi Adam,

Perhaps you did not understand my message because you missed its focus 
on "precision and recall" (i.e. search), not the old Distributed DB vs. 
own DB debate. You have also pigeon-holed my email with the DHT crowd 
(*grin*); it couldn't be further from it!
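
For reference, a minimal sketch of how precision and recall are computed
for a single query. The function name and the sample ids are hypothetical
and not taken from any project mentioned in this thread.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    returned: ids the network handed back
    relevant: ids that actually match the query, network-wide
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    # Precision: share of the returned results that are relevant.
    precision = hits / len(returned) if returned else 0.0
    # Recall: share of all relevant items that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 results came back, 3 of them relevant,
# while 6 relevant documents exist somewhere in the network.
p, r = precision_recall(["a", "b", "c", "x"], ["a", "b", "c", "d", "e", "f"])
print(p, r)
```

A search that reaches only part of the network can still score well on
precision while recall stays low, which is the case at issue here.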

I was arguing in the other direction - which coderman thankfully picked 
up.  Gnutella doesn't structure enough, that's all. Sure Gnutella beats 
DHTs on search - I base that observation on a project I finished last 
year - a public prototype that used JXTA and was honed for search using 
super-peers   [DFN S2S http://s2s.neofonie.de/ (German site) - we've 
moved on some since then  ;) ].

Gnutella 0.6 (is there a 0.7 protocol? I can't find it) allows 
practically anyone to elevate to super-peer, which results in a random 
(power-law distribution) network. Such a network is not going to perform 
very well as far as recall and precision are concerned, past a certain 
point. I would be interested to calculate that exact point (but I doubt 
I'll get to it any time soon :-/).
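
One way to start looking for that point is a toy simulation. The sketch
below is an assumption-laden illustration, not a model of the real
Gnutella 0.6 network: it grows a power-law-like overlay by preferential
attachment, floods a query with a hop limit (TTL), and reports what
fraction of the peers holding a match were reached. All names, sizes and
the seed are made up for the example.

```python
import random
from collections import deque


def preferential_attachment(n, m, rng):
    """Grow an n-node graph where each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    # Start from a small fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `stubs` once per edge end, so a uniform draw
    # from the list is a degree-proportional draw from the nodes.
    stubs = [v for v in range(m + 1) for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj


def flood_reach(adj, start, ttl):
    """Nodes reached by a breadth-first flood limited to `ttl` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == ttl:
            continue  # hop budget used up, do not forward further
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, hops + 1))
    return seen


def flood_recall(adj, holders, start, ttl):
    """Fraction of the peers holding a match that the flood reaches."""
    return len(flood_reach(adj, start, ttl) & holders) / len(holders)


rng = random.Random(1)                      # fixed seed, repeatable runs
overlay = preferential_attachment(2000, 2, rng)
holders = set(rng.sample(range(2000), 50))  # peers with a matching document
for ttl in range(1, 6):
    # Node 0 belongs to the initial core, so it is likely a hub.
    print(ttl, round(flood_recall(overlay, holders, 0, ttl), 2))
```

Varying the network size, the TTL and the starting node (hub versus
leaf) gives a first feel for where recall starts to fall off. It says
nothing about precision, which depends on how peers match queries.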

HTH.

Best regards, Ron

PS. It seems this thread has driven the original author to reformulate 
his statement...  :-)

PPS.
In fact, the network is not going to be completely random - it will 
follow the contours of the internet (distribution of servers, broadband 
connections, users, etc. is not random). I am not sure if that destroys 
or supports my argument. Back to the drawing board!

We actually need a better internet. [oops there I go getting unspecific 
again, sorry!!  ;-) ]


> Message: 4
> Date: Wed, 30 Nov 2005 16:42:39 -0500
> From: Adam Fisk <afisk at speedymail.org>
> Subject: Re: [p2p-hackers] Re: scalability  
> To: "Peer-to-peer development." <p2p-hackers at zgp.org>
> Message-ID: <438E1CCF.4010907 at speedymail.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> I don't understand your post.  When you say "critical", I assume you're 
> talking about life and death situations?  Are you talking about anything 
> specifically?  DHTs have failure rates.  Ad hoc and mesh networks can 
> become useful in emergency situations where conventional infrastructures 
> break down, but the centralized/p2p/structured/unstructured questions 
> here are far from obvious.
> 
> On the "obsessive science types" issue, this completely misses the 
> point.  It's a very non "obsessive science type" statement.  There are 
> strong reasons for using the massive indexing/random walk approach above 
> DHTs -- reasons that have nothing to do with scalability.  In 
> particular, DHTs are, well, hash tables.  Hash tables don't work well 
> for metadata queries.  They do fine for keywords (hotspots are a 
> problem, but they can be solved), but they aren't as nice a fit for 
> metadata.  RDF and DHTs are tough to squeeze together, for example.  The 
> massive indexing (mutual index caching to use Serguei's term)/random 
> walk approach can get around these issues more easily.  They are also 
> not nearly as brittle as DHTs.  Sure, DHTs repair themselves after node 
> joins and leaves, but node transience generally has a much greater 
> effect on DHTs than it does on massive indexing networks.
> 
> I also think you're underestimating the efficiency of massive indexing 
> and random walks.  Sure, these networks don't scale logarithmically, but 
> they do pretty darn well. 
> 
> I encourage everyone to stay specific with their posts.
> 
> All the Best,
> 
> Adam
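
On the point in the quoted message that hash tables are a poor fit for
metadata queries, a small sketch may help. It assumes a DHT behaves like
an exact-match key/value store and compares that with filtering over
record fields. The records, field names and helper names are
hypothetical.

```python
import hashlib

# Hypothetical records; titles, creators and years are invented.
records = [
    {"title": "alpha notes", "creator": "ann", "year": 2001},
    {"title": "beta report", "creator": "bob", "year": 2003},
    {"title": "gamma draft", "creator": "ann", "year": 2004},
]


def key_for(text):
    """Map a string to the kind of opaque key a DHT would route on."""
    return hashlib.sha1(text.lower().encode("utf-8")).hexdigest()


# DHT-style store: one exact key per record, here the full title.
table = {key_for(r["title"]): r for r in records}


def dht_get(text):
    """Exact-match lookup: works only if the caller knows the whole key."""
    return table.get(key_for(text))


def metadata_query(creator=None, year_before=None):
    """Index-style query: filter on any combination of fields."""
    return [
        r["title"] for r in records
        if (creator is None or r["creator"] == creator)
        and (year_before is None or r["year"] < year_before)
    ]


print(dht_get("beta report") is not None)  # the exact key is found
print(dht_get("beta"))                     # a partial key hashes elsewhere
print(metadata_query(creator="ann", year_before=2004))
```

The hash lookup answers only the exact key, while the field filter can
combine conditions freely. Spreading such a filter over a DHT needs extra
machinery (for example one key per field value), which is the awkward
fit described above.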




