[p2p-hackers] Re: scalability

Adam Fisk afisk at speedymail.org
Thu Dec 1 21:09:22 UTC 2005


Hi Ron-

Apologies for the DHT pigeon-holing.  I had this nagging feeling in my 
stomach that you may come more from the land of small world and power 
law networks, but I successfully suppressed it!  I agree with Daniel that 
Gnutella's not actually a power law network, although I can't remember 
what led me to decide that (several years ago now).  If I recall 
correctly, it's that node degrees are quite fixed and uniform. 
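
To make the degree point a bit more concrete, here is a minimal sketch 
in Python.  It is a toy model and not a measurement of Gnutella: 
preferential attachment stands in for "a power law network", and a 
random graph in which every node refuses connections beyond a fixed cap 
stands in for "fixed and uniform degrees".  The function names, the cap 
of 6 and the network size are all assumptions chosen for illustration.

```python
import random


def preferential_attachment_degrees(n, m, seed=0):
    """Degrees of a graph grown by preferential attachment (toy model)."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes, so every node has degree m.
    degrees = [m] * (m + 1)
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # pick from this list chooses a node with probability proportional to
    # its current degree.
    endpoints = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        degrees.append(0)
        for t in targets:
            degrees[t] += 1
            degrees[new] += 1
            endpoints.extend((t, new))
    return degrees


def capped_degrees(n, cap, seed=0):
    """Degrees when nodes connect to random peers but never exceed `cap`."""
    rng = random.Random(seed)
    degrees = [0] * n
    neighbors = [set() for _ in range(n)]
    for v in range(n):
        attempts = 0
        # Give up after a bounded number of tries so the loop always ends.
        while degrees[v] < cap and attempts < 50 * cap:
            attempts += 1
            u = rng.randrange(n)
            if u != v and u not in neighbors[v] and degrees[u] < cap:
                neighbors[v].add(u)
                neighbors[u].add(v)
                degrees[v] += 1
                degrees[u] += 1
    return degrees


if __name__ == "__main__":
    pa = preferential_attachment_degrees(2000, 3)
    capped = capped_degrees(2000, 6)
    print("preferential attachment: max degree", max(pa))
    print("capped connections:      max degree", max(capped))
```

In the first model a few early nodes end up with degrees far above the 
average; in the second no node can exceed the cap, which is the shape I 
have in mind when I say the degrees are fixed and uniform.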

How would you prefer superpeers get elected?  Superpeer election on 
Gnutella is fairly simple primarily because there's a scarcity of 
non-firewalled/NATted machines to fill their roles, so you have to sort 
of take what you can get.  Are you referring more to which superpeers to 
*select* over the course of a search and not the original choice of 
superpeers?

On the Gnutella 0.6/0.7 issue, that's really just the version of the 
specification for connection headers -- a frequent source of confusion.  
Gnutella has rightfully evolved into a family of protocols that 
themselves have version numbers -- everything from superpeers to dynamic 
querying to bloom filter exchange and mesh downloading.  All of these 
evolve largely independently from one another, giving the protocol 
family much more flexibility and agility.
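
As a rough illustration of the versioning point, here is a small Python 
sketch that separates the handshake version from per-feature versions 
carried in the connection headers.  The sample handshake text is an 
assumption written from memory for illustration (including the header 
names and values and the made-up User-Agent), so check it against the 
actual specifications before relying on it; parse_handshake is a 
hypothetical helper, not part of any real client.

```python
def parse_handshake(raw):
    """Split a connect request into (handshake_version, headers)."""
    lines = raw.split("\r\n")
    prefix = "GNUTELLA CONNECT/"
    if not lines[0].startswith(prefix):
        raise ValueError("not a Gnutella connect request")
    handshake_version = lines[0][len(prefix):]
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return handshake_version, headers


# Illustrative handshake only; header names and values are assumptions.
EXAMPLE = (
    "GNUTELLA CONNECT/0.6\r\n"
    "User-Agent: ExampleServent/1.0\r\n"
    "X-Ultrapeer: True\r\n"
    "X-Query-Routing: 0.1\r\n"
    "\r\n"
)

if __name__ == "__main__":
    version, headers = parse_handshake(EXAMPLE)
    print("handshake version:", version)
    for name, value in sorted(headers.items()):
        print(name, "=", value)
```

The "0.6" here says nothing about which query routing or superpeer 
behaviour a node speaks; those are advertised, and versioned, separately 
in the headers.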

All the Best,

Adam


Ronald Wertlen wrote:

> Hi Adam,
>
> perhaps you have not understood my message because you have not 
> noticed the focus on "precision and recall" (i.e. search) not the old 
> Distributed DB vs. own DB debate. You have also pigeon-holed my email 
> with the DHT crowd (*grin*), it couldn't be further from it!
>
> I was arguing in the other direction - which coderman thankfully 
> picked up.  Gnutella doesn't structure enough, that's all. Sure 
> Gnutella beats DHTs on search - I base that observation on a project I 
> finished last year - a public prototype that used JXTA and was honed 
> for search using super-peers   [DFN S2S http://s2s.neofonie.de/ 
> (German site) - we've moved on some since then  ;) ].
>
> Gnutella 0.6 (is there a 0.7 protocol? I can't find it) allows 
> practically anyone to elevate to super-peer, which results in a random 
> (power-law distribution) network. Past a certain point, such a network 
> is not going to perform very well as far as recall and precision are 
> concerned. I would be interested to calculate that exact point 
> (but I doubt I'll get to it any time soon :-/).
>
> HTH.
>
> Best regards, Ron
>
> PS. seems this thread has driven the original author to reformulate 
> his statement...  :-)
>
> PPS.
> In fact, the network is not going to be completely random - it will 
> follow the contours of the internet (distribution of servers, 
> broadband connections, users, etc. is not random). I am not sure if 
> that destroys or supports my argument. Back to the drawing board!
>
> We actually need a better internet. [oops there I go getting 
> unspecific again, sorry!!  ;-) ]
>
>
>> Message: 4
>> Date: Wed, 30 Nov 2005 16:42:39 -0500
>> From: Adam Fisk <afisk at speedymail.org>
>> Subject: Re: [p2p-hackers] Re: scalability
>> To: "Peer-to-peer development." <p2p-hackers at zgp.org>
>> Message-ID: <438E1CCF.4010907 at speedymail.org>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> I don't understand your post.  When you say "critical", I assume 
>> you're talking about life and death situations?  Are you talking 
>> about anything specifically?  DHTs have failure rates.  Ad hoc and 
>> mesh networks can become useful in emergency situations where 
>> conventional infrastructures break down, but the 
>> centralized/p2p/structured/unstructured questions here are far from 
>> obvious.
>>
>> On the "obsessive science types" issue, this completely misses the 
>> point.  It's a very non "obsessive science type" statement.  There 
>> are strong reasons for using the massive indexing/random walk 
>> approach above DHTs -- reasons that have nothing to do with 
>> scalability.  In particular, DHTs are, well, hash tables.  Hash 
>> tables don't work well for metadata queries.  They do fine for 
>> keywords (hotspots are a problem, but they can be solved), but they 
>> aren't as nice a fit for metadata.  RDF and DHTs are tough to squeeze 
>> together, for example.  The massive indexing (mutual index caching to 
>> use Serguei's term)/random walk approach can get around these issues 
>> more easily.  They are also not nearly as brittle as DHTs.  Sure, 
>> DHTs repair themselves after node joins and leaves, but node 
>> transience generally has a much greater effect on DHTs than it does 
>> on massive indexing networks.
>>
>> I also think you're underestimating the efficiency of massive 
>> indexing and random walks.  Sure, these networks don't scale 
>> logarithmically, but they do pretty darn well.
>> I encourage everyone to stay specific with their posts.
>>
>> All the Best,
>>
>> Adam
>
>
>
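
On the point in my earlier message quoted above, that DHTs are hash 
tables and fit keyword lookups better than metadata queries, here is a 
minimal Python sketch.  It is a toy under stated assumptions: ToyDHT, 
node_for_key and the sample records are hypothetical, and hashing modulo 
the node count is a simplification that ignores consistent hashing, 
routing and churn.

```python
import hashlib


def node_for_key(key, num_nodes):
    """Map a key to a node index by hashing (simplified, no routing)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


class ToyDHT:
    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key, value):
        node = self.nodes[node_for_key(key, len(self.nodes))]
        node.setdefault(key, []).append(value)

    def get(self, key):
        """Exact-key lookup: only the one responsible node is consulted."""
        node = self.nodes[node_for_key(key, len(self.nodes))]
        return node.get(key, [])

    def scan(self, predicate):
        """Metadata query: the hash says nothing about where matches
        live, so every node has to be visited."""
        hits, visited = [], 0
        for node in self.nodes:
            visited += 1
            for values in node.values():
                hits.extend(v for v in values if predicate(v))
        return hits, visited


# Hypothetical records, indexed under their title only.
RECORDS = [
    {"title": "alpha", "year": 1998, "bitrate": 192},
    {"title": "beta", "year": 2004, "bitrate": 128},
    {"title": "gamma", "year": 1995, "bitrate": 256},
]

if __name__ == "__main__":
    dht = ToyDHT(num_nodes=16)
    for record in RECORDS:
        dht.put(record["title"], record)
    print("keyword lookup:", dht.get("beta"))
    hits, visited = dht.scan(lambda r: r["year"] < 2000 and r["bitrate"] >= 192)
    print("metadata query:", len(hits), "hits after visiting", visited, "nodes")
```

The keyword lookup touches one node, while the range-style metadata 
query cannot be turned into a key and falls back to visiting all of 
them.  Real systems build secondary indexes to soften this, which is 
roughly where the squeezing gets tough.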


