[p2p-hackers] Re: scalability
Serguei Osokine
Serguei.Osokine at efi.com
Fri Dec 2 19:38:41 UTC 2005
On Friday, December 02, 2005 Greg Bildson wrote:
> I have a feeling that pre-centralized hostcache, the network was
> more of a long string with some clumps as it went along.
So what kept this string from fully clumping as connections
were broken and reestablished? The default was four connections, not two.
How could the string avoid folding onto itself about a thousand times
once the first 1,000 connections had been reestablished? That would
take 10-15 minutes in a 1,000-node network and would happen almost
instantly in a million-node one.
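
The folding argument can be checked with a toy simulation. The sketch
below is only an illustration under stated assumptions, not a model of
any real client: it starts from a pure string of nodes, treats each
reestablished connection as a new link between two uniformly random
nodes (ignoring the link that was dropped), and measures the largest hop
count from one end of the string. The names chain, eccentricity and
add_random_links are made up for this example.

```python
import random
from collections import deque


def chain(n):
    """Adjacency sets for a 'string' of n nodes: 0 - 1 - ... - (n-1)."""
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj


def eccentricity(adj, start=0):
    """Largest hop count from start to any reachable node (plain BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def add_random_links(adj, count, rng):
    """Assumption: every reestablished connection joins two random nodes."""
    nodes = list(adj)
    for _ in range(count):
        a, b = rng.sample(nodes, 2)  # two distinct nodes
        adj[a].add(b)
        adj[b].add(a)


if __name__ == "__main__":
    rng = random.Random(2005)
    n = 1000
    net = chain(n)
    print("pure string, hops to the far end:", eccentricity(net))
    add_random_links(net, n, rng)
    print("after", n, "random links, farthest node:", eccentricity(net))
```

On a pure 1,000-node string the far end is 999 hops away. Once about as
many random links as there are nodes have been added, the farthest node
is typically only a small number of hops away, which is the "folding"
described above.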
> If ToadNode was correct in that they had millions of downloads in
> those early days then that's the only way that I could see the modem
> bandwidth barrier not getting hit very quickly.
Between people not using the downloaded code, an error in the
ToadNode stats, a miracle, and the network preserving its 'linear'
graph topology for any noticeable time, my vote goes to any one
of the first three - the last one is too improbable.
Best wishes -
S.Osokine.
2 Dec 2005.
-----Original Message-----
From: p2p-hackers-bounces at zgp.org [mailto:p2p-hackers-bounces at zgp.org]On
Behalf Of Greg Bildson
Sent: Friday, December 02, 2005 11:23 AM
To: Peer-to-peer development.
Subject: RE: [p2p-hackers] Re: scalability
The only locality that I can think of that may have occurred back in that
early timeframe would be based on the stringiness of the network. I have a
feeling that pre-centralized hostcache, the network was more of a long
string with some clumps as it went along. So, it's possible that the network
diameter at its longest point was much larger than max-TTL. Then, the
introduction of centralized hostcaches helped create a massive cluster and
exacerbated the early modem bandwidth barrier. I believe this is what
Gene Kan thought. It was only months later, with the introduction
of clients with keepalive pings and flow control, that the clogged spots got
freed up.
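
A small sketch of the difference this would make to query reach. It is
an illustration only: the TTL of 7, the 1,000-node size and the
two-random-links-per-node "cluster" are assumptions chosen for the
example, and the function names are hypothetical. It counts how many
nodes a TTL-limited flood reaches from the middle of a string and from a
node in a randomly wired cluster.

```python
import random
from collections import deque


def string_network(n):
    """Adjacency sets for a chain 0 - 1 - ... - (n-1)."""
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj


def clustered_network(n, links_per_node, rng):
    """Each node opens links_per_node links to random other nodes.

    Assumes n is much larger than links_per_node.
    """
    adj = {i: set() for i in range(n)}
    for u in range(n):
        opened = 0
        while opened < links_per_node:
            v = rng.randrange(n)
            if v != u and v not in adj[u]:
                adj[u].add(v)
                adj[v].add(u)
                opened += 1
    return adj


def reach(adj, start, ttl):
    """Number of nodes, not counting start, within ttl hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if hops[u] == ttl:
            continue  # TTL expired: this node does not forward the query
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return len(hops) - 1


if __name__ == "__main__":
    n, ttl = 1000, 7
    rng = random.Random(2005)
    print("string :", reach(string_network(n), n // 2, ttl), "nodes reached")
    print("cluster:", reach(clustered_network(n, 2, rng), 0, ttl), "nodes reached")
```

From the middle of the string a TTL-7 query reaches just 14 nodes (7 on
each side), so most of the network never sees it. In the randomly wired
cluster the same query reaches a large share of the nodes, and every one
of them has to carry the traffic.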
If ToadNode was correct in that they had millions of downloads in those
early days then that's the only way that I could see the modem bandwidth
barrier not getting hit very quickly.
Thanks
-greg
> -----Original Message-----
> From: p2p-hackers-bounces at zgp.org [mailto:p2p-hackers-bounces at zgp.org]On
> Behalf Of Serguei Osokine
> Sent: Friday, December 02, 2005 1:23 PM
> To: Peer-to-peer development.
> Subject: RE: [p2p-hackers] Re: scalability
>
>
> On Friday, December 02, 2005 Alexander Löser wrote:
> > originally there was a certain type of clustering in the early days
> > of Gnutella (late 90s). People passed their IDs to other people by
> > word of mouth, by email, or via Deja News. So in most cases you got
> > IDs from people who had at least similar interests, or from
> > people from whom you expected some interesting files.
>
> I'm sorry to contradict you, but I think this is all a myth.
>
> First, there was no Gnutella in the late 90s. It was released in
> March of 2000. Second, I remember looking at the connection stability
> just a few months later (June/July, maybe?), and the churn was quite
> high - a client tended to replace all its connections within an hour
> or so.
>
> Now if you remember how the connections were replaced, the
> client was trying the IPs that it received from PONGs, which were
> essentially the random network IPs, because the network was just
> a few thousand nodes and every client could see the pongs from
> pretty much everyone. So in an hour or so your initial connection
> point stopped being relevant and you found yourself at a random
> place in the network. After that, all your subsequent sessions used
> the IP list stored on disk by a previous session to connect to the
> network, and the address given to you by your friends was no longer
> important.
>
> To be precise, this last part (about the IP list) was the
> behaviour of the Gnutella clients that I worked with (I think these
> were Gnutella v0.56 and GNUT). Maybe there were some clients that
> required you to enter an IP at every session start. I don't know. There
> was also a notion of locality based on the unusually good and stable
> connections - as soon as the two machines on my desktop would find
> each other on the network as a result of this random process, they
> would stay connected for quite a while (as long as I did not stop
> the clients).
>
> But even these considerations are not important, because the
> early Gnutella (until the meltdown of July 2000) was fully visible,
> and every query more or less reached every node (in the absence of
> flow control, this is exactly what caused the meltdown - the TTL was
> too high to limit query propagation).
>
> Of course, some queries might have been missing some nodes, but
> generally there was no chance for any clustering - I simply cannot see
> how it could possibly exist in such a network.
>
> > We (Berlin and Karlsruhe) developed a new protocol (INGA, Interest
> > based Node Grouping Algorithm [1][2]) that reclusters the network
> > based on the interests of the peers, without any DHT, using only
> > an unstructured network.
>
> Which is cool, and maybe it is a great protocol - as long as
> you don't justify its existence with myths. I'm sure there are plenty
> of legitimate reasons that make this protocol useful ;-)
>
> Best wishes -
> S.Osokine.
> 2 Dec 2005.
>
>
> -----Original Message-----
> From: p2p-hackers-bounces at zgp.org [mailto:p2p-hackers-bounces at zgp.org]On
> Behalf Of Alexander Löser
> Sent: Friday, December 02, 2005 1:29 AM
> To: Peer-to-peer development.
> Subject: Re: [p2p-hackers] Re: scalability
>
>
> Hi Adam,
> originally there was a certain type of clustering in the early days of
> Gnutella (late 90s). People passed their IDs to other people by word of
> mouth, by email, or via Deja News. So in most cases you got IDs from
> people who had at least similar interests, or from people from whom you
> expected some interesting files. Later, due to the overwhelming
> attractiveness of the Gnutella application, they introduced the gtk and
> other bootstrapping alternatives, giving you a number of starting
> pointers. However, these starting points are chosen 'randomly', so there is
> no longer any clustering by interests.
>
> We (Berlin and Karlsruhe) developed a new protocol (INGA, Interest based
> Node Grouping Algorithm [1][2]) that reclusters the network based on
> the interests of the peers, without any DHT, using only an
> unstructured network. Similar to Freenet, the network topology evolves
> over time into a so-called small-world topology, where people with
> similar interests are clustered together. In addition, to further speed
> up the clustering process, peers also keep in a local index structure
> other peers that are 'hubs' in the network, e.g. having a high in- and
> out-degree. Our experiments show that we significantly outperform
> Gnutella-style approaches in terms of messages, even in highly volatile
> networks.
>
> Best, Alex
>
> [1] Searching Dynamic Communities with Personal Indexes. Löser, Tempich
> et al. 3rd International Semantic Web Conference, Galway. Springer 2005
> http://cis.cs.tu-berlin.de/~aloeser/publications/iswc2005.pdf
> [2] Remindin': Semantic query routing in peer-to-peer networks based on
> social metaphors. Tempich et al. WWW 2004, New York. ACM 2004
> http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation?publ_id=447
>
> Ronald Wertlen wrote:
>
> > Hi Adam,
> >
> > perhaps you have not understood my message because you have not
> > noticed the focus on "precision and recall" (i.e. search) not the old
> > Distributed DB vs. own DB debate. You have also pigeon-holed my email
> > with the DHT crowd (*grin*), it couldn't be further from it!
> >
> > I was arguing in the other direction - which coderman thankfully
> > picked up. Gnutella doesn't structure enough, that's all. Sure
> > Gnutella beats DHTs on search - I base that observation on a project I
> > finished last year - a public prototype that used JXTA and was honed
> > for search using super-peers [DFN S2S http://s2s.neofonie.de/
> > (German site) - we've moved on some since then ;) ].
> >
> > Gnutella 0.6 (is there a 0.7 protocol? I can't find it) allows
> > practically anyone to elevate to super-peer, which results in a random
> > (power-law distribution) network. Such a network is not going to
> > perform very well as far as recall and precision are concerned, past a
> > certain point. I would be interested to calculate that exact point
> > (but I doubt I'll get to it any time soon :-/).
> >
> > HTH.
> >
> > Best regards, Ron
> >
> > PS. seems this thread has driven the original author to reformulate
> > his statement... :-)
> >
> > PPS.
> > In fact, the network is not going to be completely random - it will
> > follow the contours of the internet (distribution of servers,
> > broadband connections, users, etc. is not random). I am not sure if
> > that destroys or supports my argument. Back to the drawing board!
> >
> > We actually need a better internet. [oops there I go getting
> > unspecific again, sorry!! ;-) ]
> >
> >
> >> Message: 4
> >> Date: Wed, 30 Nov 2005 16:42:39 -0500
> >> From: Adam Fisk <afisk at speedymail.org>
> >> Subject: Re: [p2p-hackers] Re: scalability
> >> To: "Peer-to-peer development." <p2p-hackers at zgp.org>
> >> Message-ID: <438E1CCF.4010907 at speedymail.org>
> >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >>
> >> I don't understand your post. When you say "critical", I assume
> >> you're talking about life and death situations? Are you talking
> >> about anything specifically? DHTs have failure rates. Ad hoc and
> >> mesh networks can become useful in emergency situations where
> >> conventional infrastructures break down, but the
> >> centralized/p2p/structured/unstructured questions here are far from
> >> obvious.
> >>
> >> On the "obsessive science types" issue, this completely misses the
> >> point. It's a very non "obsessive science type" statement. There
> >> are strong reasons for using the massive indexing/random walk
> >> approach above DHTs -- reasons that have nothing to do with
> >> scalability. In particular, DHTs are, well, hash tables. Hash
> >> tables don't work well for metadata queries. They do fine for
> >> keywords (hotspots are a problem, but they can be solved), but they
> >> aren't as nice a fit for metadata. RDF and DHTs are tough to squeeze
> >> together, for example. The massive indexing (mutual index caching to
> >> use Serguei's term)/random walk approach can get around these issues
> >> more easily. They are also not nearly as brittle as DHTs. Sure,
> >> DHTs repair themselves after node joins and leaves, but node
> >> transience generally has a much greater effect on DHTs than it does
> >> on massive indexing networks.
> >>
> >> I also think you're underestimating the efficiency of massive
> >> indexing and random walks. Sure, these networks don't scale
> >> logarithmically, but they do pretty darn well.
> >> I encourage everyone to stay specific with their posts.
> >>
> >> All the Best,
> >>
> >> Adam
> >
> >
> >
> > _______________________________________________
> > p2p-hackers mailing list
> > p2p-hackers at zgp.org
> > http://zgp.org/mailman/listinfo/p2p-hackers
> > _______________________________________________
> > Here is a web page listing P2P Conferences:
> > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences
> >
>
>
> --
> ___________________________________________________________
>
> Dr. Alexander Löser,
> Technische Universität Berlin,
> CIS, Sekr. EN 7, Einsteinufer 17, 10587 Berlin, GERMANY
> office: +49- 30-314-25556 fax: +49- 30-314-21601
> web: http://cis.cs.tu-berlin.de/~aloeser/
> ___________________________________________________________
>