[p2p-hackers] clustering

Daniel Stutzbach agthorr at cs.uoregon.edu
Wed Mar 8 17:15:36 UTC 2006

On Thu, Mar 09, 2006 at 12:38:58AM +0800, Ranus wrote:
> About your previous mail: I checked the definition of clustering
> <http://en.wikipedia.org/wiki/Clustering_coefficient> coefficient, that of a
> chord vertex is 3/(n-2) (for a 2^n node ring), and it's the same for every
> vertex, so the overall clustering coefficient is not high when n is
> moderately large.

That's actually a very large clustering coefficient, compared to a
random graph with the same number of nodes and edges.

In an Erdos-Renyi random graph (the standard random graph), every edge
exists with probability |E| / |V|^2.  Consequently, the clustering
coefficient is approximately |E| / |V|^2.  In Chord, |E| = |V| lg |V|, so
the comparable random graph has a clustering coefficient of
lg |V| / |V|.

In your calculation, |V| = 2^n, so n = lg |V| and the clustering
coefficient is 3/(lg |V| - 2).  This is an enormous clustering
coefficient.  For |V| = 1 million, the clustering coefficient is 17%!
For a comparable random graph, it's only 0.002%.

(I use "lg x" to mean "log base 2 of x")
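The arithmetic above can be checked with a quick sketch (my own, not
from the thread; it just plugs in |V| = 2^20, roughly a million nodes):

```python
import math

V = 2**20  # about 1 million nodes, so lg|V| = 20

# Chord on a 2^n-node ring: every vertex has clustering coefficient
# 3/(n-2), with n = lg|V|.
chord_cc = 3 / (math.log2(V) - 2)

# Comparable Erdos-Renyi graph with |E| = |V| * lg|V| edges: each edge
# exists with probability ~ |E| / |V|^2, which is also the approximate
# clustering coefficient, i.e. lg|V| / |V|.
E = V * math.log2(V)
er_cc = E / V**2

print(f"Chord:        {chord_cc:.1%}")   # about 17%
print(f"Erdos-Renyi:  {er_cc:.4%}")      # about 0.002%
```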

> The scale-free networks seem quite appealing as performance and scalability
> are concerned, and it's simpler to guarantee short paths with powerful
> supernodes. 

Scale-free networks scale well in some ways and scale very badly in
other ways.  Yes, they maintain short path lengths as the network gets
bigger.  However, they require that you have these increasingly
high-degree peers as the network gets bigger.  

You may have some peers with more capacity than others, and it may
make sense to utilize those resources, but it does not seem wise to
assume that if your network grows by a factor of 100, you will be able
to find a user who has 100 times more resources than your previous
best user.

> Many P2P applications such as Gia (Making Gnutella-like P2P
> Systems Scalable, Sigcomm'03) introduce such supernodes to improve the
> scalability. But it seems harder, to build scalable networks with more or
> less the same number of edges for each node (this could be realistic when
> edges are interpreted as connections, rather than links/routing
> information), because at such time the supernodes are nowhere to be found.

It largely depends on what you're trying to achieve.  For a network
like Gnutella, global visibility is not one of the design
requirements, so it's OK to use a constant number of edges.

DHTs (typically) use O(log |V|) edges to guarantee a worst-case of
O(log |V|) steps per lookup.  That seems like a good compromise to me.
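To illustrate why O(log |V|) edges buy O(log |V|) lookup steps, here is
a small sketch (my own, not from the thread) of idealized greedy
routing on a Chord-like ring where every ID is occupied: each node
keeps fingers at distances 2^0, 2^1, ..., 2^(n-1), and each hop
removes the highest set bit of the remaining clockwise distance, so no
lookup takes more than n = lg |V| hops:

```python
N_BITS = 16            # a 2^16-node ring, every ID present (idealized)
RING = 1 << N_BITS

def lookup_hops(src, dst):
    """Greedy Chord-style routing: at each step, jump by the largest
    power-of-two finger that does not overshoot the clockwise
    distance to dst.  Returns the number of hops taken."""
    hops = 0
    cur = src
    while cur != dst:
        dist = (dst - cur) % RING
        jump = 1 << (dist.bit_length() - 1)  # largest finger <= dist
        cur = (cur + jump) % RING
        hops += 1
    return hops

# Exhaustively check every destination from node 0: the worst case is
# exactly N_BITS hops (a distance with all n bits set).
worst = max(lookup_hops(0, d) for d in range(RING))
print(worst, "<=", N_BITS)
```

The hop count for a given lookup is just the number of set bits in the
clockwise distance, which is at most lg |V|.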

Daniel Stutzbach                           Computer Science Ph.D Student
http://www.barsoom.org/~agthorr                     University of Oregon
