[p2p-hackers] Node counting algorithm

Aaron Harwood aharwood at cs.mu.oz.au
Sun Mar 13 11:23:17 UTC 2005


On 24/02/2005, at 7:37 PM, Gwenchlan wrote:

> Le mercredi 23 février 2005 à 21:32 -0800, Sean C. Rhea a écrit :
>
>> On Feb 23, 2005, at 10:27 AM, Michael Parker wrote:
>>> If your leaf set overlaps, it's just the number of entries in your 
>>> leaf set.
>>> If your leaf set does not overlap, divide the size of the ring (e.g. 
>>> 2^128, 2^160) by the span of your leaf set (i.e., the farthest 
>>> clockwise node minus the farthest counterclockwise node, modulo the 
>>> ring size), and multiply by the size of your leaf set. Basically, 
>>> what this means is if your leaf set is size L, and it spans a 
>>> percentage x of the node identifier space, the size of the network 
>>> is approximately L * x^-1. To improve accuracy, ask the two farthest 
>>> nodes in your leaf set and ask them for their leaf sets, merging 
>>> them into yours before calculating. That way, you have a larger 
>>> effective L.
>>
>> This technique gives estimates that, on average, overestimate the 
>> size of the network if you pick node identifiers uniformly at random 
>> (UAR).  The reason is that UAR doesn't mean evenly distributed; some 
>> nodes' leaf sets cover much more than others.  If you have one node 
>> whose leaf set covers a larger portion of the key space, that node 
>> underestimates the size of the ring, but a lot of other nodes end up 
>> covering less of the key space (to make room for the larger one) and 
>> end up overestimating the network size.  When you average them all, 
>> the few nodes that underestimate don't make up for all the rest that 
>> overestimate it.
>>
>> IANAM (I am not a mathematician), but this is what happens when you 
>> simulate it at least, with identifiers drawn using SHA.  References 
>> for more mathematical explanations and a better algorithm are 
>> described in this paper:
>>
>>   http://iptps05.cs.cornell.edu/PDFs/CameraReady_174.pdf
>>
>> Sean
>>
> Thanks for these DHT heuristics. I have omited to specify i was 
> looking for tricks applying to unstructured networks (probably random 
> graphs).
> Thus we cannot deal with density here. I was thinking about using a 
> flexible method like randow walk, but K random walkers launched by 
> initiator raise the problem of scalability and accuracy as the overlay 
> may (more or less) vary during the counting itself, phenomenon 
> amplified by network size...

Perhaps you could try to model the neighborhood growth of the network 
in question and
then, with a sample of the neighborhood growth locally, you can 
extrapolate the process and
determine the total number of nodes. This method assumes that the 
neighborhood growth
characteristics (as a topological property) remains constant over a 
relatively long period of time
(despite relatively large changes in the network size).

--Aaron




More information about the P2p-hackers mailing list