[p2p-hackers] Node counting algorithm
Aaron Harwood
aharwood at cs.mu.oz.au
Sun Mar 13 11:23:17 UTC 2005
On 24/02/2005, at 7:37 PM, Gwenchlan wrote:
> Le mercredi 23 février 2005 à 21:32 -0800, Sean C. Rhea a écrit :
>
>> On Feb 23, 2005, at 10:27 AM, Michael Parker wrote:
>>> If your leaf set overlaps, it's just the number of entries in your
>>> leaf set.
>>> If your leaf set does not overlap, divide the size of the ring (e.g.
>>> 2^128, 2^160) by the span of your leaf set (i.e., the farthest
>>> clockwise node minus the farthest counterclockwise node, modulo the
>>> ring size), and multiply by the size of your leaf set. Basically,
>>> what this means is if your leaf set is size L, and it spans a
>>> percentage x of the node identifier space, the size of the network
>>> is approximately L * x^-1. To improve accuracy, ask the two farthest
>>> nodes in your leaf set and ask them for their leaf sets, merging
>>> them into yours before calculating. That way, you have a larger
>>> effective L.
>>
>> This technique gives estimates that, on average, overestimate the
>> size of the network if you pick node identifiers uniformly at random
>> (UAR). The reason is that UAR doesn't mean evenly distributed; some
>> nodes' leaf sets cover much more than others. If you have one node
>> whose leaf set covers a larger portion of the key space, that node
>> underestimates the size of the ring, but a lot of other nodes end up
>> covering less of the key space (to make room for the larger one) and
>> end up overestimating the network size. When you average them all,
>> the few nodes that underestimate don't make up for all the rest that
>> overestimate it.
>>
>> IANAM (I am not a mathematician), but this is what happens when you
>> simulate it at least, with identifiers drawn using SHA. References
>> for more mathematical explanations and a better algorithm are
>> described in this paper:
>>
>> http://iptps05.cs.cornell.edu/PDFs/CameraReady_174.pdf
>>
>> Sean
>>
> Thanks for these DHT heuristics. I have omited to specify i was
> looking for tricks applying to unstructured networks (probably random
> graphs).
> Thus we cannot deal with density here. I was thinking about using a
> flexible method like randow walk, but K random walkers launched by
> initiator raise the problem of scalability and accuracy as the overlay
> may (more or less) vary during the counting itself, phenomenon
> amplified by network size...
Perhaps you could try to model the neighborhood growth of the network
in question and
then, with a sample of the neighborhood growth locally, you can
extrapolate the process and
determine the total number of nodes. This method assumes that the
neighborhood growth
characteristics (as a topological property) remains constant over a
relatively long period of time
(despite relatively large changes in the network size).
--Aaron
More information about the P2p-hackers
mailing list