[p2p-hackers] proposal re: scalability of block publication and fetching
zooko at zooko.com
zooko at zooko.com
Wed Feb 28 11:42:02 UTC 2001
I was mildly stung by Oskar's assertion that Mojo Nation architecture
is inherently non-scalable with respect to fetching a block whose id
you know. (Which Freenet people apparently call "routing", I guess
because they are thinking of app-level communication routing.)
I am thinking about a change to the MN architecture which would be very
very simple to deploy (for you Mojo Hackers, it would be simply a new
handicapper plug-in). I will describe it in abstract terms, glossing
over at least four implementation details that you don't need to know
in order to tell me if this is scalable or not.
Suppose that you have a network with nodes which hold blocks of data
indexed by the SHA1 hash of the block.
A node, `A', wants to publish a block of data and then later a
different node `B', who already knows the unique id of that block wants
to fetch the block
`A' knows the "phonebook info" for N other nodes, where the "phonebook
info" consists of the public key and other information sufficient to
communicate with that node.
My proposal, which I call "MaskMatchingHandicapper", is that `A'
chooses the log(N) nodes whose public key ids have the highest "mask
match" with the id of the block. A "mask match" is currently the
number of contigious leading bits which all match, but any scalar
comparison like Hamming distance or integer difference should behave as
well.
`A' then sends the block to those log(N) nodes.
Now later `B' wishes to fetch the block, whose id `B' already knows.
`B' already knows M other nodes. `B' queries the top log(M) nodes
based on mask match.
(Note that you can consider the fact that `A' and `B' know possibly
different sets of counterparties to be either a consequence of one or
both of them having an incomplete knowledge of the net, or of `B'
operating at a later time than `A', in which case some nodes will have
come and gone.)
Now what is the chance of success? It is the chance that at least one
of the log(N) nodes published-to by `A' are also among the log(M) nodes
queried by `B'. The key phrase in there is "at least one", and given
some weak assumptions about the chance of an arbitrary node being
reachable by one of the counterparties, we can easily gain a high
confidence that at least one will be reachable by both simply by
changing our "log(X)" to "K * log(X)" for some small constant K. (This
has implications for the bandwidth usage and the performance, which is
one of those issues that I'm glossing over, although we do have a
solution to this already implemented and deployed in Mojo Nation.
Actually four solutions, two of which we are grandfathering-out at this
point... ;-))
There are (at least) two particular ways of failure:
1. All log(N) of the nodes that `A' published to are unknown to `B'.
2. All log(M) of the nodes queried by `B' were unknown to `A'.
Also combinations of the two.
Note that the latter case (#2) could happen if the size of the network
had ballooned dramatically between `A' publishing and `B' querying.
But the network would have to actually _square_ in size before _all_ of
the log(M) nodes queried by `B' were newbies.
Now is this scalable? It seems obvious to me that it is, although it
leaves open questions of practical performance (you do not want to
publish log(N) times the size of your data), and the big question of
how `B' learned the unique id of the block. (Both of these issues are
already solved, of course, on Mojo Nation, but those solutions might
not scale.)
Regards,
Zooko
P.S. The idea of MaskMatchingHandicapper was inspired by an idea that
Raph Levien posted to advogato concerning doing the same thing, but for
plaintext meta-data instead of for blocks.
More information about the P2p-hackers
mailing list