[p2p-hackers] Simple lightweight DHT

Bryan Turner bryan.turner at pobox.com
Fri Jan 14 02:31:37 UTC 2005


    I've implemented four DHTs to date (Chord, Kademlia,
Continuous-Discrete, and a proprietary one).  I'm not sure there's much of a
'story' to it, but here are some excerpts of my trials..

Chord [1]

    Conceptually, Chord is very simple and that is one of the factors
motivating our decision to build it.  We had a basic Chord running between
three nodes in about a month.  This included a 256-bit keyspace (SHA-256),
recursive lookup, standard finger & successor tables, and a very simplistic
stabilize protocol.

    After scaling to a grid of 72 machines, we had some difficulty
maintaining the stability of the ring.  Our node loss event was updating
only the predecessor and successor (to save broadcasting to all fingers,
etc).  This, and the basic stabilization protocol (trade successor lists
every 10 sec with predecessor) did not keep the ring stable.  We adjusted it
to broadcast the node loss and fixed the problem - I still hold that this
should not be necessary for stability.

    We also had routing problems during churn.  It seems that some nodes
would 'skip over' the destination node, sending the lookup further around
the ring than necessary.  This would cause lookups to run around the ring
multiple times before landing at the right spot.  I think we finally tracked
this down to new nodes joining, and somewhere in the process of updating the
finger tables there was a period where routes were allowed to jump too far
in the ring.  We eventually fixed this too.

Kademlia [2]

    By this time I was recognizing some of the same problems that the
Kademlia group had noted with Chord; non bidirectional links,
uni-directional routing, static stabilization rate, and a host of problems
if you threw a knife switch between two halves of the network.

    The switchover to Kademlia from Chord was almost trivial, about two
weeks of work.  All of the structures are essentially the same, requiring
only an extension of the finger table to hold multiple entries and
timestamps.  Lookup, routing, etc.. was already written to be flexible so
our metric was the only thing that changed.  We did not implement the alpha
parameter, nor any of the system-level features like 24-hour lifespans, etc.
These did not fit the project.

    Kademlia introduced no additional problems, although it was much more
difficult to explain how data got grouped together & replicated when
tech-savvy customers started probing.  Kademlia worked fine for quite some
time, and it was the protocol for which our UDP stack was built (see a
previous posts for that design).

Voronoi Diagrams [3,4,5]

    At some point, I realized Kademlia still had not solved the knife-switch
problem.  This is very important to our project, as the installations are
known to be geographically diverse and the WAN connections to remote sites
are flaky at best.  Also, there are large differences in node's
computational and memory resources which were not being considered.

    After reading all the papers under the Continuous-Discrete chain
[3,4,5], I must say it blew my mind.  Unfortunately we were not in a
position to use or implement all of their ideas.  Also, these papers are
very math heavy and took awhile to digest.

    The switch from Kademlia to 1D Continuous-Discrete is essentially the
same as from Chord to Kademlia.  The protocols are almost exactly the same,
but Continuous-Discrete is significantly more flexible.  For instance, there
is no restriction to 2^k link table, the algorithm works just fine with 2,
or any number, or a different number per node, so long as you always
maintain the neighbor links.  Fault-tolerance is a trivial extension to the
protocol where nodes 'overlap' each other in the ID space, and this can be
an arbitrary overlap unlike Kademlia/Chord.  Kademlia's alpha parameter is
also mapped to an arbitrarily wide path through the ID space.  It is also
stable without broadcasting node loss events using a simple suspicion event
(lookups are not routed to suspected nodes).

    Basically take any fixed, rigid features of Chord/Kademlia and make them
flexible, you get Continuous-Discrete.

    I've been happy with this design for about a year, generalizing some
parts of the system and finally fixing the knife-switch problem using a
proprietary protocol built into the now-upgraded 2D Voronoi DHT system.

    Since our application only uses the DHT as a session set-up, we are not
too worried about the lookup times and stabilization times, as long as each
lookup does eventually succeed (or properly notify with a failure).  To
date, our system has been tested with 50 nodes on a dedicated grid at full
speed, saturating 100 Mbps links for over 36 hours.  This was a fairly small
test, with only 50 keys per node, to test collision and caching.  Total
development time has been about 2 years, although less than 6 months of that
is in tuning/hacking the DHT protocols.

bryan.turner at pobox.com

[1] http://www.pdos.lcs.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
[2] http://citeseer.ist.psu.edu/529075.html
[3] http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf
[4] http://iptps03.cs.berkeley.edu/final-papers/simple_fault_tolerant.pdf
[5] http://citeseer.ist.psu.edu/562059.html

----- Original Message ----- 
From: "Michael Parker" <mgp at ucla.edu>
To: "Peer-to-peer development." <p2p-hackers at zgp.org>
Sent: Tuesday, January 11, 2005 12:32 PM
Subject: Re: [p2p-hackers] Simple lightweight DHT

> I was about to ask a similar question... You're a seasoned peer-to-peer
> developer -- for those of us who are developers just starting out, and
> perhaps trying to invent our own new, novel topologies and systems, what
> are the hardest things to 'get right' in a peer-to-peer system?
> I've read the paper "Designing a DHT for Low Latency and High
> Throughput" [1] by the Chord group at MIT. It seems to sum up pretty
> well what their challenges were, and what practices they found were best.
> I know there are some other people out on this mailing list who could
> answer this too (Clarke, Freedman... if you feel that your name should
> be on this list, just respond). Sorry to try and drag you into the
> spotlight, but you definitely have a captive audience ;)
> - Michael Parker
> [1] http://citeseer.ist.psu.edu/dabek04designing.html
> Zooko O'Whielacronx wrote:
> > On 2005, Jan 10, at 20:00, Sean C. Rhea wrote:
> >
> >> I wrote the Bamboo router and got it working in about a week,
> >> although I had the experience and code base from writing Tapestry
> >> (another DHT) before that, so I had a bit of a head start.
> >>
> >> It took me another year to get Bamboo to perform as well as it does
> >> today.  I believe the Chord people had a similar experience.
> >
> >
> > What a fascinating story!  What things did you have to learn and
> > invent during the course of that year to improve the performance of
> > Bamboo?
> >
> > Regards,
> >
> > Zooko
> >
> > _______________________________________________
> > p2p-hackers mailing list
> > p2p-hackers at zgp.org
> > http://zgp.org/mailman/listinfo/p2p-hackers
> > _______________________________________________
> > Here is a web page listing P2P Conferences:
> > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences
> >
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers at zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers
> _______________________________________________
> Here is a web page listing P2P Conferences:
> http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences

More information about the P2p-hackers mailing list