[p2p-hackers] Errata in the Chord paper ?
Bryan Turner
bryan.turner at pobox.com
Mon Sep 26 17:39:02 UTC 2005
Kazuyuki,
I have implemented a (slightly modified) version of Chord, here is
how I solved the problems you mention:
> A finger table entry (finger[k].node) can be the node itself which
> have the finger table. But, the pseudocode in Fig. 6 does not allow it.
I chose not to allow a node in its own finger table. Early testing
was finding routing loops in the network, and this was exacerbating them,
causing the TTL to eventually drop the packet.
I restructured the code to always check the local cache before
forwarding the packets. This fixed the routing loops, and improved some
search paths, but added extra burden to the CPU load. The correct fix for
this is to not use the Chord protocol as specified (see discussion below).
My initialization scheme was much simpler than Chord, I contact the
successor node, steal his routing table, and allow the stabilization
protocol to fix it over time.
> A joining node calls find_predecessor(n - 2^(i-1)) in a procedure
> update_finger_others() to find out nodes which should have the joining
> node in their finger tables. The argument for the call is not correct
> and it should be find_predecessor(n - 2^(i-1) + 1).
I believe you are correct. I did not implement this part of the
update procedure (see discussion below). In practice, this would never be
noticed. The chance of two nodes being immediately adjacent in the ring are
exceptionally small (1 in 2^160 if using SHA-1). Also, nodes with adjacent
IDs would cause one arc of the ring to be effectively useless (the
predecessor would only be able to store one ID!!).
It is more valuable to use their "virtual node" technique to select
several points on the ring to balance the burden more evenly among the
nodes. This also effectively minimizes the chance of two nodes being
adjacent.
Problems with Chord
-------------------
The Chord protocol (as specified in the referenced paper) has a few
issues which make it a poor choice for DHT implementations in the real
world:
- Too Rigid
Chord specifies that you must maintain exact successor/predecessor
and finger tables at all times. This requirement causes a significant
amount of network overhead during network flapping events, and at times of
the day when large swings in population occur. Often, this traffic is
enough to destabilize the network and cause routing to fail.
Chord is one of many algorithms to build a Small World network. It
has been shown that perfectly aligned finger tables are not necessary (as
some of their later papers mention). Instead, keep tabs on your local
neighborhood and keep "some" long-range pointers for efficient routing.
Specifically, your long range pointers should connect to well
established, well connected, well trusted nodes at various distant locations
on the ring.
- Inefficient use of connections
Most implementations in the real world will be using TCP. Chord
does not make use of the two-way connections. This becomes a significant
overhead on the operating system. Kademlia suggests (and I concur) that
Chord should use both forward and backward fingers, routing using a greedy
strategy over all incoming and outgoing connections. (see below for my rant
on greedy routing)
- "Unstable"
New nodes are considered immediately more valuable in their local
region than established, existing nodes. This quickly leads to
destabilization in the network when a network flap or node migration occurs.
Kademlia suggests (and again, I concur) that established nodes should be the
preferred routing fabric, and only once you have reached a local
neighborhood should the find-closest-predecessor method begin.
One metric for "establishment" is up-time. This is as good as any,
but you cannot rely on a node to give you this information, instead keep
track of when you first notice a node in the network and use this as your
up-time metric. Thus, you route to the closest node in that region with the
highest up-time.
- Route Scheduling
It is a common mistake with (as far as I am aware) all existing DHT
implementations to expect greedy routing to work most of the time. If I can
offer one take home message it is this: Use probabilistic routing! Make a
short list of good candidates, weigh them by their up time, number of
neighbors, and/or some trust metric, then probabilistically choose one.
Your network and your users will thank you.
All major enterprise applications (Load balancing of web servers,
DNS, Jini, Web Sphere, etc..) have probabilistic redirectors. This ensures
your network continues functioning even during network migrations and
flapping.
Hope this helps,
--Bryan
bryan.turner at pobox.com
-----Original Message-----
From: p2p-hackers-bounces at zgp.org [mailto:p2p-hackers-bounces at zgp.org] On
Behalf Of shudo at computer.org
Sent: Sunday, September 25, 2005 9:36 AM
To: p2p-hackers at zgp.org
Subject: [p2p-hackers] Errata in the Chord paper ?
Hi all,
I have some doubt about a pseudocode in the Chord paper:
I. Stoica, R. Morris, D. Karger, M. F. Kaashoek and H. Balakrishnan,
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications.
In Proc. ACM SIGCOMM 2001, pp.149-160, Aug. 2001.
It is a pseudocode for a basic node join operation in the Fig. 6. The paper
presents another join algorithm in the Fig. 7 and the latter one will be
implemented usually because the basic algorithm in Fig. 6 is not tolerate
concurrent joins (and failures).
So, it's not a serious problem at all if there are errata in the basic
algorithm. And, of course, my interpretation of the paper is not correct.
Please give me a comment if you know the same possible errata have been
pointed out somewhere before.
(1) A finger table entry (finger[k].node) can be the node itself which
have the finger table. But, the pseudocode in Fig. 6 does not allow it.
A finger table is initialized by a procedure init_finger_table(). The
init_finger_table() calls find_successor() on an existing node n'.
When the init_finger_table() is called on a joining node, no existing node
knows the joining node because existing nodes can notice the joining node
only by update_others() called just after init_finger_table().
As a result, a joining node cannot have itself in its finger table, even if
the joining node itself is the most appropriate node for a finger table
entry.
(2) A joining node calls find_predecessor(n - 2^(i-1)) in a procedure
update_finger_others() to find out nodes which should have the joining
node in their finger tables. The argument for the call is not correct
and it should be find_predecessor(n - 2^(i-1) + 1).
Let's consider the case of i = 1. Suppose n is the node ID of the joining
node. In this case, the joining node should find out a node whose ID is n-1
(of course, mod 2^m) or prior to n-1. But find_predecessor(n - 2^(i-1))
results in n-2 or prior to n-2 because a procedure find_predecessor(id) in
Fig. 4 returns id-1 or prior to id-1.
I appreciate any comment.
Cheers,
Kazuyuki Shudo
Grid Technology Research Center
National Institute of Advanced Industrial Science and Technology (AIST)
More information about the P2p-hackers
mailing list