[p2p-hackers] Hybrid Static/Dynamic DHT Systems

Matt Leslie mleslie at fnal.gov
Sat Apr 16 19:55:09 UTC 2005


The idea of having both mutable pointers to the latest version, and 
immutable versions of a data item is used in OceanStore 
(http://oceanstore.cs.berkeley.edu/publications/papers/pdf/fast2003-pond.pdf). 
The problem arises that revision lists in the 'dynamic hash table' 
should also be replicated for reliability, so you are going to have to 
pay the cost of maintaining consistent replicas at some level.

I'm currently looking at the costs and benefits of several replication 
schemes for DHTs, and I hope to be able to make the results public soon.

Matt


Cortland Klein wrote:

> This design proposal aims to describe a DHT system which has both a 
> functional mutable hash table and a replicating static content hash 
> table.
>
> The problem I see with traditional DHT systems is that they can not 
> easilly replicate data without making it much more difficult to 
> maintain up-to-date state. In addition, it becomes impossible to 
> provide an immutable reference to particular data.
>
> A Hybrid DHT System would have two subsystems, the traditional dynamic 
> mutable hash table and the added replicating static content hash table.
>
> The static content hash table is a simple table where the keys are the 
> SHA-1 of the content and the content is a length prefixed amount of 
> data limited to 256KB. When a key is requested, it's value is 
> automatically cached among the path of nodes to the node requesting 
> the data. Nodes maintain their own caches and replace old keys with 
> new keys as they pass through. Old keys are chosen based not only on 
> their last request time, but also how close the key is to the node's 
> ID. If the node is 'responsible' for a key, it remains in the cache 
> indefinitely. Thus, we can achieve replication while ensuring at least 
> one copy is always on the network. Data larger than 256KB is split up 
> into chunks and a splitfile containing a list of the static content 
> keys for each chunk is used to represent the data.
>
> The dynamic mutable hash table provides for each key a revisioned list 
> of pointers to static content keys. Thus, each dynamic key's content 
> has a table with datestamps and static content keys. To get the 
> current value for a dynamic key, the system then does a lookup for the 
> static content key. The node responsible for the key maintains this 
> revisioned list and updates it when a put is made on the key. It can 
> then expire old revisions after both a set number of revisions and at 
> least a certain amount of time have passed.
>
> My reasoning behind having two tables is that dynamic data cannot 
> easilly be replicated due to its mutability, however it can point to 
> static data which can be replicated. By having the static data retain 
> a 'responsible' node, we can ensure that at least one copy of the 
> static data is available.
>
> Any thoughts?
>




More information about the P2p-hackers mailing list