[p2p-hackers] References for using hashes as unique identifiers?

Zooko O'Whielacronx zooko at zooko.com
Mon Mar 8 12:57:22 UTC 2004

 Mark Miller wrote:
> Jim McCoy then took the idea to MojoNation, which is how the rest of the
> world became exposed to it.
> Of course, being an obvious idea, it may have been rediscovered many times, 
> so it wouldn't surprise me if there were other independent historical 
> sequences. If anyone knows any, I'd be curious to hear of them. Thanks.

Mojo Nation wasn't the only vector, and probably not most important one.  The
Freenet white paper in 2000 [1] was probably a very important vector.  citeseer
listed that paper as #9 in "Most cited source documents published in 2000 in
the CiteSeer database as of May 2003" [2].  Unfortunately that paper doesn't
cite any prior work on the concept of what they call "Content Hash Keys", nor
does it assert that the idea is novel in that paper.

Eric Hughes filed patent #6,122,372 in 1997 that included the idea of using a
hash of a message as an ID of that message.  I worked with Eric at his company
in the fall of 1999, and with Jim at Mojo Nation starting in 2000, but I recall
talking to Jim in the summer of 1999 and hearing about his scheme for a global,
decentralized, attack-resistant file system, which included the notion of using
secure hashes as IDs.

I think there are two related notions that we might be talking about here:
cryptographic hashes as secure, independently-computed unique identifiers, and
using independently-computed unique identifiers as locators.

If we're just talking about using a hash value as a unique identifier, then
that's pretty much what cryptographic hashes were invented for!  The
mathematical underpinings of the concept of a one-way hash function were first
explicated by Merkle [3] in 1979 (according to [4], section 9.2), and keyed
MACs was also discussed in the 70's (also according to [4]).

The common use of a secure hash in crypto is to provide a smaller string that
is used as a secure identifier of a larger string in order to compute public
key crypto on the smaller string.  ([4] says that this idea was published by
Rabin in 1978.)

The Chord File System paper (2001) [5] says "The use of content-hashes to
securely link together different pieces of data is due to Merkle." -- citing
Merkle's hash trees paper from 1987 [6].

If we're talking about using independently-computed unique identifiers as
locators, then I guess that's what hashes were invented for!  [4] tells me that
Knuth, pp.506-549, [7] traces the origins of hash functions back to IBM 1953.

So maybe what we're talking about is combining these two concepts, and
more-over applying them to distributed or even decentralized systems.  One
milestone in my opinion was Karger et al. [8], who provided the formalization
of the notion that if you use hashes to map to servers, and the set of servers
changes, then a "consistent hash" is one that minimizes the number of mappings
that have to be changed.  Then there is the paper by Plaxton et al., 1999 [9],
which is the grandfather of the crop of exciting "peer-to-peer" publications at
the beginning of the new century.  In [9], they simply say "Assign each object
an n-bit id, chosen uniformly at random".

So currently, if I were forced to choose a paper to cite as "the earliest
publication of the concept of using a secure hash as a locator", I would choose
the Freenet paper [1].



[1]  "Freenet: A Distributed Anonymous Information Storage and Retrieval
     System" Clarke, Sandberg, Wiley, Hong 2000

[2]  http://citeseer.nj.nec.com/source-2000.html

[3]  "Secrecy Authentication and Public Key Systems" UMI Research Press, Merkle

[4]  "Handbook of Applied Cryptography" Menezes, van Oorschot, Vanstone 1997

[5]  "Wide-area cooperative storage with CFS" Dabek, Kaashoek, Karger, Morris,
     Stoica ACM SOSP 2001

[6]  "A digital signature based on a conventional encryption function" Merkle
     Crypto '87 1987

[7]  "The Art of Computer Programming -- Sorting and Searching, vol. 3" Knuth

[8]  "Consistent Hashing and Random Trees" Karger, Lehman, Leighton, Levine,
     Lewin, Panigraphy 1997

[9]  "Accessing nearby copies of replicated objects in a distributed
     environment" Plaxton, Rajaraman, Richa. Theory of Computing Systems,
     32:241-280, 1999.

More information about the P2p-hackers mailing list