[p2p-hackers] How to solve the "Britney problem"?

Michael J. Freedman mfreed at cs.nyu.edu
Wed Dec 29 20:13:31 UTC 2004


Enzo,

This is exactly the problem that "sloppy hashing" in Coral was designed 
to solve.  It performs this load balancing at the DHT layer, not through 
aggressive app-layer caching as in Freenet, CFS, PAST, etc.

   Short paper:
   http://www.scs.cs.nyu.edu/coral/docs/coral-iptps03.pdf

   Expanded paper with the integrated CDN:
   http://www.scs.cs.nyu.edu/coral/docs/coral-nsdi04.pdf

The basic idea is that get() may retrieve only a subset of the values 
inserted under put().  In fact, due to Coral's clustering algorithms, 
the retrieved values are likely to be those inserted by nearby nodes.
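To make the subset-retrieval idea concrete, here is a toy Python sketch of a "sloppy" put/get over a Kademlia-style XOR metric.  All of the names (Node, FULL, sloppy_put) and the capacity cutoff are my own illustration, not Coral's actual constants; Coral's real insertion algorithm also weighs per-key request rates, clustering, and TTLs, not just list size.

```python
# Toy sketch of sloppy hashing over a Kademlia-style XOR metric.
# Hypothetical names and constants; Coral's real algorithm also
# considers per-key request rate, clustering, and value TTLs.

FULL = 4  # per-key capacity before a node refuses further values


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}  # key -> list of values

    def is_full(self, key):
        return len(self.store.get(key, [])) >= FULL


def path_to_key(nodes, key):
    """Toy routing: order nodes by XOR distance to the key, closest first."""
    return sorted(nodes, key=lambda n: n.node_id ^ key)


def sloppy_put(nodes, key, value):
    """Store at the closest node with room, spilling outward when full.

    This caps how many values pile up on the nodes nearest the key --
    the "Britney problem" hot spot."""
    path = path_to_key(nodes, key)
    for node in path:
        if not node.is_full(key):
            node.store.setdefault(key, []).append(value)
            return node
    path[0].store.setdefault(key, []).append(value)  # all full: fall back
    return path[0]


def sloppy_get(nodes, key):
    """Return the first non-empty subset of values found along the path;
    callers get *some* providers, not necessarily all of them."""
    for node in path_to_key(nodes, key):
        if node.store.get(key):
            return list(node.store[key])
    return []
```

In this toy model, with 8 nodes and 10 put()s under one key, the closest node holds only FULL values, the overflow spills to its neighbors, and a get() returns just the first subset it finds rather than every stored value.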

We've used Coral primarily to build and deploy CoralCDN, a web content 
distribution network.  Consider what happens when a slashdotted page gets 
accessed via Coral: it disseminates to most or all web proxies within the 
CoralCDN network, yet even though all proxies are "announcing" themselves 
as caching the page, the most-loaded DHT node sees at most log(n) 
put()s.  (In fact we prove this; see the expanded paper.)

As an aside, Coral's traffic has grown quickly since we announced the 
"beta" release here back in late August.  (Just add ".nyud.net:8090" to 
the end of any hostname in a URL.)  It now gets 5-7 million requests per 
day from ~200,000 clients, serving several TB.  I guess this list serves 
as a fast-track method for slashdotting!

--mike


On Wed, 29 Dec 2004, Enzo Michelangeli wrote:

> Date: Wed, 29 Dec 2004 00:56:27 +0800
> From: Enzo Michelangeli <em at em.no-ip.com>
> Reply-To: Peer-to-peer development. <p2p-hackers at zgp.org>
> To: Peer-to-peer development. <p2p-hackers at zgp.org>
> Subject: [p2p-hackers] How to solve the "Britney problem"?
> 
> DHT's such as Kademlia work well when they are used to store large numbers
> of items with different keys. But let's suppose, for example, that a lot
> of nodes want to advertise their IP addresses as the place where people
> can find an instance of a very common resource, such as, say, an open
> proxy for some kind of protocol. All the records will then have the same
> key: Hash("Resource XYZ available here"), which means that the few unlucky
> DHT nodes with ID close to that hash will be requested to store the
> pointers for ALL the records. To some extent this already happens with
> filesharing applications: nodes with ID close to Hash("britney") are
> requested to store metadata information for all Britney Spears MP3 files.
> However, those may amount, at most, to a few hundred or maybe a few
> thousand records; but in the case I mentioned above, the number of records
> may grow unbounded as more and more resource providers join the scheme. Are there
> techniques devised specifically to solve this problem?
>
> Enzo
>

-----
"Not all those who wander are lost."           www.michaelfreedman.org


