[p2p-hackers] How to solve the "Britney problem"?

Vaste mllist at vaste.mine.nu
Wed Dec 29 18:44:44 UTC 2004

Paul Campbell wrote:
> Let's say that we directly implement an inverted index (keyword-->file) on
> the DHT. Even with common word elimination, this certainly and severely
> generates a very non-linear index space. But it is something that is very
> obvious and desirable since it also solves a major DHT and randomized P2P
> dilema: how to quickly search for files by keyword.
> So far, most proponents of DHT's believe that hashing is the answer. While
> hashing MASKS problems, it doesn't actually solve the problem.
> All solutions that I've seen appear to be pretty simple. The idea is to spread
> the location of a particular key out over a wider arc. Conceptually, one can
> spread it in multiple points (such as spreading it to equadistant points
> around the ring) or on multiple hierarchal rings (like the distributed
> sloppy hash tables...DSHT's although that was done for delay purposes). But
> the concept is essentially the same.
> The routing algorithm remains valid. Since the routing algorithm incrementally
> searches for the destination. It just remains to send a search that is in
> the form of incrementally refining an "area" to search in. As the search
> request passes through potential nodes, the search area gradually narrows.

Some examples:

In a situation where a node typically only requires part of a key's 
value, one can split the key's value among several peers, (e.g. in chord 
an arc of the circle).

E.g. in a filesharing network using a DHT to locate peers with the file 
(hash of file -> peers), you only need some peers, not all.

Another example would be for AND-queries in an inverted index (keyword 
-> file) such as the one mentioned. For e.g. "spears britney" you don't 
need all matches for "spears". Here though, you want a specific subset 
of the "spears" set (the one matching "britney" as well) rather than any 
(random) subset.

One approach could be to map the "spears"-subset of the network the same 
way as the whole network; i.e. after locating the spears-"area" one 
narrows it the same way (by the hash of another keyword), until one gets 
to an area small enough to fully retrieve it or until all keywords match 
(and thus you've found some subset of the "britney spears"-subset).


More information about the P2p-hackers mailing list