[p2p-hackers] Re: scalability

Sam Berlin sberlin at gmail.com
Sun Dec 4 00:03:08 UTC 2005


> For instance Bloom Filters increase your scalability but reduce the
> precision of the search - so you get a lot of stuff you didn't want.

Bloom Filters can be used to reduce the number of incoming queries (in
Gnutella, filters are passed from a "leaf" to its "ultrapeer", and
composite filters are passed between neighboring ultrapeers to reduce
last-hop & second-to-last-hop traffic).  Once the query passes the
filter test, it can still be forwarded on to the ultimate host, and
that host can make the decision on whether or not to send a reply.
This eliminates "the stuff you didn't want" from replies while still
keeping traffic low.  The filters in Gnutella cut query traffic by
~70% on the second-to-last hop and ~90% on the last hop (at least,
they did when I last checked a year or so ago).
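
To make the two-stage idea concrete, here's a rough Python sketch: the
ultrapeer only forwards a query to a leaf whose filter might match, and
the leaf then does the exact check itself.  This is NOT Gnutella's
actual query routing wire format; the class names, filter size, hash
scheme and keyword splitting are all assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; size and hash count are illustrative only."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, word):
        # Derive num_hashes bit positions by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            data = f"{i}:{word.lower()}".encode("utf-8")
            digest = hashlib.sha1(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def might_contain(self, word):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(word))


class Leaf:
    """Hypothetical leaf node holding a list of shared file names."""

    def __init__(self, name, file_names):
        self.name = name
        self.file_names = file_names
        self.queries_seen = 0

    def build_filter(self):
        # The leaf summarises its keywords and hands this to its ultrapeer.
        bf = BloomFilter()
        for fname in self.file_names:
            for word in fname.replace(".", " ").split():
                bf.add(word)
        return bf

    def handle_query(self, query):
        # Exact check on the final host: every word must be in one file name.
        self.queries_seen += 1
        words = query.lower().split()
        return [f for f in self.file_names
                if all(w in f.lower() for w in words)]


class Ultrapeer:
    """Hypothetical ultrapeer that gates last-hop traffic with filters."""

    def __init__(self):
        self.leaves = []  # list of (leaf, filter) pairs

    def attach(self, leaf):
        self.leaves.append((leaf, leaf.build_filter()))

    def route(self, query):
        hits = []
        for leaf, bf in self.leaves:
            # Forward only if every query word might be on that leaf.
            if all(bf.might_contain(w) for w in query.split()):
                hits.extend(leaf.handle_query(query))
        return hits
```

A leaf sharing "coldplay yellow.mp3" and "radiohead creep.mp3" passes
the filter test for the query "coldplay creep" (both words are in the
filter), but its own exact check returns no hits, so nothing unwanted
comes back.  A leaf with neither word is almost never contacted at all.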

> A few years ago, a lot of papers in the p2p field that were working on
> stuff like topology, organisational methods, scalability, etc.
> concentrated on finding better ways of getting from object_id to the
> node (number of hops, number of lookups, etc.). The problem from an IR
> perspective is that not all objects are as "simple" as a mp3 file and
> not all searches are as simple as "coldplay", how do you get the
> object_id in the first place? This becomes a severe problem the more
> complex the objects, their metadata and the queries (for instance
> Boolean, range, content proximity, queries).

Metadata is certainly difficult to search for, but it isn't
impossible.  It's vastly easier to search using metadata in a network
such as Gnutella than in a DHT-based network, as you don't have to
prepopulate the tables with all kinds of data.  There are lots of active
metadata searches going on (again, in Gnutella), including searches
for file names, most-recently-downloaded, specific data in ID3 tags,
file licenses, etc.
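
As a rough illustration of why nothing has to be prepopulated, here's a
Python sketch where each host simply evaluates the query against its own
library when the query arrives.  The field names ("artist", "license")
and the dict-based query shape are made up for the example; this is not
how Gnutella encodes metadata queries on the wire.

```python
from dataclasses import dataclass, field


@dataclass
class SharedFile:
    """Hypothetical record for one shared file and its metadata."""
    name: str
    metadata: dict = field(default_factory=dict)


def matches(shared, query):
    """Return True if every queried field contains the wanted text.

    `query` maps a field name to the text wanted in that field; the
    special field "name" refers to the file name itself.
    """
    for key, wanted in query.items():
        if key == "name":
            value = shared.name
        else:
            value = shared.metadata.get(key, "")
        if wanted.lower() not in str(value).lower():
            return False
    return True


def answer_query(library, query):
    """Evaluate the query locally; no shared index is consulted."""
    return [f.name for f in library if matches(f, query)]
```

A new kind of field only needs hosts that understand it; a query such as
{"artist": "coldplay", "license": "creative commons"} is checked against
whatever metadata each host happens to hold.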

IMHO, the less a network is structured (i.e., the less organized its
topology), the easier it is to add arbitrary searches.  This is
because there's no need to add another overlay for a new kind of
search -- the network can function as-is.  Of course, certain
topologies can help when some kinds of searches are predominant.

Sam


