[p2p-hackers] DHT and complex queries

Enzo Michelangeli em at em.no-ip.com
Tue Jul 5 13:38:14 UTC 2005


----- Original Message ----- 
From: "Davide Carboni" <dcarboni at gmail.com>
To: "Peer-to-peer development." <p2p-hackers at zgp.org>
Sent: Monday, July 04, 2005 8:23 PM
Subject: [p2p-hackers] DHT and complex queries


> Hi, I'm developing a P2P system where clients must be able to send
> complex queries such as:
>
> author equal "foo"
> and
> publicationDate is between "12 12 1999" and "12 12 2000"
> andnot
> distributor startswith "galax"
>
> etc. etc.
>
> I'm reading some papers about DHT and I wonder whether or not DHT is a
> viable solution for these requirements. I know that with DHT you can
> map a key onto a piece of data, but I cannot see how to manage complex
> query like the one in the example. Probably DHT is not the "good"
> solution for this...but never know,
>
> Any hints?

One simple but quite inefficient way of doing this is to search first for
the exact match (author equal "foo") and then filter the results. This
assumes that the query always has the format "(key==value) AND (<other
complex boolean expression>)". Overnet, the Kademlia derivative used by
eDonkey 1.0, does precisely that: the "value" stored in the DHT, besides
another key for the location search, is a dictionary of metadata made of
<tagname=tagvalue> pairs, with some tags being numeric (e.g., file size)
and most others simple strings. The thing is less horrible than it may
sound at first, because the filtering is performed "at the server side",
in each peer being queried, rather than by the client that queried the
DHT. This prevents a lot of unnecessary traffic.

Most Overnet clients can only issue simple queries based on the AND of a
number of conditions, but in my KadC library
(http://kadc.sourceforge.net/ ) I have built a recursive descent parser
that can accept arbitrarily complex expressions. I quote from
http://kadc.sourceforge.net/Quickstart.html :

    [...]
    The filter's syntax is an arbitrarily complex expression involving
    keywords (case insensitive), "tagname=tagvalue" or
    "integertagname{relop}integertagvalue" (with <relop> being >, <, =,
    !=, >=, <=), &, | and ! boolean operators, and parentheses to force
    precedence.

    For example, the filter:

      billie&lester&(SIZE!=123|bitrate<=256)&!FORMAT=jpeg

    ...is parsed as:

    ((.TRUE. AND_NOT FORMAT=jpeg) AND ((bitrate<257 OR (SIZE>123 OR
    SIZE<123)) AND (lester AND billie)))

    For more detail, see the comments in KadCparser.c .
    [...]

(the apparently bizarre translation depends of the lack of "NOT", "GE" and
"LE" operators in the syntax of the filters handled by Overnet and
eDonkey).

Enzo





More information about the P2p-hackers mailing list