[p2p-hackers] Re: scalability (was: p2p framework)
John Casey
john.casey at gmail.com
Fri Dec 2 00:07:56 UTC 2005
On 12/1/05, Ronald Wertlen <rrrw at neofonie.de> wrote:
> Hi,
>
> Gnutella-bashing certainly may be fun, the truth is, it is tremendously
> well-adapted for its purpose (I think Serguei's said the relevant stuff).
>
> However, I also believe it is pretty clear that from a search point of
> view, a random super-peer based network does not scale - it is never
> going to get the kind of precision and recall that we would call
> intelligent. It would be too slow or too inaccurate.
But if you index everything in some sort of distributed inverted index
on top of a DHT, a lot of document postings and related metadata still
have to be exported to the network, which isn't a great solution
either. Worse, semantically close terms and documents end up scattered
to effectively random, remote locations in the network for indexing.

Personally, I think what is needed here is a slightly coarser indexing
structure. Instead of publishing 1000s of term->document pointers or,
at the other extreme, a few term->peer pointers as with PlanetP, there
would be some sort of middle ground such as term->cluster-id, which is
better able to direct a search to sensible peers. The difficulty with
this approach, of course, is that it isn't easy to construct sensible
global clusters from local cluster definitions, since different local
document databases will index different terms and the like.
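
To make the three granularities concrete, here is a rough Python sketch
that counts how many (term, target) postings would have to be published
under each scheme. Everything in it is made up for illustration: the
PEERS toy data, the postings() helper and the cluster names are
hypothetical, and the cluster ids are purely peer-local, so it does not
attempt the hard part (merging local clusters into global ones).

```python
# Hypothetical toy data: peer -> local cluster id -> document id -> terms.
PEERS = {
    "peerA": {
        "a0": {
            "doc1": {"dht", "index"},
            "doc2": {"dht", "routing"},
            "doc3": {"dht", "index", "routing"},
        },
        "a1": {
            "doc4": {"music", "mp3"},
            "doc5": {"music", "index"},
        },
    },
    "peerB": {
        "b0": {
            "doc6": {"dht", "index"},
            "doc7": {"index", "search"},
        },
    },
}


def postings(peers, granularity):
    """Return the set of (term, target) pairs published to the index.

    granularity is "document", "cluster" or "peer" and selects what a
    term points at. Duplicate pairs collapse because a set is used.
    """
    published = set()
    for peer, clusters in peers.items():
        for cluster_id, docs in clusters.items():
            for doc_id, terms in docs.items():
                target = {
                    "document": (peer, doc_id),
                    "cluster": (peer, cluster_id),  # peer-local id only
                    "peer": peer,
                }[granularity]
                for term in terms:
                    published.add((term, target))
    return published


def lookup(published, term):
    """Return the targets a query for `term` would be directed to."""
    return {target for t, target in published if t == term}


if __name__ == "__main__":
    for g in ("document", "cluster", "peer"):
        print(g, len(postings(PEERS, g)))
```

On this toy data the term->cluster index publishes noticeably fewer
postings than term->document, while a lookup for "index" still says
which clusters on a peer are worth searching rather than just which
peer.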