[p2p-hackers] Decentralized search engines
Michael Parker
mgp at ucla.edu
Wed Dec 7 19:27:48 UTC 2005
The first step of indexing is keyword extraction.
From what I have heard, libextractor is a good open-source solution:
http://gnunet.org/libextractor/
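
To make the extraction step concrete, here is a minimal sketch for the .html case only, using nothing but the Python standard library. It is not libextractor and does not use its API; the names TextExtractor, extract_keywords and STOP_WORDS are made up for illustration, and the stop-word list is a tiny placeholder. Handling .doc or .pdf would need format-specific parsers, which this does not attempt.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


class TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside script/style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_keywords(html, top_n=5):
    """Return up to top_n (word, count) pairs, most frequent first."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Crude ASCII tokenizer; real documents would need Unicode-aware splitting.
    words = re.findall(r"[a-z0-9]+", text)
    counts = Counter(w for w in words if len(w) > 1 and w not in STOP_WORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "<html><body><p>Peer search and peer index</p></body></html>"
    print(extract_keywords(sample))
```

The (word, count) pairs are what an indexer would then publish into the distributed index; how they are ranked or stored is a separate question.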
- Mike Parker
Quoting SIMON Gwendal RD-MAPS-ISS <gwendal.simon at francetelecom.com>:
> By the way, one first challenge is the implementation of a nice
> crawler for owned documents: an indexer. This indexer should be able
> to scan and retrieve words from various documents (.html, .doc, .pdf,
> ...). It should be light and run in idle time and, if possible, be
> cross-platform. If you know a good open-source indexer, please let us
> know.
>