[p2p-hackers] Decentralized search engines

Michael Parker mgp at ucla.edu
Wed Dec 7 19:27:48 UTC 2005


The first step of indexing is the keyword extraction itself.
From what I have heard, libextractor is a good open-source solution:
http://gnunet.org/libextractor/
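
To make the keyword extraction step concrete, here is a minimal sketch
for the .html case only, using just the Python standard library. It does
not use libextractor and does not reflect its API; the names
TextCollector, extract_keywords and STOPWORDS, and the tiny stopword
list, are assumptions made up for this illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; a real indexer would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it", "for"})


class TextCollector(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_keywords(html, top_n=5):
    """Return up to top_n (word, count) pairs, most frequent first."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    # Lowercase the text and split it into runs of ASCII letters and digits.
    words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    # Drop very short words and stopwords before counting.
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(top_n)
```

Calling extract_keywords(page_source) on the text of an HTML file gives a
short list of candidate index terms. Formats such as .doc and .pdf would
need their own text extractors in front of the same counting step.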

- Mike Parker


Quoting SIMON Gwendal RD-MAPS-ISS <gwendal.simon at francetelecom.com>:

> By the way, a first challenge is the implementation of a nice
> crawler for one's own documents: an indexer. This indexer should be able
> to scan and retrieve words from various documents (.html, .doc, .pdf, 
> ...). It should be light and run in idle time and, if possible, be 
> cross-platform. If you know a good open-source indexer, please let us 
> know.
>


