TR: [p2p-hackers] Paradigma Question: DHT's or Small World?

SIMON Gwendal RD-MAPS-ISS gwendal.simon at francetelecom.com
Thu Feb 3 15:49:10 UTC 2005


Hi,

	Here are two assumptions that advocate for small-world.

	The first one, related to the human language, has been partially established by several studies [1,2] since the pioneering work of [3]. The graph of word interactions is constructed by linking two words when they co-occur in a sentence (a fortiori in a file). The study of the properties of these graphs shows they exhibit the small world effect and a scale-free distribution of degrees.

	The second assumption follows the observations you cite and some others [4,5,6]. The data-sharing graph is constructed by linking two users when they share a same file. Observations on several real traces show that this graph exhibits also the small-world effect and the scale-free distribution of degrees.

	Besides, it is known that the lexicon of an human contains few thousands of words. This lexicon and the words contained in the documents which have been produced and dowloaded by an user define her "semantic profile". Through the preceeding assumptions, we naturally infer that the graph generated by linking users when their semantic profile overlap is also small-world and scale-free.

	That is, if we consider that users emit requests on keywords chosen within their profile, we can expect that almost *all* files of interest for an user are stored by a small set of "friends". Moreover, these "friends" are already known by the user thanks to previous successfull queries.

	Therefore, it is possible to limit the search to a subspace of the information space without preventing the quality of responses. On the contrary, it is probable that these responses are more relevant for the requester point of view. For instance, a fan of "Fiona Apple" will discover mp3 of Fiona Apple and not informations on Apple Inc. or webpages for "apple pie" cooking. Or, an European querying informations on "football" will not receive pages on NFL.

	By the way, another related concern is the publication of a file.
	In a gnutella-like systems, peers just have to put their files in their "shared directory" in order to make them available by any node in the system.
	On the contrary, the task of publication in a DHT-based overlay requires to reach as many peers as the number of words describing the published document. Indeed, the published file has to be known by the peers that are responsible of all the *relevant* words of the document. This is clearly an issue for keyword-based search in DHTs. If you want to design a search engine indexing *all* words in the document, this task becomes unrealistic.

--------------------
Gwendal Simon
France Telecom R&D
http://solipsis.netofpeers.net


[1] D. Watts. Six Degrees.
[2] A. Barabasi. Linked: the New Science of Networks.
[3] R. Ferrer i Canco and R. Sole. The Small World of Human Language.
[4] J. Keller, D. Stern and F. Dang Ngoc. MAAY: A Self-Adaptive Peer Network for Efficient Document Search.
[5] V. Cholvi, P. Felber, and E.W. Biersack. Efficient Search in Unstructured Peer-to-Peer Networks.
[6] Adriana Iamnitchi, Matei Ripeanu and Ian Foster, Small-World File-Sharing Communities.



 

> -----Message d'origine-----
> De : p2p-hackers-bounces at zgp.org
> [mailto:p2p-hackers-bounces at zgp.org] De la part de Alexander Löser 
> Envoyé : jeudi 3 février 2005 11:32 À : p2p-hackers at zgp.org Objet : 
> [p2p-hackers] Paradigma Question: DHT's or Small World?
> 
> Hey all,
> structured overlay networks  based on DHT's, such as Pastry and  Chord 
> among others, have been investigated in the past to construct scalable 
> and performance orientated peer-to-peer networks.  However, 
> unstructured networks, such as Gnutella or Kazaa,  are still widely 
> used among the file sharing community. Recently researchers proposed 
> extensions to unstructured networks  networks based on the small world
> idea: peers dynamically create shortcuts to other peers based on their 
> interests.
> Over a while peers with the same interests became direct neighbors
> through its shortcuts and build interest based clusters.   Hence peers
> no longer flood messages but partly route it's queries via a 
> interested based/semantic  overlay.  Examples are described in [1] [2] 
> among others.
> 
> Comparing small world and DHT  approaches is a difficult task, since 
> simulations usually differ in scenarios, data sets or simulation
> methodology.    I'm interested in scenarios and arguments PRO small
> world overlays for unstructured networks. Does anybody now actual 
> theoretic or practical work that compares both approaches in different 
> scenarios (high churn, no super peers, key word based search, meta 
> data based search)?  Which scenarios or arguments support small world 
> approaches for unstructured networks?
> 
> Alex
> 
> 
> 
> 
> [1] Gia - Making Gnutella like P2P Systems Scalable 
> http://berkeley.intel-research.net/sylvia/1103-chawathe.pdf
> http://seattle.intel-research.net/people/yatin/publications/ta
> lks/sigcomm2003-gia.ppt
> 
> [2]  Efficient Content Location Using Interest Based Locality in 
> Peer-to-Peer Systems http://www.ieee-infocom.org/2003/papers/53_01.PDF
> --
> ___________________________________________________________
> 
>   Alexander Löser
>   Technische Universitaet Berlin
>   hp: http://cis.cs.tu-berlin.de/~aloeser/
>   office: +49- 30-314-25551
>   fax   : +49- 30-314-21601
> ___________________________________________________________
> 
> 
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers at zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers
> _______________________________________________
> Here is a web page listing P2P Conferences:
> http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences
> 



More information about the P2p-hackers mailing list