Dijjer and Freenet (RE: [p2p-hackers] clustering)

Serguei Osokine osokin at osokin.com
Sun Mar 12 03:08:38 UTC 2006


On Tuesday, March 07, 2006 Ian Clarke wrote:
> I think Chapters 1 and 2 of Oskar Sandberg's recent thesis may be  
> extremely relevant:
>
>   http://www.math.chalmers.se/~ossa/lic.pdf
>
> It outlines an algorithm to choose which nodes in a network should  
> have edges between them such that "greedy" routing (routing to the  
> peer closest to what you are looking for) has small world O( log^2 
> (N) ) path lengths.

	Ian, Oskar, that was wonderful. I rarely see anything that has
that much aesthetic value, but the analyitical proof that the graph
rewiring caused by the random search actions actually makes it a 
small-world graph with log^2(N) path length, was certainly worth the
one-year wait since last April, which is when Ian has preannounced 
this paper.

	I have to apologize in advance for my questions - some of them
might be downright stupid (I won't even pretend that I followed the
logic of all your proofs), and in any case there are many of them.
Perhaps, too many. Feel free to answer any subset that does not seem
stupid to you (and as a special case, of course, just ignore all of 
them :-)

	But I waited for this paper for a year, and as you might imagine,
I had time to ponder quite a few issues... So here we go:

1. What is the difference between Freenet[1] and Sandberg/Dijjer[2]
   models that makes Dijjer easier to analyze? This is mentioned a few
   times in [2] and in other Freenet- and Dijjer- related places, but
   I did not notice the explicit explanation of the differences
   anywhere. Was it the directed links in the graph in [2]? Something
   else? I honestly tried to figure this out by myself, but couldn't; 
   sorry... Maybe if I'd spend more time looking at both [1] and [2],
   I'd figure this out, but since you guys are already here, I thought
   that maybe you could answer that? 

2. Whatever this difference is, how relevant do you think are the
   Sandberg/Dijjer results[2] for the original Freenet paper[1]?

   For example:

	a. section 5.2 of [1] says about Fig. 3: "We can see that the
	   pathlength scales approximately logarithmically, with a change
	   of slope near 40,000 nodes."

	   Given what you guys know now, do you think that Fig. 3 shows
	   log^2(N) instead, or Freenet simply has different scaling
	   properties due to its different algorithms?

	b. Fig. 5 shows the link number distribution that awfully
	   resembles the power-law one, whereas Oskar mentioned in
	   another mail that Kleinberg is constant out-degree and 
	   Poisson in-degree. Is my assumption that Sandberg/Dijjer 
	   model is supposed to mimic the Kleinberg distibution
	   incorrect and in fact both Sandberg/Dijjer and Freenet
	   are power-law? Or is just one of them power law due to
	   the different algorithms used? Or maybe Fig. 5 in [1] is
	   actually showing a part of the Poisson distribution that 
	   can be easily mistaken for the power law?

	c. Section 5.4 of [1] says: "In a small-world network, the
	   majority of nodes have only relatively few, local, connections
	   to other nodes, while a small number of nodes have large, 
	   wide-ranging sets of connections. Small-world networks permit
	   efficient short paths between arbitrary points because of the
	   shortcuts provided by the well-connected nodes..."

	   This does not sound like anything I've been able to see in the
	   Oskar's thesis[2] - it has left me with a distinct impression
	   that the connectivity in the small world (at least in the one
	   analyzed there) was provided by the exactly right (Kleinberg's)
	   proportion of long-distance links, which Sandberg/Dijjer
	   rewiring model was designed to achieve, and not by the means 
	   of any special nodes with "wide-ranging sets of connections".
	   Is it because Freenet and Dijjer used different algorithms, or
	   the original assumption about the reasons for the Freenet
	   connectivity was wrong? 

	   Or is it simply that I'm misreading the Freenet paper[1] and
	   it did not mean to imply that small world properties of Freenet
	   are due to the small subset of well connected nodes? Maybe my
	   understanding of Sandberg/Dijjer[2] is wrong?

3. Why would one want to use the system with O(log^2(N)) path length
   when pretty much every DHT gives a path lenght of O(log(N))? At
   today's P2P nets scale (millions of nodes) the difference between
   log and log^2 is non-trivial. I understand that creating the small-
   world network by a simple and natural evolutionary process is very
   elegant, simple, and attractive, but is this elegance worth the 
   increased path length? Why not simply use a DHT?

4. Speaking of DHTs: Oskar recently said something curious here that 
   I did not understand: "Actually, while the frequency of Chord links
   falls exponentially with the "level" (not sure what the Chord term
   is) the length of such links increases exponentially as well, so in
   fact the frequency of links with certain lengths do fall
   harmonically. One could see Chord as some sort of "mean field"
   version of the same dynamics as Kleinberg's model."

   What exactly does this mean? DHTs (including Chord) have log(N)
   diameter, and Kleinberg has log^2(N), right? This looks fairly
   different to me, and I totally missed the meaning of the "mean
   field", and why Chord and Kleinberg would have the same dynamics;
   I did not even understand what is this dynamics that you're talking
   about. Oskar, could you please elaborate on this?

5. Throughout this mail I've been using the expresson Sandberg/Dijjer 
   to denote both what is described in [2] and Dijjer, as if it is the 
   same system; is this a right thing to do, or there are significant
   differences between these two? What about these directed links, for
   example? Does Dijjer use them? Does this matter?

6. Both the simulation in Freenet paper[1] and Sandberg/Dijjer paper[2]
   use the random requests to rewire the networks. In reality, there
   will be all kinds of hotspots and uneven distributions of requests
   due to the different popularity of content. Do you guys think it
   would affect the results of analysis performed in [2], and if yes,
   how? For example, if one particular file is extremely popular, one
   could imagine that its storage node[s] will have many more links 
   than the average, and the search for the other (rarely requested)
   data items might become ineffective, because these path lengths will
   become very high. 

   Do you think it is a valid concern? Are there any results that would
   show that this is not an issue?

7. While Sandberg [2] assumes that the requests are passing between 
   the random points, both Freenet and Dijjer seem to allow the length
   of the path to be shortened when the requested content is already
   cached on the intermediate node. If this is really so, what is the
   effect that will have on Oskar's analysis results? Fewer links will
   be rewired, so this should have some effect on the graph dynamics,
   right? Any idea what it would be?

8. The answer to '7' above might also depend on the cache size and 
   replacement policies on the nodes, and on the content popularity
   distribution (which I already mentioned in '6' above). All these
   factors are hard to predict for the real system in advance, and I'm
   wondering what is the feeling that you guys have about the overall
   stability of the rewiring algorithm?

   I mean, the original Kleinberg d(x, y)^(-1) approach is extremely
   sensitive to the value of this "-1". Any other value, and you either
   do not have enough long distance links, or you do not have enough
   short-range links once you arrive into the general vicinity of your
   destination. In any case, your path length becomes polynomial instead
   of polylogarithmic once you have the smallest deviation from this
   "-1" number.

   And this makes me a bit uneasy; I do understand that the catastrophic
   failure is unlikely, and the detrimental effect of all these changes
   (if any) is likely to be tolerable, but still... Look at it this
   way: Chapter 1 of Oskar's thesis is dedicated to proving that 
   "...family given by (1.1) allows for polylogarithmic routing at, 
   and only at, one value of the [alpha]...", then Chapter 2 proves
   that this is exactly what happens with random-request rewiring, and
   then Dijjer's practical implementation and deployment arrive and
   throw these random requests out of the window with their caching,
   different content popularity, and whatnot. So given the strict
   requirements that were proven to be vital in Chapter 1, I cannot 
   help wondering: is Dijjer still polylogarithmic, or not?

   I'd feel much better about all this if there would be some results
   showing that the system is stable in the presence of such changes
   (in a sense that reasonably small algorithm input changes should not
   cause any catastrophically large changes in the algorithm results).
   Do you have any results that would point in that direction? Would
   you say that the data displayed on Figures 2.4-2.6 already shows 
   that your results are not exactly Kleinberg, but this does not 
   cause any catastrophic consequences, or you also have some other
   sources of optimism?

9. And finally, if I'm not mistaken, Dijjer description seems to imply
   that the links are rewired after every search, whereas Oskar suggests
   that this should be happen only with proability p, and in section 2.3
   says: "...there are simple heuristic arguments for why p should
   reasonably be on the order of one over the expected length of the 
   greedy walks". Any particular reason for that Dijjer behaviour, or
   I've missed something and Dijjer does, in fact, rewire with this
   probability p? (Would be interesting to hear these simple heuristic
   arguments too, but that's another story...)

[1] Freenet: A Distributed Anonymous Information Storage and Retrieval
    System. Ian Clarke1, Oskar Sandberg2, Brandon Wiley3, and Theodore
    W. Hong.
    http://www.cl.cam.ac.uk/~twh25/academic/papers/icsi-revised.pdf

[2] Searching in a Small World. Oskar Sandberg.
    http://www.math.chalmers.se/~ossa/lic.pdf

	Best wishes -
	S.Osokine.
	11 Mar 2006.


-----Original Message-----
From: p2p-hackers-bounces at zgp.org [mailto:p2p-hackers-bounces at zgp.org]On
Behalf Of Ian Clarke
Sent: Tuesday, March 07, 2006 5:03 PM
To: Peer-to-peer development.
Subject: Re: [p2p-hackers] clustering


I think Chapters 1 and 2 of Oskar Sandberg's recent thesis may be  
extremely relevant:

   http://www.math.chalmers.se/~ossa/lic.pdf

It outlines an algorithm to choose which nodes in a network should  
have edges between them such that "greedy" routing (routing to the  
peer closest to what you are looking for) has small world O( log^2 
(N) ) path lengths.

The algorithm is very simple, and pleasingly "natural".  Basically  
you use greedy routing to find a path to your intended destination  
node, then you add a link from each node along this path to the  
destination.  The number of outbound edges from each node can't  
obviously increase indefinitely, so they are deleted according to a  
least recently used scheme to make room for new edges.

This is based on Freenet's approach to edge selection.  The paper  
describes both experimental results that confirm that this algorithm  
does lead to a small world topology, and also delves into the  
mathematics behind why this might be the case.

It is interesting not just because of the simplicity of the  
algorithm, but because it is extremely amenable to the construction  
and maintainence of decentralized P2P networks, and because it could  
tell us something about why human relationships tend to form small  
world networks.

This algorithm is used by the Dijjer (http://dijjer.org/) P2P network.

Ian.

On 7 Mar 2006, at 02:46, Jeff Rose wrote:

> Anyone out there doing work in clustering?  I'd be interested in  
> hearing what people are up to, or if you have pointers to good  
> papers that would be great too.  At this point I'm interested in  
> any angle, geographic distance based, semantic distance based,  
> hierarchical, non-hierarchical etc...  My goal is to work towards a  
> very amorphous clustering scheme that allows any kind of object to  
> be located close to relatives in the network as long as they share  
> a common distance function.

_______________________________________________
p2p-hackers mailing list
p2p-hackers at zgp.org
http://zgp.org/mailman/listinfo/p2p-hackers
_______________________________________________
Here is a web page listing P2P Conferences:
http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences




More information about the P2p-hackers mailing list