[p2p-hackers] paper "rarest first and choke algorithms are enough"

Arnaud Legout Arnaud.Legout at sophia.inria.fr
Thu May 4 09:04:40 UTC 2006


Hi Bryan,

first thanks a lot for your comments.
I am answering them inline.

Bryan Turner wrote:
> Arnaud,
>
>     Thanks for the new paper, there is a lot of nice data here.  The 
> presented graphs and analysis indeed suggest that the BitTorrent 
> protocol's choices achieve near-optimal data distribution in its 
> paradigm (single file, block-based content distribution).  I agree 
> with your suggestions for fairness criteria, and my own suggestions 
> are similar (my suggestion is for a reputation-based system to model 
> leechers and free-riders).
the first point I want to make clear is that we place ourselves in the 
context of P2P file delivery in the current Internet.
This context has one important implications: there is a good network 
connectivity and everybody can discuss
with everybody (2 NATed peers cannot communicate
with each other directly, but this is not a network issue, this is an 
end user issue)
You will see in the following why this is important.

That should be clear in the paper, if it was not tell me so that I can 
improve the argumentation.

>
>     However, I disagree with several conclusions you make along the 
> way.  A brief overview:
> - Byte-for-byte fairness is not appropriate
I still claim that in the context of P2P file transfer (see below)
> - Network Coding is not justified
in our context. Network coding can be a good solution in the context of 
ad-hoc networks, or in any other network where connectivity
is a problem, i.e., when local rarest first does not work optimally. 
Note, however, that network coding requires significant computing
capabilities that a sensor network or an ad hoc network based on basic 
terminals (e.g., mobile phones) cannot afford for now.

There are other scenarios where network coding can help.
But, I still claim that in the current Internet and on a representative 
set of reat torrents we monitored, network coding would not improve
the performance significantly enough to be justified.

>
> Fairness:
>
>     I am unconvinced by the arguments against byte-for-byte fairness.  
> Your evidence to support this is that tit-for-tat does not take into 
> account the excess capacity of some torrents (ie: more seeders than 
> leechers).  Yet this situation is equivalent under byte-for-byte and 
> choking rules. Leechers are upholding the byte-for-byte policy 
> (seeders cannot, as you mention) and thus only leechers are 
> restricting their upload capacities to free-riders, and rightly so.  
> Seeds (who hold the excess capacity) are distributing this capacity 
> fairly to all peers.  Thus, choking and byte-for-byte are equivalent 
> in this regard - free riders receive a share of the bandwidth equal to 
> the fair distribution given by seeders in both cases.
I am not sure I understand your point.
Byte-for-byte (BFB) algorithm and choke algorithm (CA) are far from 
being equivalent.
In all the studies that mention BFB I am aware of, they never mention 
the case of seeds. They simply say that peers must not receive more than 
they give.
This is the definition of BFB. You can introduce a threshold, but it 
does not change the main idea and there is no proposed solution
to define a dynamic threshold.

It is not correct to say that peers must not receive more than they 
send. As I wrote in the report, leechers have an asymmetric capacity most
of the time and seeds offer capacity for free. With BFB you cannot 
benefit from this excess capacity. The only one solution is to
define a dynamic threshold, but this is an optimization problem that is 
not solved, and I believe to be very hard to solve practically, for no 
benefit
compared to CA.

Remember that the main (and single argument) in all the studies that 
discuss BFB solutions is that free riders can get a high
service from a torrent. This problem in all the studies I am aware of 
comes from the old choke algorithm in seed state that
favors fast downloaders. With the new choke algorithm in seed state all 
the observed unfairness disappear (in the sense of the fairness
I define in the paper).


>
>     The more important case, when excess capacity is not available, 
> free riders should be eliminated or diminished.  Because leecher 
> participation cannot be proven, the only other method of measurement 
> is via equal exchange of services.  Byte-for-byte is one such 
> measurement (reputation systems are another).  In this scenario, free 
> riders will only receive a small credit from the leechers in the swarm 
> before having to resort to hand-outs from the seeds.  With the seeds 
> giving fair capacity to the swarm, the free riders will receive very 
> little of the swarm capacity, while the participating leechers will 
> receive fair resources.
CA gives the same result from our measurements. It guarantees that a 
peer cannot receive more from a torrent than a peer that contributes 
more than him.
Here again, with BFB we will not use all the capacity of the torrent 
because it is based on a fairness criterion that is not correct in a P2P 
architecture.
>
>     Free riders always receive more resources under choking than under 
> exchange measurement models.
and that is fine as long as they do not receive more than someone that 
contributes.
I understand that you argue in favor of global efficiency. It is true 
that if you do not give anything to free riders, you will have
more capacity for the ones that contributes. But it is not clear that 
they can make use of it, and it is not clear that giving a small service
to free riders impact the overall torrent.

I have shown a long time ago in the context of multicast that it is 
possible to define a policy that gives more to multicast users without
significantly decreasing the performance of unicast ones.
Even if the context is different, I believe that the issue is equivalent.
>
> Network Coding:
>
>     On the topic of Network Coding [1], I did not see any evidence in 
> your paper refuting the network coding literature.  Please correct me 
> if I missed it, but it appears your only rebuttal is in the poor 
> modeling of BitTorrent, not in the application of network coding 
> techniques.  From this I don't see how you conclude that network 
> coding is not justified?  I agree that most of the simulation 
> literature does not model BitTorrent correctly, but this doesn't 
> negate the positive results of the network coding trials, only their 
> comparisons to BitTorrent.
That is not my point. We claim that we cannot gain a significant 
performance improvement using network coding in the context we are 
studying.
As I said at the beginning of this email network coding is undoubtedly 
useful in a number of situations, not in our context.
However, the context we are studying is very important and there are 
tons of persons believing that they can improve a P2P client
for file sharing over the Internet using network coding techniques. This 
is not true and this is due to a misunderstanding of the
dynamics of BitTorrent that I hope the paper help to understand better.
>
>    In section 4-A, pp 6-7, you suggest that source or network coding 
> may not propose interesting pieces to peers during the startup phase.  
> I counter that this is exactly what network coding solves - every 
> packet has an innovation rate based on the network flows from the 
> local node to the source(s).  Assuming a densely connected graph with 
> fair sharing from the seeder, it is unlikely that any packet is not 
> innovative during the startup phase.  Unless the local node's 
> bandwidth is many multiples of the seeder, a problem which is not 
> specific to network coding.
You did not get the point. consider a seed with a finite upload capacity 
C Bytes/s and a file to send of size S Bytes.
Then the seed needs in the best case S/C seconds to send the entire 
content. Whatever coding technique you use you cannot do it faster.

We identified that when the performance of a torrent is not optimal, 
then it is in a transient phase. That means that the seed has not yet
sent a copy of each piece. If you use network coding of whatever coding 
you can imagine, we will not improve that fact that information is missing
in the torrent and that is this missing information that causes the 
decrease in performance.
>
>    Network Coding is superior to Rarest First and other 
> BitTorrent-like protocols for several reasons:
> 1) The innovation rate is the min-cut of all flows from the local node 
> to the source(s), which is likely to be much larger than BitTorrent's 
> default of 4 active connections.
> 2) BitTorrent transfers pieces in blocks-at-a-time, but only 
> advertises in piece-at-a-time.  This injects a significant delay 
> between downloading of an innovative byte and sharing of that byte (on 
> average, one half the download time of an entire piece).  Network 
> coding imposes no such delay.
> 3) There is no first-blocks or last-blocks problem with network 
> coding, they are entirely non-issues.  Large portions of your paper 
> are devoted to these issues which simply don't occur under network coding.
> 4) The meta-network utilization is higher for network coding.  That 
> is, since BitTorrent cannot align to the underlying physical 
> connections of the network elements, it is impossible to maximally 
> utilize them.  Network coding (using an innovation rate tracking 
> algorithm) can rapidly align to the structure of the physical network, 
> eliminating cross-core retransmissions.  This saves a significant 
> amount of ISP bandwidth and reduces the overall burden of supporting 
> filesharing services on the internet.
All you say is theory and is true, but the real difference between 
network coding and rarest first depends on the context.
All I can tell you is that measurements show that in the context I am 
discussing there is not significant differences between network coding
and Rarest Fist.
>     Is the full implementation of Network Coding superior enough to 
> BitTorrent that it offers compelling reasons for migration to it as a 
> solution?  Your paper argues that BitTorrent is "good enough", and I 
> tend to agree.
I do not believe it makes sense to say that one solution is superior to 
another one in all cases.
In our specific context, the rarest first and choke algorithms are 
enough, but not superior with respect to performance. But they are much 
simpler.
In other contexts Network Coding will be superior.

The point of the paper is to show which kind of performance we can 
achieve using rarest first and choke algorithms, and to stop
legends that say that there is always last pieces problems problems or 
fairness issues. These legends, as we show with the notion of transient 
phase,
are due to a misunderstanding of the very complex dynamics of P2P 
protocols.

Regards,
Arnaud.





More information about the P2p-hackers mailing list