[p2p-hackers] Error correcting codes to prevent failure in BitTorrent like systems?

Nick Johnson arachnid at notdot.net
Mon Dec 5 10:34:55 UTC 2005


Arnaud Legout wrote:

> Hi,
>
> Nick Johnson wrote:
>
>> Systems like BitTorrent have a rather annoying failure mode - the 
>> last 'seed' goes offline while there are still several 'peers' 
>> (without the complete file) online. Attempts by the peers to 
>> reconstruct the original file are rarely successful, as the chances 
>> of every single block being present on one of the seeds are generally 
>> very low - it's likely that at least one block is missing.
>
> from my point of view this is pure myth.
> I often see such claims that bittorrent suffers from last pieces 
> problem; that if there is no seed, the torrent is dead; etc.
> From all the experiments I performed, the reality is very different.
> Rarest first does a very good job at  replicating the rarest pieces in 
> a torrent, so that  the probability to have a piece that is not
> replicated at all is very low.

This is why I'm after stats, not guesses - I'm of the opinion that even 
with rarest first, the chances of getting every single block are very 
low (remember, if you have 1000 blocks, and you're 99% likely to have 
each block, that's still only a 0.004% chance you'll have them all). 
However, that's just my guess, and this is just yours - only stats will 
show it one way or the other, really.

>
> Outside this target, it makes sense to design other classes of 
> applications. However, I am not convinced that error correcting code 
> are the
> solution. Such codes are terribly sexy, but when it comes to real 
> applications, things are far less sexy.
> It is hard to tune such codes as their relevance really comes from the 
> context. If the context is a moving target, then the problem becomes 
> very complex.

In this case, the progression of expected percentage of blocks (50% + 
25% + 12.5% + ...) gives us a very good idea how large an ECC is 
required - after only 4 generations (4 peers with average 50%), the 
amount of data expected is over 90%, so for most purposes 10% check 
blocks will be sufficient. Since more check blocks don't require any 
more transmission, the overhead is pretty low, and it's quite possible 
to set the threshold higher if desired. Any amount should give benifits, 
however.

-Nick Johnson



More information about the P2p-hackers mailing list