[p2p-hackers] Error correcting codes to prevent failure in
BitTorrent like systems?
Nick Johnson
arachnid at notdot.net
Mon Dec 5 10:34:55 UTC 2005
Arnaud Legout wrote:
> Hi,
>
> Nick Johnson wrote:
>
>> Systems like BitTorrent have a rather annoying failure mode - the
>> last 'seed' goes offline while there are still several 'peers'
>> (without the complete file) online. Attempts by the peers to
>> reconstruct the original file are rarely successful, as the chances
>> of every single block being present on one of the seeds are generally
>> very low - it's likely that at least one block is missing.
>
> from my point of view this is pure myth.
> I often see such claims that bittorrent suffers from last pieces
> problem; that if there is no seed, the torrent is dead; etc.
> From all the experiments I performed, the reality is very different.
> Rarest first does a very good job at replicating the rarest pieces in
> a torrent, so that the probability to have a piece that is not
> replicated at all is very low.
This is why I'm after stats, not guesses - I'm of the opinion that even
with rarest first, the chances of getting every single block are very
low (remember, if you have 1000 blocks, and you're 99% likely to have
each block, that's still only a 0.004% chance you'll have them all).
However, that's just my guess, and this is just yours - only stats will
show it one way or the other, really.
>
> Outside this target, it makes sense to design other classes of
> applications. However, I am not convinced that error correcting code
> are the
> solution. Such codes are terribly sexy, but when it comes to real
> applications, things are far less sexy.
> It is hard to tune such codes as their relevance really comes from the
> context. If the context is a moving target, then the problem becomes
> very complex.
In this case, the progression of expected percentage of blocks (50% +
25% + 12.5% + ...) gives us a very good idea how large an ECC is
required - after only 4 generations (4 peers with average 50%), the
amount of data expected is over 90%, so for most purposes 10% check
blocks will be sufficient. Since more check blocks don't require any
more transmission, the overhead is pretty low, and it's quite possible
to set the threshold higher if desired. Any amount should give benifits,
however.
-Nick Johnson
More information about the P2p-hackers
mailing list