[p2p-hackers] Generalizing BitTorrent..
Bryan Turner
bryan.turner at pobox.com
Wed Jan 19 15:19:22 UTC 2005
Greg,
Actually, this is not a problem for torrents. Bram was very
inventive with the torrent file format, which lets torrent builders
include a ragged hierarchy. In the case of the MAME torrents, the torrent
file for each new release is literally 95% the same as the previous one. The ragged hierarchy (including directories
and file names, but not flags to my knowledge) is included in the torrent,
followed by a list of fixed-size chunks assigned to each file in order. The
last chunk of each file may be smaller than the fixed size.
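The layout described above can be sketched as a catalog mapping each path to the hashes of its chunks. This is a minimal illustration, assuming each file is split on its own into fixed-size chunks hashed with SHA-1; the chunk size and the names `chunk_hashes` and `build_catalog` are hypothetical. (In the BitTorrent v1 metainfo format, piece hashes are computed over all files concatenated in order, so treating each file separately is this sketch's assumption.)

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # hypothetical fixed chunk size (256 KiB)


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split one file into fixed-size chunks and return their SHA-1 hex digests.

    The last chunk may be shorter than chunk_size; an empty file has no chunks.
    """
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def build_catalog(files: dict[str, bytes], chunk_size: int = CHUNK_SIZE) -> dict[str, list[str]]:
    """Map every path in a (possibly ragged) directory tree to its ordered chunk hashes."""
    return {path: chunk_hashes(data, chunk_size) for path, data in sorted(files.items())}


catalog = build_catalog({"roms/a.zip": b"x" * 10, "docs/readme.txt": b"hello"}, chunk_size=4)
print({path: len(hashes) for path, hashes in catalog.items()})
# {'docs/readme.txt': 2, 'roms/a.zip': 3}
```

With a 4-byte chunk size, the 10-byte file yields chunks of 4, 4 and 2 bytes, showing the short final chunk.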
In effect, a torrent file is already a catalog of how to glue the
pieces together! The main difference is in what you're looking for: a
piece or an entire torrent. I propose looking for each piece separately,
while BitTorrent searches for each torrent as a whole.
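Treating the catalog as glue instructions, reassembly only needs the pieces themselves, wherever they came from. A minimal sketch, assuming a hypothetical content-addressed `store` that maps a SHA-1 hex digest to piece bytes; `assemble` is an illustrative name, not a BitTorrent client API.

```python
import hashlib


def assemble(catalog: dict[str, list[str]], store: dict[str, bytes]) -> dict[str, bytes]:
    """Rebuild each file by concatenating its pieces in catalog order.

    Every piece is located and verified by its hash alone, so it may have
    been fetched from any swarm that happened to carry it.
    """
    files = {}
    for path, hashes in catalog.items():
        parts = []
        for h in hashes:
            piece = store[h]  # KeyError: this piece has not been fetched yet
            if hashlib.sha1(piece).hexdigest() != h:
                raise ValueError(f"piece {h} for {path} failed verification")
            parts.append(piece)
        files[path] = b"".join(parts)
    return files


p1, p2 = b"abcd", b"ef"
h1, h2 = hashlib.sha1(p1).hexdigest(), hashlib.sha1(p2).hexdigest()
print(assemble({"lib/common.c": [h1, h2], "app/main.c": [h2]}, {h1: p1, h2: p2}))
# {'lib/common.c': b'abcdef', 'app/main.c': b'ef'}
```

The same piece (`h2`) is fetched once and reused by two files, which is the point of searching per piece rather than per torrent.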
Good torrents tend to individually gzip the files in place, then
export the entire directory as a torrent (rather than as a tar/gzip
archive). This is how the MAME torrents are designed, and it works
incredibly smoothly. Only the changes are downloaded between versions, just
like a base + diffs model.
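The "only the changes are downloaded" effect falls out of matching chunks by hash. A minimal sketch, assuming per-file fixed-size chunking as described above and a client that skips any chunk whose hash it already holds; the function names and the tiny chunk size are hypothetical.

```python
import hashlib


def file_chunks(data: bytes, size: int) -> list[str]:
    """SHA-1 digests of one file's fixed-size chunks (the last may be short)."""
    return [hashlib.sha1(data[i:i + size]).hexdigest() for i in range(0, len(data), size)]


def chunks_to_fetch(old: dict[str, bytes], new: dict[str, bytes], size: int) -> set[str]:
    """Chunk hashes in the new release that no file in the old release already provides."""
    have = {h for data in old.values() for h in file_chunks(data, size)}
    need = {h for data in new.values() for h in file_chunks(data, size)}
    return need - have


v1 = {"a.gz": b"aaaa1111", "b.gz": b"bbbb2222", "c.gz": b"cccc3333"}
v2 = {**v1, "b.gz": b"XXXXYYYY"}  # only b.gz changed between releases
print(len(chunks_to_fetch(v1, v2, size=4)))  # 2: only the changed file's chunks
```

Because each file is compressed on its own, a change stays confined to that file's chunks instead of rippling through one big archive.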
I believe the method I proposed is more general, because it covers
the base + diffs model as well as ragged shared-hierarchy systems that have
nothing else in common. Consider, for instance, a source distribution of a large
open-source project. To distribute the entire project in one lump,
it may bundle significant common files from other open-source projects.
This leads to many torrents that all share the same common libraries. Peers
looking for Project A trade common files with peers looking for Project B,
and also with peers looking only for the common library. Seeds of one
project are also seeds of all the others that include the common functionality.
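The cross-project trading described above amounts to indexing pieces independently of torrents. A minimal sketch, assuming a hypothetical index in which every registered peer is a seed of its torrent and piece hashes are shown as placeholder strings; `PieceIndex` and its methods are illustrative, not an existing tracker API.

```python
from collections import defaultdict


class PieceIndex:
    """Hypothetical cross-torrent index: which torrents, and so which seeds, carry a piece."""

    def __init__(self) -> None:
        self.torrents_by_piece: dict[str, set[str]] = defaultdict(set)
        self.seeds_by_torrent: dict[str, set[str]] = defaultdict(set)

    def add_torrent(self, torrent: str, piece_hashes: list[str]) -> None:
        for h in piece_hashes:
            self.torrents_by_piece[h].add(torrent)

    def add_seed(self, torrent: str, peer: str) -> None:
        self.seeds_by_torrent[torrent].add(peer)

    def sources(self, piece_hash: str) -> set[str]:
        """Every seed of every torrent that includes this piece can upload it."""
        found: set[str] = set()
        for torrent in self.torrents_by_piece.get(piece_hash, set()):
            found |= self.seeds_by_torrent.get(torrent, set())
        return found


idx = PieceIndex()
idx.add_torrent("project-a", ["libcommon-piece-1", "a-piece-1"])
idx.add_torrent("project-b", ["libcommon-piece-1", "b-piece-1"])
idx.add_torrent("libcommon", ["libcommon-piece-1"])
idx.add_seed("project-b", "peer-42")
idx.add_seed("libcommon", "peer-7")
print(sorted(idx.sources("libcommon-piece-1")))  # ['peer-42', 'peer-7']
print(sorted(idx.sources("a-piece-1")))          # []
```

A Project A downloader asking for the shared library piece is pointed at a Project B seed and a library-only seed alike.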
--Bryan
bryan.turner at pobox.com
-----Original Message-----
From: p2p-hackers-bounces at zgp.org [mailto:p2p-hackers-bounces at zgp.org] On
Behalf Of Gregory P. Smith
Sent: Saturday, January 15, 2005 3:03 AM
To: Peer-to-peer development.
Subject: Re: [p2p-hackers] Generalizing BitTorrent..
The flaw in this logic is that to aggregate common data across
different instances of content in a system, you need to be able to locate and
identify the common data portions. In a typical tarball of a new version of
something where only 5% of the files have been updated, -most- of the hashes
of the fixed-size pieces are likely to change; certainly -way- more than 5%,
anyway. Why? Because the common data has shifted around or, in the case of
compressed streams of data (.tar.bz2), the entire stream will be different.
To get any benefit from this, the content would need to be extremely carefully
packaged: no zips, no tars, no compression, etc. That restriction alone could
destroy the benefit.
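The shifting effect is easy to reproduce. A minimal sketch, assuming random bytes stand in for an uncompressed tarball and using a hypothetical 1 KiB chunk size: the same 7-byte insertion is made near the end and near the start of the stream, and only chunks before the edit keep their hashes.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int) -> list[str]:
    """SHA-1 digests of consecutive fixed-size chunks of a byte stream."""
    return [hashlib.sha1(data[i:i + size]).hexdigest() for i in range(0, len(data), size)]


def shared_fraction(old: bytes, new: bytes, size: int) -> float:
    """Fraction of the new stream's chunks whose hash also appears among the old chunks."""
    old_hashes = set(fixed_chunks(old, size))
    new_hashes = fixed_chunks(new, size)
    return sum(h in old_hashes for h in new_hashes) / len(new_hashes)


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(64 * 1024))  # stand-in for a tarball
edit_near_end = old[:-100] + b"patched" + old[-100:]
edit_near_start = old[:100] + b"patched" + old[100:]

print(f"{shared_fraction(old, edit_near_end, 1024):.2f}")    # most chunks still match
print(f"{shared_fraction(old, edit_near_start, 1024):.2f}")  # essentially none match
```

Every chunk boundary after the insertion point lands on different bytes, so a small edit early in the stream invalidates nearly all piece hashes.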
Others have already mentioned an alternative general solution that applies to
-any- distribution method (the Linux kernel is distributed this way):
updates that share data should be published as binary diffs against the
previous version. Downloading n+1 becomes a recursive "download n and the
n->n+1 diff" operation.
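The recursive "download n and the n->n+1 diff" rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: `difflib.SequenceMatcher` stands in for a real binary-diff tool, and the in-memory `releases` table and the `publish`/`fetch` helpers are hypothetical.

```python
from difflib import SequenceMatcher


def make_diff(old: bytes, new: bytes) -> list[tuple]:
    """Encode `new` as copy ranges from `old` plus literal inserts (a toy delta format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete" contributes nothing: those old bytes are simply not copied
    return ops


def apply_diff(old: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the new version from the old one and a toy delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Hypothetical release table: version 1 is a full base, later versions are diffs against n-1.
releases: dict[int, tuple] = {1: ("base", b"release one: core files")}


def fetch(version: int) -> bytes:
    """Getting n+1 means getting n (recursively) and applying the n -> n+1 diff."""
    kind, payload = releases[version]
    if kind == "base":
        return payload
    return apply_diff(fetch(version - 1), payload)


def publish(version: int, content: bytes) -> None:
    releases[version] = ("diff", make_diff(fetch(version - 1), content))


publish(2, b"release two: core files")
publish(3, b"release two: core files + one fix")
print(fetch(3))  # b'release two: core files + one fix'
```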
What you really want is for peers to integrate the diff knowledge so
that it doesn't need to be done manually, and so that the system automatically
decides when base + sum(diffs against base) has grown large enough to warrant
just issuing a new base to distribute for future diffs to start from.
(fwiw, some version control systems make many of the same decisions as to
how they store versions of data internally)
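One simple way a peer could make the "issue a new base" decision automatically is to compare what a newcomer must download (the base plus every diff) with the size of a fresh base. A minimal sketch: the 1.5x tolerance, the example sizes, and the `Chain`/`record_release` names are all hypothetical.

```python
from dataclasses import dataclass, field

REBASE_TOLERANCE = 1.5  # hypothetical: accept base + diffs up to 1.5x a fresh base


@dataclass
class Chain:
    """One published base plus the sizes of the diffs distributed against it."""
    base_size: int
    diff_sizes: list[int] = field(default_factory=list)

    def newcomer_cost(self) -> int:
        """Bytes a fresh peer downloads to reach the latest version."""
        return self.base_size + sum(self.diff_sizes)


def record_release(chain: Chain, diff_size: int, full_size: int,
                   tolerance: float = REBASE_TOLERANCE) -> Chain:
    """Publish a diff, or start a new base once base + sum(diffs) grows too costly."""
    chain.diff_sizes.append(diff_size)
    if chain.newcomer_cost() > tolerance * full_size:
        return Chain(base_size=full_size)  # future diffs start from this new base
    return chain


chain = Chain(base_size=100)
for diff_size, full_size in [(20, 105), (30, 108), (40, 110)]:
    chain = record_release(chain, diff_size, full_size)
    print(chain.base_size, chain.diff_sizes)
# 100 [20]
# 100 [20, 30]
# 110 []
```

A lower tolerance favors newcomers (shorter chains) at the cost of redistributing full bases more often; a higher one favors existing peers, who only ever need the latest diff.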
-greg