[p2p-hackers] Tighter HTTP and P2P integration??

Charles Iliya Krempeaux supercanadian at gmail.com
Wed Feb 15 22:39:29 UTC 2006


Hello Karl,

On 2/14/06, Karl A. Magdsick <kmagdsick at limewire.com> wrote:
> Charles Iliya Krempeaux wrote:

[...]

> >>I've done some preliminary work (along with Matt Hamilton from NetSight)
> >>using an Apache plugin to add the X-Alt and X-Node HTTP headers
> >>that allow Gnutella clients downloading the same file to find each other
> >>and take load off the server.
> >
> >(I know I could probably read the code to get this info, but I thought
> >it might be easier to just ask.)  Could you explain the semantics and
> >usage of X-Alt and X-Node in more detail, and elaborate on how all
> >this works, please?
>
> First, a short bit about the relationship between Gnutella and HTTP:
>
> Gnutella uses HTTP to transfer all file data.  This is a huge advantage
> for integration with webservers, as Gnutella clients can treat the webserver
> as just another Gnutella client.  Many P2P networks invent their own
> file-transfer protocols, but the overhead of HTTP isn't very large, and
> the ability to pretend that regular webservers are peers is a huge win.
>
> Optional HTTP headers are used to exchange information so that Gnutella
> clients that are downloading the same file can form a "download mesh" -- a
> set of clients that share and share alike chunks of a file they are all
> trying to download.
>
> Once you've installed a webserver plugin to allow the webserver to
> understand and send a very small number of optional headers, the webserver
> can be thought of as a special case of Gnutella client: one that can't
> search or download, but one that is capable of uploading files, helping
> coordinate download meshes, and participating in download meshes.  It's
> all just HTTP with a handful of optional headers.
>
> The Gnutella protocol itself is only necessary for searching, and for
> getting files from hosts that are unable to punch holes in NAT.  It's a fair
> assumption that your webserver isn't behind NAT, or you've punched a hole
> in NAT for your webserver.  In this case, the webserver only needs to speak
> the Gnutella protocol if you want your webserver to respond to Gnutella
> searches or you want some kind of web interface for searching Gnutella.
>
> Facilitating a Gnutella mesh through a webserver is very simple, much
> simpler than implementing a BitTorrent tracker.  There is nearly no extra
> intelligence required in the webserver, and the clients all treat the
> webserver as just another member of the download mesh.
>
>
> Next, a bit about the headers:
>
> X-Node:  a Gnutella client that's able to punch through NAT/firewalls and
> get an externally contactable IP address will send its external IP and
> Gnutella port number in this header.
>
> X-Alt: a Gnutella client will send IP:port pairs of Gnutella clients in the
> download mesh using the X-Alt header.  If a client is able to punch through
> NAT, it'll include itself in the X-Alt list once it is sharing part of the
> file.
>
> X-NAlt: this header contains a list of "bad" alternate locations.  In
> essence, this header says "you gave me some bad entries in an X-Alt header.
> Stop sending out the following IP:port pairs".
>
> X-FAlt: this is the X-Alt header, but for mesh members that are unable to
> punch through NAT/firewalls.  The entries contain information about which
> proxies to use to contact these firewalled clients.
>
> X-NFAlt: this is the firewalled version of X-NAlt.
>
>
> How this all works:
>
> For each file to be downloaded, the webserver remembers information that it
> gets in X-Alt headers, and spits the same information back out in X-Alt
> headers.  It purges X-Alt entries that it sees in X-NAlt headers.  It does
> the same thing for X-FAlt and X-NFAlt headers.  The X-Node information may
> be useful in deciding which entries in its internal X-Alt pool should be
> sent out.
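
To make that bookkeeping concrete, here is a rough Python sketch of a per-file pool along those lines.  It is not the Apache plugin's code: the class name AltLocPool, parsing values as "IP[:port], IP[:port]" with 6346 as the default port, the cap of 10 entries per response, and using X-Node only to avoid echoing a client's own address back to it are all assumptions for illustration.  The firewalled X-FAlt/X-NFAlt pair would be handled the same way and is left out.

```python
from collections import defaultdict

DEFAULT_GNUTELLA_PORT = 6346  # assumed default when an entry has no ":port"


def parse_alt_list(value):
    """Parse an X-Alt/X-NAlt style value like '192.0.2.17:6347, 192.0.2.44'
    into (ip, port) tuples.  Assumes plain IPv4 entries, no extra fields."""
    locations = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        locations.append((host, int(port) if sep else DEFAULT_GNUTELLA_PORT))
    return locations


def format_alt_list(locations):
    """Render (ip, port) tuples back into a comma-separated header value."""
    return ", ".join(f"{ip}:{port}" for ip, port in locations)


class AltLocPool:
    """Hypothetical per-file pool of alternate locations."""

    def __init__(self, max_out=10):
        self.max_out = max_out         # assumed cap on entries per response
        self.pool = defaultdict(dict)  # file key -> {(ip, port): None}

    def handle_request(self, file_key, headers):
        """Update the pool from request headers; return response headers.
        Assumes header names arrive with exactly this capitalization."""
        entries = self.pool[file_key]
        # Remember every location the client reports in X-Alt.
        for loc in parse_alt_list(headers.get("X-Alt", "")):
            entries[loc] = None
        # Purge every location the client reports as bad in X-NAlt.
        for loc in parse_alt_list(headers.get("X-NAlt", "")):
            entries.pop(loc, None)
        # Don't hand the requesting client its own address (from X-Node).
        requester = set(parse_alt_list(headers.get("X-Node", "")))
        out = [loc for loc in entries if loc not in requester][: self.max_out]
        return {"X-Alt": format_alt_list(out)} if out else {}


# Usage: one client reports two locations, a later one marks one as bad.
pool = AltLocPool()
pool.handle_request("movie.ogg", {"X-Alt": "192.0.2.17:6347, 192.0.2.44"})
print(pool.handle_request("movie.ogg", {"X-NAlt": "192.0.2.17:6347"}))
# {'X-Alt': '192.0.2.44:6346'}
```

A dict is used as an insertion-ordered set, so entries go back out in the order they were first learned; a real plugin would probably want smarter selection (freshness, variety) than that.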

That illuminates things a bit.  I guess my knowledge of the Gnutella
protocol is a bit lacking :-)  (It's been a while since I looked at
the protocol :-)  )

I did a bit of searching (after you wrote this), and came across these two
documents, which were also helpful:

    http://www.the-gdf.org/wiki/index.php?title=Known_HTTP_Download_Headers
    http://www.the-gdf.org/wiki/index.php?title=The_Download_Mesh

The "X-Alt" header seems to be pretty close to what I was thinking of,
although in my original post I had this type of information
communicated with the "Location" header.

I was also thinking of somehow "encoding" all the different alternate
locations into one URL.  But the "X-Alt" header seems to be more
flexible, in that you can easily list all the alternate sources.

One problem I see with it (at least for my purposes) is that it only
lets you list an IP address with a TCP port number.  I guess that's
because Gnutella has a special format for requesting files, as
documented here: http://www.the-gdf.org/wiki/index.php?title=File_Transfer

I think "X-Alt" could have been used for more things if it allowed
full URLs or URIs.  But anyway, something like "X-Alt" seems better
than using the "Location" header.

The HTTP/1.1 spec mentions a "Link" header (see section 19.6.2.4 of
RFC 2068), which could probably be used here.  The "Link" header
basically has the same semantics as the HTML <link> element, so
something like the following could be done:

    Link: <http://example.com/get/it/here.ogg>; rel="alternate"
    Link: <http://somewhere.else.example.net/here/too/video.ogg>; rel="alternate"
    Link: <magnet:?xt=urn:sha1:WRCIRZV5ZO56CWMNHFV4FRGNPWPPDVKT>; rel="alternate"
    Link: <something-else:/the/movie#it>; rel="alternate"

(Note, in the example above, the server is sending multiple "Link"
headers in one response.)

With the "Link" header you can add other "attributes" too.  (Think
"attributes" in the sense of HTML attributes on the <link> element.)
You can have "title", "class", "media", "type", etc., just like the
HTML <link> element.
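
As a rough illustration of that syntax, a server could build each Link header value from a URI plus a set of attributes.  This is a hypothetical Python sketch: format_link is an invented helper (not from any existing library), and quoting every attribute value is an assumption made so the output stays valid whether or not the value is a bare token.  The server would call it once per alternate and send each result as its own "Link" header.

```python
def quote_value(value):
    """Render a value as an HTTP quoted-string, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def format_link(uri, params):
    """Build one Link header value, e.g. '<uri>; rel="alternate"; type="..."'.
    `params` is a dict so names like "class" (a Python keyword) still work."""
    parts = [f"<{uri}>"]
    parts += [f"{name}={quote_value(value)}" for name, value in params.items()]
    return "; ".join(parts)


# Usage: an alternate with a few HTML-style attributes.
print(format_link("http://example.com/get/it/here.ogg",
                  {"rel": "alternate", "type": "video/ogg", "title": "Mirror 1"}))
# <http://example.com/get/it/here.ogg>; rel="alternate"; type="video/ogg"; title="Mirror 1"
```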

As a side note, Gnutella could even use the "Link" header instead of
the "X-Alt" header.  For example, the following:

    X-Alt: 192.0.2.17:6347, 192.0.2.44

could be translated into:

    Link: <http://192.0.2.17:6347/get/FILE_INDEX/FILE_NAME>; rel="alternate"
    Link: <http://192.0.2.44:6346/get/FILE_INDEX/FILE_NAME>; rel="alternate"

(Where "FILE_INDEX" and "FILE_NAME" would be replaced with whatever
they actually are.  The second entry has no port, so it gets Gnutella's
default port, 6346.)
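
The translation itself is mechanical.  Below is a hypothetical Python sketch: xalt_to_links is an invented name, file_index and file_name are assumed to be known from the original request, and percent-encoding the file name with urllib.parse.quote is an assumption about what servents will accept.

```python
from urllib.parse import quote

GNUTELLA_DEFAULT_PORT = 6346  # used when an X-Alt entry omits the port


def xalt_to_links(xalt_value, file_index, file_name):
    """Translate an X-Alt header value into Link header values (one per
    alternate location), using Gnutella's /get/INDEX/NAME request path."""
    links = []
    for entry in xalt_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        port = int(port) if sep else GNUTELLA_DEFAULT_PORT
        url = f"http://{host}:{port}/get/{file_index}/{quote(file_name)}"
        links.append(f'<{url}>; rel="alternate"')
    return links


for value in xalt_to_links("192.0.2.17:6347, 192.0.2.44", 42, "my song.ogg"):
    print("Link:", value)
# Link: <http://192.0.2.17:6347/get/42/my%20song.ogg>; rel="alternate"
# Link: <http://192.0.2.44:6346/get/42/my%20song.ogg>; rel="alternate"
```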

So, having said all that, I still think a "P2P conditional GET" is a
good idea.  (For reasons I'll explain in the next paragraph.  But please
argue if you disagree.)  And I think shunting the client (or servent or
whatever) onto the P2P network could be done via those "Link" headers.
If some of those "alternate" locations are Gnutella servents, you'd use
the Gnutella extensions to HTTP; if they use a different protocol, you'd
do whatever is necessary for it.

The reason I'm still inclined to go with a "P2P conditional GET" is
that it gives the publisher some level of control.  That way they are,
in some sense, sanctioning it.  (Which matters in some situations.)

Now, like with web caching, one could always go ahead and use a P2P
method of getting the file unilaterally -- without asking or needing
the publisher to get involved.

So right now my thinking is something like the following example. 
(Please argue if you disagree.)

Client:

    GET /some/file HTTP/1.1
    Host: example.com
    X-If-No-Alternate: ## PUT STUFF HERE ABOUT WHAT I CAN ACCEPT ##


Server:

    HTTP/1.1 204 No Content
    Link: <http://example.com/go/get/it/there>; rel="alternate"
    Link: <http://example.net/it.torrent#movie.mpeg>; rel="alternate"
    Link: <http://192.0.2.44:6346/get/INDEX/NAME>; rel="alternate"; class="gnutella"

(Not sure if the "204" HTTP status code is the correct one to use,
though.  But anyway....)
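
A server implementing this might decide between the two outcomes roughly as follows.  This is a hypothetical Python sketch: the contents of X-If-No-Alternate are still open, so it simply assumes the header carries a comma-separated list of link classes the client can handle (e.g. "gnutella, bittorrent").  The Alternate type, the conditional_get function, and the class names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alternate:
    uri: str
    kind: str  # hypothetical class label, e.g. "http", "bittorrent", "gnutella"


def conditional_get(request_headers, alternates, body):
    """Return (status, header_list, body) for the proposed P2P conditional GET.

    Assumes X-If-No-Alternate holds a comma-separated list of classes the
    client can fetch from; that format is an illustration, not a spec.
    """
    accept = request_headers.get("X-If-No-Alternate")
    if accept is not None:
        wanted = {k.strip().lower() for k in accept.split(",") if k.strip()}
        matches = [a for a in alternates if a.kind in wanted]
        if matches:
            # Point the client at the alternates instead of sending the file.
            headers = [
                ("Link", f'<{a.uri}>; rel="alternate"; class="{a.kind}"')
                for a in matches
            ]
            return 204, headers, b""
    # Client didn't ask, or nothing it accepts is available: serve the file.
    return 200, [("Content-Length", str(len(body)))], body


alternates = [
    Alternate("http://example.com/go/get/it/there", "http"),
    Alternate("http://192.0.2.44:6346/get/INDEX/NAME", "gnutella"),
]
status, headers, _ = conditional_get(
    {"X-If-No-Alternate": "gnutella"}, alternates, b"...")
print(status)  # 204
```

Headers are returned as a list of (name, value) pairs rather than a dict so that several "Link" headers can go out in one response.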

Note that I'm using a "URI reference" (a URI plus a "#" fragment) on the
BitTorrent URL to identify the file "within" the torrent.  I'm not sure
if anyone has used that kind of notation before, but it seems like a
good way to "give a link to" what's inside the torrent.

Also note that I've marked the Gnutella link with a "class" parameter,
so that the client (or servent or whatever) can identify it as being
Gnutella.
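
On the client side, picking an alternate from such a response could look roughly like this.  It is a hypothetical Python sketch: parse_link and classify are invented names, it assumes one link per "Link" header line (as in the example) with no ";" inside quoted values, and it treats any URL ending in ".torrent" as BitTorrent, with the fragment naming the file inside.

```python
from urllib.parse import urldefrag


def parse_link(value):
    """Parse one Link header value such as
    '<http://x/y>; rel="alternate"; class="gnutella"' into (uri, params).
    Simplified: one link per value, no ';' inside quoted strings."""
    value = value.strip()
    if not value.startswith("<") or ">" not in value:
        raise ValueError(f"not a Link value: {value!r}")
    end = value.index(">")
    uri = value[1:end]
    params = {}
    for part in value[end + 1:].split(";"):
        name, sep, val = part.strip().partition("=")
        if name:  # skips empty pieces, e.g. from a trailing ';'
            params[name.lower()] = val.strip().strip('"') if sep else ""
    return uri, params


def classify(value):
    """Decide how a client might fetch one alternate (hypothetical rules)."""
    uri, params = parse_link(value)
    if params.get("class") == "gnutella":
        return ("gnutella", uri, None)
    base, fragment = urldefrag(uri)
    if base.endswith(".torrent"):
        # The fragment names the file "within" the torrent.
        return ("bittorrent", base, fragment or None)
    return ("other", uri, None)


print(classify('<http://example.net/it.torrent#movie.mpeg>; rel="alternate"'))
# ('bittorrent', 'http://example.net/it.torrent', 'movie.mpeg')
```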

Thoughts?  Comments?

[...]

See ya

--
    Charles Iliya Krempeaux, B.Sc.

    charles @ reptile.ca
    supercanadian @ gmail.com

    developer weblog: http://ChangeLog.ca/
___________________________________________________________________________
 Make Television                                http://maketelevision.com/


