[p2p-hackers] The Content-Addressable Web

Justin Chapweske justin at chapweske.com
Thu Oct 25 05:30:01 UTC 2001


I've just finished the first draft of "HTTP Extensions for a 
Content-Addressable Web".  I believe that these simple extensions are a 
huge step forward in providing interoperability between P2P systems.

I would like for people to start brain-storming around this document. 
Pick it apart and make it stronger.

The documents are located at http://onionnetworks.com/caw/caw.ps and 
http://onionnetworks.com/caw/caw.txt

I've also attached those documents to this e-mail.

Thanks,

--
Justin Chapweske, Onion Networks
http://onionnetworks.com/


-------------- next part --------------


HTTP Extensions for a Content-Addressable Web

Justin Chapweske, Onion Networks (justin at onionnetworks.com)

October 7, 2001

Abstract

The goal of the Content-Addressable Web (CAW) is to create
a URN-based Web that can be optimized for content distribution.
The use of URNs allows advanced caching techniques to be
employed, and sets the foundation for creating ad hoc Content
Distribution Networks (CDNs). This document specifies HTTP
extensions that bridge the current Location-Based Web with
the Content-Addressable Web. 

Table of Contents

1 Introduction
2 Self-Verifying URNs
3 HTTP Extensions
    3.1 X-Content-URN
    3.2 X-Target-URN
    3.3 X-URN-N2*
        3.3.1 X-URN-N2R
        3.3.2 X-URN-N2L and X-URN-N2Ls
4 An Example Application
5 Open Issues
6 Acknowledgments



1 Introduction

The rise in popularity of Content Distribution Networks (CDNs),
such as Akamai, has shown that significant improvements
can be made in throughput, latency, and scalability when
content is distributed throughout the network and delivered
from the edge. The fact that companies such as Tucows, FilePlanet,
and various Linux distributions force their users to manually
select mirrors points to a hole in existing web caching
infrastructure.

Standard web caching can provide significant benefits in
certain situations, but suffers from a number of shortcomings:

* It is ill-advised to retrieve content from an untrusted
  cache, because it can modify/corrupt the content at will.
  This severely limits the utility of cooperative caching
  systems.

* URL-based naming causes the same object on different mirrors
  to look like different objects. This decreases the efficiency
  of caching and mirroring combinations.

* There are few ways to discover optimal replicas of a given
  piece of content. There is no way for a browser to download
  a mirror list and automatically select an optimal mirror.

To add to the burden, the Transient Web is steadily growing
in size and importance. The Transient Web is embodied by
peer-to-peer systems such as Gnutella, and is characterized
by unreliable nodes and a high rate of nodes joining and
leaving the network. URL-based addressing would be unacceptable
for the Transient Web because there would be a high failure
rate of retrieving objects. 

The solution to these problems is to create a Content-Addressable
Web (CAW) that is URN-based rather than URL-based. A few
proposals have been made to enable the practical use of
URNs, such as RFC 2169 and RFC 2915, but little has
been done with them due to lack of application demand. Recently,
however, the growing importance of peer-to-peer systems
and the desire to create ad hoc CDNs has created demand
for the Content-Addressable Web.

One of the more interesting applications of the Content-Addressable
Web is the creation of ad hoc Content Distribution Networks.
In such networks receivers can achieve tremendous throughput
by downloading content from multiple hosts in parallel.
Receivers can also crawl through the network searching for
optimal replicas, and can even retrieve content from completely
untrusted hosts but be assured that they are receiving the
content intact. All of this is made possible by URNs.

2 Self-Verifying URNs

While any kind of URN can be used within the Content-Addressable
Web, there is a specific type of URN called a "Self-Verifying
URN" that is particularly useful. These URNs have the
property that the URN itself can be used to verify that
the content has been received intact. It is RECOMMENDED
that applications use cryptographically strong self-verifying
URNs because hosts in ad hoc CDNs and the Transient Web
are assumed to be untrusted. For instance, one could hash
the content using the SHA-1 algorithm, and encode it using
Base32 to produce the following URN:

urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB

* It is RECOMMENDED that implementations support SHA-1 URNs
  at minimum.

* Receivers MUST verify self-verifying URNs if any part of
  the content is retrieved from a potentially untrusted
  source.
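
As a rough illustration of the SHA-1/Base32 construction described
above, the following Python sketch derives a urn:sha1 URN from a byte
string and checks received content against it. The function names
sha1_urn and verify are hypothetical, and the case-insensitive
comparison is an assumption; this document does not define
comparison rules.

```python
import base64
import hashlib


def sha1_urn(content: bytes) -> str:
    """Return a urn:sha1 URN: the Base32-encoded SHA-1 digest of content."""
    digest = hashlib.sha1(content).digest()  # always 20 bytes
    # 160 bits encode to exactly 32 Base32 characters, so no '=' padding appears.
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def verify(content: bytes, urn: str) -> bool:
    """Check received bytes against a urn:sha1 URN.

    Assumption: URNs are compared case-insensitively.
    """
    return sha1_urn(content).lower() == urn.strip().lower()
```

A receiver would call verify() on the complete content before
accepting anything that came from an untrusted source.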

A future version of this document will also specify a URN
format for performing streaming and random-access verification
using Merkle Hash Trees.

3 HTTP Extensions

In order to provide a transparent bridge between the URL-based
Web and the Content-Addressable Web, a few HTTP extensions
must be introduced. The nature of these extensions is that
they need not be widely deployed in order to be useful.
They are specifically designed to allow for proxying for
hosts that are not CAW-aware.

The following HTTP extensions are based on the conventions
defined in RFC 2169. It is RECOMMENDED that implementers
of this specification also implement RFC 2169.

The HTTP headers defined in this specification are all response
headers. No additional request headers are specified by
this document.

It is RECOMMENDED that implementers of this specification
use an HTTP/1.1 implementation compliant with RFC 2616.

This specification uses the "X-"
header prefix convention to denote that these are not W3C/IETF
standard headers. If and when this specification becomes
a standard the prefix will either be simply removed or replaced
with an appropriate header extension mechanism.

3.1 X-Content-URN

The X-Content-URN entity-header field provides a URN that
uniquely identifies the entity-body. The URN is based on
the content of the entity-body and any content-coding that
has been applied, but not including any transfer-encoding
applied to the message-body. For example:

X-Content-URN: urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB
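
To make the content-coding rule concrete, here is a small Python
sketch under stated assumptions: the helper content_urn_header is
hypothetical, and zlib.compress merely stands in for a "deflate"
content-coding. It shows that the same resource produces a different
URN once a content-coding is applied, because the hash covers the
coded entity-body. A transfer-encoding such as chunked would be
applied afterwards and is never hashed.

```python
import base64
import hashlib
import zlib


def content_urn_header(entity_body: bytes) -> str:
    """Build an X-Content-URN header line for an entity-body.

    entity_body must be the bytes after any content-coding
    and before any transfer-encoding is applied.
    """
    digest = hashlib.sha1(entity_body).digest()
    return "X-Content-URN: urn:sha1:" + base64.b32encode(digest).decode("ascii")


original = b"hello, content-addressable web\n" * 4
deflated = zlib.compress(original)  # stand-in for a deflate content-coding

identity_header = content_urn_header(original)  # sent with no content-coding
deflate_header = content_urn_header(deflated)   # sent with the coded body
```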

3.2 X-Target-URN

The X-Target-URN entity-header field provides a URN that
uniquely identifies the desired entity-body in the case
of a redirect. For HTTP 3xx responses, the URN SHOULD indicate
the server's preferred URN for automatic redirection to
the resource. 

The X-Content-URN header is inappropriate in this case, because
HTTP 3xx responses often still include a message-body that
explains that a redirect is taking place.

This header primarily exists to allow the creation of URN-aware
proxies that provide URN information without modifying the original
web server. This allows URN-aware user-agents to take advantage
of the headers, while simply redirecting user-agents that
don't understand the Content-Addressable Web. For example:

X-Target-URN: urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB
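
One way a URN-aware user-agent might choose between the two headers
is sketched below in Python. The function urn_for_response is
hypothetical, and the rule it applies (X-Target-URN for 3xx
responses, X-Content-URN otherwise) is one reading of the text above
rather than a normative algorithm.

```python
from typing import Mapping, Optional


def urn_for_response(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the URN naming the content the client is ultimately after."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    if 300 <= status < 400:
        # The body of a redirect is only an explanation, so use the target URN.
        return lowered.get("x-target-urn")
    return lowered.get("x-content-urn")
```

A user-agent that does not understand these headers simply follows
the Location header as usual.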

3.3 X-URN-N2*

These headers specify locations of various resolution services
for the URNs specified in the X-Content-URN and X-Target-URN
headers. These headers provide various ways of locating
other replicas of the content. They can be used to provide
additional sources for a multiple-source download, or one
can build an application that crawls across the resolution
services searching for an optimal replica.

These headers are based on conventions defined in RFC
2169 and include N2L, N2Ls, N2R, N2Rs, N2C, N2Cs, and N2Ns.
These headers provide URIs at which the associated resolution
services can be performed for the URNs specified in the
X-Content-URN and X-Target-URN headers.

It is not necessary for these URIs to conform to the "/uri-res/<service>?<uri>"
convention specified in RFC 2169.
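
For resolvers that do follow the RFC 2169 convention, the request
URI can be assembled as in this Python sketch. The helper name
resolver_url is hypothetical, and the URN is appended unescaped, as
in the examples in this document.

```python
def resolver_url(base: str, service: str, urn: str) -> str:
    """Build a "/uri-res/<service>?<uri>" URL on the resolver at base."""
    return f"{base.rstrip('/')}/uri-res/{service}?{urn}"


example = resolver_url(
    "http://urnresolver.com",
    "N2R",
    "urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB",
)
```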

It is believed that N2R, N2L, and N2Ls will be the most useful
services for the Content-Addressable Web, so we will cover
examples of those explicitly. The rest of the N2* headers
should be implemented using the conventions used for N2R,
N2L, and N2Ls.

Implementations are free to use any additional URN resolution
mechanisms, such as RFC 2915 DNS-based URN resolution.

It is RECOMMENDED that receivers assume that the URN resolver
services are potentially untrusted and should verify all
content retrieved using a resolver's services.

3.3.1 X-URN-N2R

This header specifies one or more URIs that perform the N2R
(URN to Resource) resolution service for the URNs specified
by the X-Content-URN or X-Target-URN headers. The N2R URIs
directly specify mirrors for the content addressed by the
URN and can be useful for multi-source downloads. For example: 

X-URN-N2R: http://urnresolver.com/uri-res/N2R?urn:sha1:<base32>

or

X-URN-N2R: http://untrustedmirror.com/pub/file.zip

The key difference between this header and something like
the Location header is that the URIs specified by this header
should be assumed to be untrusted.
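
Because N2R sources are untrusted, a receiver has to check whatever
it gets back. The Python sketch below shows one possible shape for
this, assuming a urn:sha1 URN and whole-file verification. The
function fetch_verified and its fetch parameter are hypothetical;
fetch is injected so the logic stays independent of any particular
HTTP client.

```python
import base64
import hashlib
from typing import Callable, Iterable, Optional


def fetch_verified(urn: str, sources: Iterable[str],
                   fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Return the first body whose SHA-1 matches urn, or None if none does."""
    expected = urn.split(":", 2)[2].upper()  # Base32 part of urn:sha1:<base32>
    for uri in sources:
        try:
            body = fetch(uri)
        except OSError:
            continue  # unreachable mirror: try the next one
        actual = base64.b32encode(hashlib.sha1(body).digest()).decode("ascii")
        if actual == expected:
            return body
        # Mismatch: the mirror is corrupt or malicious, so discard its data.
    return None
```

With the standard library, fetch could be something like
lambda uri: urllib.request.urlopen(uri).read(), since the errors
urllib raises for failed requests derive from OSError.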

3.3.2 X-URN-N2L and X-URN-N2Ls

These headers specify one or more URIs that perform the N2L
(URN to URL) and N2Ls (URN to URLs) resolution services.
These headers are used when other hosts provide URLs where
the content is mirrored. This is most useful in ad hoc CDNs
where mirrors may maintain lists of other mirrors. Browsers
can simply crawl across the networks, recursively dereferencing
N2L(s). For example:

X-URN-N2L: http://urnresolver.com/uri-res/N2L?urn:sha1:<base32>

and

X-URN-N2Ls: http://untrustedmirror.com/pub/file.zip-mirrors.list

For the N2Ls service, it is RECOMMENDED that the result conform
to the text/uri-list media type specified in RFC 2169.
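
A text/uri-list body carries one URI per line, with lines starting
with "#" treated as comments. A minimal Python parser for an N2Ls
result might look like the following; parse_uri_list is a
hypothetical name, and the sample mirror URLs are placeholders.

```python
from typing import List


def parse_uri_list(text: str) -> List[str]:
    """Extract the URIs from a text/uri-list body."""
    uris = []
    for line in text.splitlines():  # handles both CRLF and bare LF
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        uris.append(line)
    return uris


sample = (
    "# mirrors for file.zip\r\n"
    "http://mirror-a.example/pub/file.zip\r\n"
    "http://mirror-b.example/pub/file.zip\r\n"
)
mirrors = parse_uri_list(sample)
```

Every URL obtained this way is still an untrusted source, so the
content retrieved from it must be verified against the URN.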

4 An Example Application

The above HTTP extensions are deceptively simple and it may
not be readily apparent how powerful they are. We will discuss
an example application that will take advantage of a few
of the features provided by the extensions. 

In this example we will look at how the CAW could help
at the imaginary linuxiso.org where ISO CD-ROM images of
the various Linux distributions are kept. The first step
will be to issue a GET request for the content:

GET /pub/Redhat-7.1-i386-disc1.iso HTTP/1.1
Host: www.linuxiso.org 


The abbreviated response:

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 662072345
X-Content-URN: urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB
X-URN-N2R: http://www.linuxmirrors.com/pub/Redhat-7.1i386-disc1.iso
X-URN-N2R: http://123.24.24.21:8080/uri-res/N2R?urn:sha1:<base32>
X-URN-N2Ls: http://123.24.24.21:8080/uri-res/N2Ls?urn:sha1:<base32> 


With this response, a CAW-aware browser can immediately begin
downloading the content from www.linuxiso.org, linuxmirrors.com,
and 123.24.24.21 all in parallel. At the same time the browser
can be dereferencing the N2Ls service at 123.24.24.21 to
discover more mirrors for the content.
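
The client side of this exchange could be sketched in Python as
follows. It parses the CAW headers from the example response, then
divides the file into one byte range per source. The helper
plan_ranges and the even split are assumptions made for
illustration, and the <base32> placeholder is kept exactly as it
appears in the example.

```python
from email.parser import Parser

RAW_HEADERS = """\
Content-Length: 662072345
X-Content-URN: urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB
X-URN-N2R: http://www.linuxmirrors.com/pub/Redhat-7.1i386-disc1.iso
X-URN-N2R: http://123.24.24.21:8080/uri-res/N2R?urn:sha1:<base32>
X-URN-N2Ls: http://123.24.24.21:8080/uri-res/N2Ls?urn:sha1:<base32>
"""

msg = Parser().parsestr(RAW_HEADERS, headersonly=True)
total = int(msg["Content-Length"])
urn = msg["X-Content-URN"]
n2ls = msg.get_all("X-URN-N2Ls", [])  # resolvers to query for more mirrors

# The origin server plus every N2R URI is a candidate download source.
origin = "http://www.linuxiso.org/pub/Redhat-7.1-i386-disc1.iso"
sources = [origin] + msg.get_all("X-URN-N2R", [])


def plan_ranges(total, sources):
    """Assign each source one contiguous Range header value."""
    chunk = -(-total // len(sources))  # ceiling division
    plan = []
    for i, uri in enumerate(sources):
        start = i * chunk
        end = min(start + chunk, total) - 1
        if start <= end:
            plan.append((uri, f"bytes={start}-{end}"))
    return plan


plan = plan_ranges(total, sources)
```

With a plain SHA-1 URN the pieces cannot be checked individually,
so the reassembled file still has to be verified against the URN
as a whole.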

The existence of the 123...21 host is meant to represent
a member of an ad hoc CDN, perhaps the personal computer
of a Linux advocate who just downloaded the ISO and wants
to share their bandwidth with others. By dereferencing the
N2Ls, even more ad hoc nodes could be discovered.
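
The recursive discovery described here amounts to a bounded
breadth-first crawl. The Python sketch below is one possible shape
for it, not a prescribed algorithm: the two callables are
hypothetical and injected, list_mirrors dereferences an N2Ls URI
into mirror URLs, and advertised_resolvers returns the X-URN-N2Ls
URIs that a given mirror sends in its response headers.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(n2ls_seeds: Iterable[str],
          list_mirrors: Callable[[str], List[str]],
          advertised_resolvers: Callable[[str], List[str]],
          max_mirrors: int = 50) -> List[str]:
    """Collect mirror URLs by following N2Ls services breadth-first."""
    seen_resolvers = set()
    mirrors: List[str] = []
    queue = deque(n2ls_seeds)
    while queue and len(mirrors) < max_mirrors:
        resolver = queue.popleft()
        if resolver in seen_resolvers:
            continue  # mirrors may point back at each other
        seen_resolvers.add(resolver)
        for mirror in list_mirrors(resolver):
            if len(mirrors) >= max_mirrors:
                return mirrors
            if mirror in mirrors:
                continue
            mirrors.append(mirror)
            queue.extend(advertised_resolvers(mirror))
    return mirrors
```

The max_mirrors bound and the set of visited resolvers keep the
crawl finite even when ad hoc nodes reference each other in cycles.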

5 Open Issues

It is unclear how to deal with the mapping of X-URN-N2* headers
in the presence of multiple X-Content-URN or X-Target-URN
headers. This must be resolved.

6 Acknowledgments

Gordon Mohr (gojomo at bitzi.com) for working on many of the
concepts in this document within the Gnutella community.
We also wish to thank Tony Kimball (alk at pobox.com) for his
continued advocacy of RFC 2169.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: caw.ps
Type: application/postscript
Size: 88595 bytes
Desc: not available
Url : http://zgp.org/pipermail/p2p-hackers/attachments/20011025/4afe354b/caw.ps

