[p2p-hackers] Re: [decentralization] The Content-Addressable Web

Mark Baker distobj at acm.org
Thu Oct 25 08:35:01 UTC 2001


Justin,

> The goal of the Content-Addressable Web (CAW) is to create
> a URN-based Web that can be optimized for content distribution.

Ooooh, noooo, not URNs again. 8-)

> The use of URNs allows advanced caching techniques to be
> employed, and sets the foundation for creating ad hoc Content
> Distribution Networks (CDNs).

Untrue.  The use of URIs allows these things.  There is nothing
special about URNs in this respect.

> Standard web caching can provide significant benefits in
> certain situations, but suffers from a number of shortcomings:
> 
> * It is ill-advised to retrieve content from an untrusted
>   cache, because it can modify/corrupt the content at will.
>   This severely limits the utility of cooperative caching
>   systems.

Not to my knowledge.  Either you trust the provider of the cache,
or you don't.  If you do, you can share.

> * URL-based naming causes the same object on different mirrors
>   to look like different objects.

Incorrect.  It is current practice in mirroring that is at fault,
not the URL mechanism.  More specifically, mirroring is implemented
with "copy" semantics, rather than "cache", necessitating the
creation of new URIs.

> This decreases the efficiency
>   of caching and mirroring combinations.

Mirroring practice decreases the efficiency of mirroring.

> * There are few ways to discover optimal replicas of a given
>   piece of content. There is no way for a browser to download
>   a mirror list and automatically select an optimal mirror.
> 
> To add to the burden, the Transient Web is steadily growing
> in size and importance. The Transient Web is embodied by
> peer-to-peer systems such as Gnutella, and is characterized
> by unreliable nodes and a high rate of nodes joining and
> leaving the network. URL-based addressing would be unacceptable
> for the Transient Web because there would be a high failure
> rate when retrieving objects.

The Web doesn't deal in "nodes", it deals in resources
identified by authorities.  That Gnutella treats each node
as a separate authority, thereby creating an unbounded
number of identities for a single resource, is a problem
of Gnutella's, not the Web.

> One of the more interesting applications of the Content-Addressable
> Web is the creation of ad hoc Content Distribution Networks.
> In such networks receivers can achieve tremendous throughput
> by downloading content from multiple hosts in parallel.
> Receivers can also crawl through the network searching for
> optimal replicas, and can even retrieve content from completely
> untrusted hosts but be assured that they are receiving the
> content intact. All of this is made possible by URNs.

It's made possible by URIs, not URNs.
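
A minimal sketch of the claim quoted above: content pulled from several untrusted hosts in parallel can still be checked against a self-verifying name. Assumptions: the hosts are simulated in memory (`host-a`, `host-b`, `host-c`, and `fetch_range` are hypothetical stand-ins for ranged HTTP GETs), and the name is the SHA-1/Base32 form the draft proposes in its section 2. Note that the check depends only on the hash, not on whether the name is spelled as a URN or some other URI.

```python
import base64
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha1_urn(data: bytes) -> str:
    """urn:sha1: name: Base32 of the 20-byte SHA-1 digest (32 chars, no padding)."""
    return "urn:sha1:" + base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")

# Hypothetical in-memory stand-ins for untrusted hosts holding a copy of the content.
ORIGINAL = b"example content " * 1000
HOSTS = {
    "host-a": ORIGINAL,
    "host-b": ORIGINAL,
    "host-c": ORIGINAL[:-1] + b"!",  # this host corrupts the final byte
}

def fetch_range(host: str, start: int, end: int) -> bytes:
    """Simulated ranged GET of bytes [start, end) from one host (no network I/O)."""
    return HOSTS[host][start:end]

def parallel_fetch(hosts: list[str], length: int) -> bytes:
    """Give each host one contiguous range of [0, length) and fetch them concurrently."""
    step = -(-length // len(hosts))  # ceiling division
    ranges = [(h, i * step, min((i + 1) * step, length)) for i, h in enumerate(hosts)]
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))  # results in range order
    return b"".join(parts)

def verify(data: bytes, urn: str) -> bool:
    """Accept the data only if it hashes to the name it was requested by."""
    return sha1_urn(data) == urn

expected = sha1_urn(ORIGINAL)
good = parallel_fetch(["host-a", "host-b"], len(ORIGINAL))
bad = parallel_fetch(["host-a", "host-b", "host-c"], len(ORIGINAL))
print(verify(good, expected))  # True
print(verify(bad, expected))   # False
```

A single whole-file hash only says that something went wrong, not which host sent bad bytes; identifying the culprit would need per-range hashes (for example, a hash tree), which this sketch does not attempt.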

> 2 Self-Verifying URNs
> 
> While any kind of URN can be used within the Content-Addressable
> Web, there is a specific type of URN called a "Self-Verifying
> URN" that is particularly useful. These URNs have the
> property that the URN itself can be used to verify that
> the content has been received intact. It is RECOMMENDED
> that applications use cryptographically strong self-verifying
> URNs because hosts in ad hoc CDNs and the Transient Web
> are assumed to be untrusted. For instance, one could hash
> the content using the SHA-1 algorithm, and encode it using
> Base32 to produce the following URN:
> 
> urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB

That's an invalid URN, AFAIK.  There's no authority.  All URIs
need an authority to vouch for the identity.

There's "urn:ietf:sha-1" that identifies the SHA-1 algorithm,
but that namespace doesn't allow further qualification of the URN.

Also, including a hash of the content in a URI is an extremely
brittle way of identifying things, unless it is known that
the content will be static for all time and space.  If that URN
identified a song in MP3 format, for example, then you'd need
a new URN to identify the same song in a different format.
Is that what you want?
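
To make the brittleness point concrete, here is a sketch assuming the SHA-1/Base32 construction from the quoted draft. The same text serialized two ways (UTF-8 and UTF-16, standing in for two formats of one song) ends up with two unrelated names.

```python
import base64
import hashlib

def sha1_urn(data: bytes) -> str:
    # SHA-1 digest (20 bytes) encoded as 32 Base32 characters, as in the quoted draft.
    return "urn:sha1:" + base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")

# One "work", two serializations (stand-ins for, say, MP3 and another audio format).
lyrics = "the same song"
as_utf8 = lyrics.encode("utf-8")
as_utf16 = lyrics.encode("utf-16")  # includes a byte-order mark

urn_a = sha1_urn(as_utf8)
urn_b = sha1_urn(as_utf16)
print(urn_a == urn_b)  # False: one resource, two unrelated names
```

Nothing in either name records that the two byte streams carry the same content; any such link would have to be kept somewhere else.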

> 3.1 X-Content-URN
> 
> The X-Content-URN entity-header field provides a URN that
> uniquely identifies the entity-body. The URN is based on
> the content of the entity-body and any content-coding that
> has been applied, but not including any transfer-encoding
> applied to the message-body. For example:
> 
> X-Content-URN: urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB

Content-Location should suffice.
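
The draft's rule (hash the entity-body after any content-coding, but not any transfer-coding) can be sketched as follows. Assumptions: X-Content-URN is the draft's proposed header, not a registered one; gzip stands in for any content-coding; and `chunk`/`dechunk` are simplified, hypothetical helpers for HTTP/1.1 chunked transfer-coding that ignore trailers.

```python
import base64
import gzip
import hashlib

def sha1_urn(data: bytes) -> str:
    return "urn:sha1:" + base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")

def chunk(body: bytes, size: int) -> bytes:
    """Apply chunked transfer-coding: hex size line, data, CRLF; then a zero-size chunk."""
    out = b""
    for i in range(0, len(body), size):
        piece = body[i:i + size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    return out + b"0\r\n\r\n"

def dechunk(message_body: bytes) -> bytes:
    """Strip chunked transfer-coding to recover the (content-coded) entity-body."""
    body, pos = b"", 0
    while True:
        line_end = message_body.index(b"\r\n", pos)
        size = int(message_body[pos:line_end].split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            return body
        start = line_end + 2
        body += message_body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF

entity = b"hello, content-addressable web\n" * 50
gzipped = gzip.compress(entity, mtime=0)  # Content-Encoding: gzip; mtime=0 keeps output deterministic

identity_urn = sha1_urn(entity)
gzip_urn = sha1_urn(gzipped)  # the value the draft would put in X-Content-URN
wire = chunk(gzipped, 16)     # Transfer-Encoding: chunked, applied hop by hop
print(sha1_urn(dechunk(wire)) == gzip_urn)  # True: transfer-coding does not change the name
print(identity_urn == gzip_urn)             # False: content-coding does
```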

> 3.2 X-Target-URN
> 
> The X-Target-URN entity-header field provides a URN that
> uniquely identifies the desired entity-body in the case
> of a redirect. For HTTP 3xx responses, the URN SHOULD indicate
> the server's preferred URN for automatic redirection to
> the resource. 

HTTP redirection allows an authority of a resource to specify
that the resource is now found elsewhere.  What could a client
tell a server that it doesn't already know?  What you're
specifying here isn't redirection - I don't know what it is.

> This header primarily exists to allow the creation of URN-aware
> proxies that provide URN information without modifying the original
> web server. This allows URN-aware user-agents to take advantage
> of the headers, while simply redirecting user-agents that
> don't understand the Content-Addressable Web. For example:

Why not just convert URLs to URNs? For example:

http://www.markbaker.ca/foo/bar/baz

-> urn:markbaker.ca:foo:bar:baz
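
A sketch of the mapping in the example above, for illustration only: `url_to_urn` is hypothetical, dropping a leading "www." is inferred from this single example, query strings and fragments are ignored, and the resulting namespace is not a registered URN namespace.

```python
from urllib.parse import urlsplit

def url_to_urn(url: str) -> str:
    """Host (minus a leading "www."), then non-empty path segments, joined with colons."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    segments = [s for s in parts.path.split("/") if s]
    return ":".join(["urn", host] + segments)

print(url_to_urn("http://www.markbaker.ca/foo/bar/baz"))  # urn:markbaker.ca:foo:bar:baz
```

The point of the mapping is that the authority (the host) stays in the name, so no new identity is created.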

> It is believed that N2R, N2L, and N2Ls will be the most useful
> services for the Content-Addressable Web, so we will cover
> examples of those explicitly. The rest of the N2* headers
> should be implemented using the conventions used for N2R,
> N2L, and N2Ls.

The N2* conventions run completely against the architecture of the Web.
URIs are resource identifiers.  URNs are one kind of URI.  How many
URIs does a resource need?
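
For readers unfamiliar with the N2* names, they are resolution services from RFC 2483 (N2L: URN to URL, N2Ls: URN to URLs, N2R: URN to resource). Below is a minimal sketch of building requests for them. It assumes the draft follows the RFC 2169 HTTP convention of `GET /uri-res/<service>?<urn>`, and the resolver host and `resolver_url` are hypothetical.

```python
from urllib.parse import quote

# Resolution services the quoted draft singles out (meanings per RFC 2483).
SERVICES = {
    "N2L": "URN to one URL",
    "N2Ls": "URN to a list of URLs",
    "N2R": "URN to the resource itself",
}

def resolver_url(resolver_host: str, service: str, urn: str) -> str:
    """Build a request URL in the RFC 2169 style: /uri-res/<service>?<urn>."""
    if service not in SERVICES:
        raise ValueError(f"unsupported service: {service}")
    # Leave ':' unescaped so the URN stays readable; percent-encode anything else unsafe.
    return f"http://{resolver_host}/uri-res/{service}?{quote(urn, safe=':')}"

urn = "urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB"
for svc in ("N2R", "N2L", "N2Ls"):
    print(resolver_url("resolver.example.org", svc, urn))
```

Each request is itself just another URI for the same resource, which is the multiplication of identifiers objected to above.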

MB
--
Mark Baker, CSO, Planetfred.
Ottawa, Ontario, CANADA.
mbaker at planetfred.com
