[p2p-hackers] Announcing a new version of DistribNet

Kevin Atkinson kevin at atkinson.dhs.org
Mon Mar 17 10:36:02 UTC 2003


I just released a new snapshot of DistribNet.  DistribNet is a global 
peer-to-peer Internet file system that anyone can tap into or add 
content to.  You can find it at http://distribnet.sourceforge.net/.

Compared to the last version, released over a year ago, this version is 
a functional network.  However, it is in no way ready to be used for 
anything but testing.

I would appreciate it if others would take a look and give feedback.  I 
would be especially interested if someone with access to a large 
cluster could test it out, as I have not actually verified that it 
functions over the network (all my testing was done by launching 
multiple nodes on the same computer).

Expect more improvements to come.


                           Release Notes

So far DistribNet has only been tested on my computer, which runs RedHat
8.0 (kernel 2.4).  Other Linux systems should work.  Other POSIX
systems might work.  Forget Win32 for now.  DistribNet requires:

  GNU Libc (I use a few thread safe extensions)
  Gcc 3.1 or better (Gcc 2.95 may work)
  POSIX Threads
  Berkeley DB Version 4 or better
  OpenSSL (I use version 0.9.6b)
  Perl (for the scripts)
  A Web Browser

                         Status and Future

The routing table is pretty much done.  Future versions will use a
better selection algorithm for determining which nodes to use.

Data keys are implemented.  However, no caching is done.

Other key types are not yet implemented.

The protocol is subject to change.  Absolutely no guarantee is made
that different versions of DistribNet will interoperate.  The protocol
will stabilize once I get the basics done.

There is absolutely no security yet.  Once I get all the basics done I
will work on adding security, which includes encrypting all
communication.

Integers are not properly converted to network order.  This will be
done at the same time the protocol is being stabilized.

Basically until I get the basics done, don't expect to be able to use
DistribNet for anything other than testing.

However, please do test it.


Since DistribNet is my Master's thesis I will be working nearly full
time on it for the next couple of months.  However, I would definitely
appreciate help with the implementation.

General ideas, of course, are also welcome.  Please post them to
distribnet-devel at lists.sourceforge.net and not to me directly so
others can benefit from our discussion.



Attached is a text copy of the distribnet design document.

-- 
http://kevin.atkinson.dhs.org

-------------- next part --------------



                               DistribNet                                
                                                                         
               Kevin Atkinson (kevin at atkinson dhs org)                

            Project Page: http://distribnet.sourceforge.net/             
   Mailing list: http://lists.sourceforge.net/lists/listinfo/distribnet-devel

Abstract:

A global peer-to-peer Internet file system that anyone can tap into or
add content to.

1 Overview

NOTE

This paper was initially written in response to my dislike of Freenet and
similar networks that focus on complete anonymity. Thus a good deal of
the comments are directed towards that community. My ultimate goal is to
design a general-purpose p2p network, not just a replacement for Freenet.
In fact, in some ways DistribNet won't replace Freenet due to the
anonymity issue. I plan on eventually modifying this paper accordingly to
address the general p2p (and web) community. For now please keep the
intended audience in mind when reading this section.

1.1 Main Goal

  * To allow anyone, possibly anonymously, to publish web sites without
    having to pay a commercial provider for the bandwidth or having to
    put up with the increasingly ad-ridden free web hosts. The only
    thing the author of a web site should have to worry about is the
    contents of the web site itself.

1.2 (Possibly Impossible) Goals

  * Really fast lookup to find data. The worst case should be O(log(n))
    and the average case should be O(1) or very close to it.
  * Actually retrieving the data should also be really fast. Popular data
    should be sitting on the same subnet. On average it should be as fast
    as or faster than a typical web site (such as Slashdot, Google, etc.).
    It should make effective use of the topology of the Internet to
    minimize network load and maximize performance.
  * General searching based on keywords will be built into the protocol
    from the beginning. The searching facility will be designed in such a
    way as to make message boards trivial to implement.
  * Ability to update data while keeping old revisions around so data
    never disappears until it is truly unwanted. No one person will have
    the power to delete data once it spreads throughout the network.
  * Will try very hard to keep all but the most unpopular content from
    falling off the network. Basically, before deleting a locally
    unpopular key a node will first check whether other nodes are storing
    the key and how popular they find it. If not enough nodes are storing
    the key and there is any indication that the data may be useful at a
    later date, the node will not delete it unless it absolutely has to.
    And if it does delete it, it will first try uploading it to other
    nodes with more disk space available.
  * Ability to store data indefinitely if someone is willing to provide
    the space for it (and being able to find that data in log(n) time).
  * Extremely robust so that the only way to kill the network is to
    disable almost all of the nodes. The network should still function
    even if say 90% of it goes down.
  * Extremely efficient CPU-wise so that a fully functional node can run
    in the background and only take 1-2% of the CPU.

1.3 Applications

I would like the protocol to be able to efficiently support (i.e. without
any of the ugly hacks that many of the applications for Freenet use):

  * Efficient Web like sites (with HTTP gateway to make browsing easy)
  * Efficient sharing of files large and small.
  * Public message forums (with IMAP gateway to make reading easy)
  * Private email (the messages will of course be encrypted so only the
    intended recipient can read them, again with IMAP gateway)

And maybe:

  * Streaming Media
  * Online Chat (with possible IRC or similar gateway)

1.4 Anti-Goals

Also see the philosophy section for why I don't find these issues that
important.

  * Complete anonymity for the browser. I want to focus first on
    performance rather than on anonymity. In fact I plan to use extensive
    logging in the development versions so that I can track network
    performance and quickly catch performance bugs. As DistribNet
    stabilizes, anonymity will be improved at the expense of logging.
   
    The initial version will only use cryptography when absolutely
    necessary (for example key signing). Most communication will be done
    in the clear. After DistribNet stabilizes, encryption will slowly be
    added.
   
    Please note that I still wish to allow for anonymous posting of
    content. However, without encryption, it probably won't be as
    anonymous as Freenet or GNUnet.
   
  * Data in the cache will be stored in a straightforward manner. No
    attempt will be made to prevent the node operator from knowing what
    is in his own cache. Also, by default, very little attempt will be
    made to prevent others from knowing what is in a particular node's
    cache.

1.5 Philosophy

  * I have nothing against complete anonymity; it is just that I am
    afraid that both Freenet and GNUnet are designed more around the
    anonymity and privacy issues than they are around the performance and
    scalability issues.
  * For most kinds of content the level of anonymity that Freenet and
    GNUnet offer is simply not needed. Even for copyrighted and censored
    material there is, in general, little risk in actually viewing the
    information, because it is simply impractical to go after every
    single person who accesses forbidden information. Almost all of the
    time the lawsuits and such go after the original distributors of the
    information and not the viewers. Therefore DistribNet will aim to
    provide anonymity for distributing information, but not for actually
    viewing it. However, since there is some information where even
    viewing it is extremely risky, DistribNet will eventually be able to
    provide the same level of anonymity that Freenet or GNUnet offers,
    but it will be completely optional.
  * I also believe that knowing what is in one's own datastore and being
    able to block certain types of material from one's own node is not
    that big of a deal. Unless almost everyone blocks a certain type of
    information, the availability of blocked information will not be
    harmed. This is because even if 90% of the nodes block, say, kiddie
    porn, the information will still be available on the other 10% of
    the nodes, which, if the network is designed correctly, should be
    more than enough for anyone to get at blocked information.
    Furthermore, since the source code for DistribNet will be protected
    under the GPL or a similar license, it will be completely
    impractical for others to force a significant number of nodes to
    block information. Due to the dynamic nature of the cache I find it
    legally difficult to hold anyone responsible for the contents of
    their cache, as it is constantly changing.

2 DistribNet Architecture

There are two types of keys: map keys and data keys. Map keys are hashed
based on their identification and can be updated; data keys are hashed
based on their content and consequently cannot be updated.

There will be three types of key storage: permanent, cache, and
pointers. Permanent keys will be used to ensure the availability of
content, the cache will be used exactly like a typical cache, and
pointers will be used to find content.

Map keys will be routed based on the SHA-1 hash of their identification
using a Pastry[4]-like system. Data keys are not routed and will be
stored based on where they are retrieved. Map keys will be used to find
data keys.

2.1 Key Types

There will essentially be two types of keys: map keys and data keys. Map
keys will be uniquely identified in a similar manner to Freenet's SSK
keys. Data keys will be identified in a similar manner to Freenet's CHK
keys.

Map keys will contain the following information:

  * Short Description
  * Public Namespace Key
  * Timestamped Index pointers
  * Timestamped Data pointers

At any given point in time each map key will only be associated with one
index pointer and one data pointer. Map keys can be updated by appending
a new index or data pointer to the existing list. By default, when a map
key is queried only the most recent pointer will be returned. However,
older pointers are still there and may be retrieved by specifying a
specific date. Thus, map keys may be updated, but information is never
lost or overwritten.

Data keys will be very much like freenet's CHK keys except that they will
not be encrypted. Since they are not encrypted delta compression may be
used to save space.

There will not be anything like Freenet's KSK keys, as those proved to be
completely insecure. Instead, map keys may be requested without a
signature. If there is more than one map key by that name then a list of
keys is presented, sorted by popularity. To make such a list meaningful,
every public key in DistribNet will have a descriptive string associated
with it.

2.1.1 Data Key Details

Data keys will be stored in blocks with a maximum size of just under 32K.
If an object is larger than 32K it will be broken down into smaller
chunks and an index block, also with a maximum size of about 32K, will be
created so that the final object can be reassembled. If an object is too
big to be indexed by one index block, the index blocks themselves will be
split up. This can be done as many times as necessary, therefore
providing the ability to store files of arbitrary size. DistribNet will
use 64-bit integers to store the file size, therefore supporting file
sizes up to 2^64 - 1 bytes.

Data keys will be retrieved by blocks rather than all at once. When a
client first requests a data key that is too large to fit in a block, an
index block will be returned. It is then up to the client to figure out
how to retrieve the individual blocks.

Please note that even though blocks are retrieved individually they
are not treated as truly independent keys by the nodes. For example, a
node can be asked which blocks it has based on a given index block rather
than having to ask for each and every data block. Also, nodes maintain
persistent connections so that blocks can be retrieved one after another
without having to re-establish the connection each time.

Data and index blocks will be indexed based on the SHA-1 hash of their
contents. The exact numbers are as follows:

+-----------------------------------------+--------------------+
| Data Block Size:                        | 2^15 - 128 = 32640 |
|-----------------------------------------+--------------------|
| Index block header size:                | 40                 |
|-----------------------------------------+--------------------|
| Maximum number of keys per index block: | 1630               |
|-----------------------------------------+--------------------|
| Key Size:                               | 20                 |
+-----------------------------------------+--------------------+

Maximum object sizes:

    direct   => 2^14.99 bytes, about 31.9K
   
    1 level  => 2^25.66 bytes, about 50.7 megs
   
    2 levels => 2^36.34 bytes, about 80.8 gigs
   
    3 levels => 2^47.01 bytes, about 129 tera
   
    4 levels => 2^57.68 bytes
   
    5 levels => 2^68.35 bytes (but limited to 2^64 - 1)

Why 32640?

A block size of just under 32K was chosen because I wanted a size which
would allow most text files to fit in one block, most other files to fit
with one level of indexing, and just about anything anybody would think
of transferring on a public network to fit in two levels, and 32K worked
out perfectly. Also, files around 32K are rather rare, therefore
preventing a lot of unnecessary splitting of files that don't quite make
it. 32640 rather than exactly 32K was chosen to allow some additional
information to be transferred with the block without pushing the total
size over 32K. 32640 can also be stored nicely in a 16-bit integer
without having to worry whether it is signed or unsigned.

However, the exact block size is not fixed in stone. If, at a later
date, a different block size is deemed to be more appropriate, then this
number can be changed....

2.2 Storage

Permanent keys will be distributed essentially randomly. However, to
ensure availability the network will ensure that at least N nodes
contain the data. Nodes which are responsible for maintaining a
permanent key will know about all the other nodes on the network which
are also responsible for that key. From time to time a node will check
up on the other nodes to make sure they are still live, and if fewer
than N-1 other nodes are live it will pick another node to ask to
maintain a copy of the key. It will first try nodes which already have
the key in their caches; if they all refuse, it will choose a random
node to ask and will keep trying until some node accepts or one of the
original nodes becomes live again.

Cached keys will be distributed based on where they will do the most
good performance-wise. How cached keys will be managed is still
undecided. In the first implementation they will likely be stored on the
nodes which have previously requested the key.

Pointer keys will basically be distributed based on the numeric distance
of the hash of the key from the hash of the node's identification. Since
pointer keys contain very little data there will be an extremely large
amount of redundancy. Pointer keys will contain two types of pointers:
pointers to permanent keys and pointers to cached keys. Pointer keys on
different nodes will all contain the same permanent pointers but will
only contain pointers to cached keys on nodes which are nearby. There
will be an upper limit to the number of pointers within a pointer key
that any one node will have.

3 DistribNet Routing

Map keys will be routed based on the SHA-1 hash of their identification
in a similar manner to Pastry[4]. This section will assume the reader is
familiar with how Pastry works and will focus on how DistribNet differs
from Pastry.

Each node on DistribNet is uniquely identified by the 160-bit SHA-1 hash
of its public key. Since SHA-1 hashes are used, the nodes will be evenly
distributed. Keys in DistribNet are stored based on bitwise closeness,
which is measured by the number of leading bits two keys have in common.
Unlike in Pastry, numerical closeness is generally not used.

The routing table contains 8 rows, with each row containing 2^4 entries.
In general, DistribNet tries to maintain at least two nodes for each
entry. The number of rows does not need to be fixed and can change based
on the network size. It may also be possible that the number of entries
per row does not have to be fixed; however, this idea has not been
explored. 4 bits was chosen as the base size for several reasons: 1) it
is a power of two, 2) when keys are thought of as a sequence of digits a
base of 4 bits means that the digits will be hexadecimal, 3) the Pastry
paper hinted that 4 would be a good choice. The number of rows was
chosen to be large enough so that there is no possibility that the last
row will be used when dealing with a moderately sized network during
testing.

Unlike Pastry there is no leaf set. Instead the ``leaf set'' consists of
all rows which are not ``full''. A full row is a row which contains 15
full entries, with the one remaining empty entry being the one which
represents the common digit with the node's key, and thus will never be
used. Not having a true ``leaf set'' simplifies the implementation since
a separate list does not need to be maintained. This also means that all
the nodes in the leaf set will maintain the same set. I have not
determined if this is a good or bad thing.

A row is considered full in DistribNet if 15 of the 16 entries are full
on the current node AND other nodes also have 15 of the 16 entries full
(clarify...). For each full row DistribNet will try to maintain at least
two nodes per entry. This way, if one node goes down the other one can
be used without affecting performance. When a node is determined to be
down (as opposed to being momentarily offline) DistribNet will try to
replace it with another node that is up. With this arrangement it is
extremely likely that at least one of the two nodes will be available. A
full row can become a leaf row if the entry count drops below 15.

For each non-full row (i.e. in the leaf set) DistribNet will attempt to
maintain as many nodes as are available for each entry, so that every
other node in the leaf set is accounted for. From time to time DistribNet
will contact another node in the leaf set and synchronize its leaf set
with it. This is possible because all nodes in the leaf set will have the
same set. Down nodes in the leaf set will be removed, but the criteria
for when a node is considered down is stricter for the leaf set than for
a full row. If a leaf row becomes a full row then excess nodes will be
removed.

DistribNet also maintains an accurate estimate of the number of nodes
that are on the network. This is possible because, unlike in networks
such as Freenet, all nodes are accounted for.

To store a map key, the 3 bitwise-closest nodes will get it.

When looking for a key, the 8 closest nodes will be tried.

The routing table is implemented in the files routeing_*.?pp in the src/
directory of the distribution.

4 Retrieval of Data keys

Each key request is coupled with a hops-to-try (HTT) value. This value
controls the number of additional nodes that can be contacted to
retrieve the data. If the HTT number is 0 then the request will fail
unless the node already has the data. This number is only for actually
retrieving the data, not finding it.

When a node A wants to retrieve a key K, one of two things will happen.
If it has good reason to believe that a nearby node has the key it will
attempt to retrieve it from that node; otherwise it will send a request
to get a list of other nodes which have the key. If a nearby node has
the key, A will ask that node for the key. If it doesn't, A will ask
some nearby node to do so on its behalf.

To find a key, node A will contact node B, which will in turn contact C,
etc., until an answer is found, which for the sake of argument will be
at node E. Node E will then send a list of possible nodes L which
contain key K directly to node A. Node E will then send the result to
node D, which will send it to C, etc. Node E will also add node A to
list L with a probability of, say, 10%; node D will do the same but with
a probability of, say, 25%; etc. This avoids the problem of the list L
becoming extremely large for popular data, but allows nodes close to A
to discover that A has the data, since nodes close to A will likely
contact the same nodes that A tried. Since A requested the location of
key K, it is assumed that A will likely download the data. If this
assumption is false then node A will simply be removed from the list
later on.

Once A retrieves the list it will pick a node from the list L based on
some evaluation function; let's say it picks node X. Node X will then
return the key to node A. The evaluation function will take several
factors into account, including distance, download speed, past
reputation, and whether node A even knows anything about the new node.

If node X does not send the key back to node A for whatever reason, A
will remove node X from the list and try again. It will also send this
information to node B so it can consider removing node X from its list;
B will then in turn notify node C of the information, etc. If the key is
an index block, node X will also send information on what parts of the
complete key it has. If the key is not an index block then node A is
done.

If the key is an index block then node A will start downloading the
sub-blocks of key K that node X has. At the same time, if the key is
large or node X does not contain all the sub-blocks of K, node A will
choose another node from the list to contact, and possibly other nodes
depending on the size of the file. It will then download other parts of
the file from the other nodes. Which blocks are downloaded from which
nodes will change based on the download speed of the nodes, so that more
blocks are downloaded from faster nodes and fewer from slower ones, thus
allowing the data to be transferred in the least amount of time. If,
after contacting a certain number of nodes, there are still parts of the
key that are not available on any of those nodes, node A will perform a
separate query for the individual blocks. However, I imagine that in
practice this will rarely be necessary.

4.1 Distance determination

One very coarse estimate of node distance would be the numerical
distance between two nodes' IP addresses, since nodes numerically close
to each other are likely to share the same gateways, and nodes really
close are likely to be on the same subnet.

Another way to estimate node distance relies on the fact that node
distance, for the most part, obeys the triangle inequality. For each
node X in the list of candidate nodes, some information about the
estimated distance between X and the node storing the list (call it E)
is maintained by some means. For node A to estimate the distance between
itself and a node X on the list, all it has to do is combine the
distance between itself and E with the distance between E and X. The
combination function will depend on the aspect of distance that is being
measured. For the number of hops it will simply add them, for download
speed it will take the maximum, etc.

5 Limitations

Because there is no indirection when retrieving data, most of the data
on any particular node will be data that a local user requested at some
point in time. This means that it is fairly easy to tell which keys a
particular user requested. Although complete anonymity for the browser
is one of my anti-goals, this is going a bit too far. One solution for
this is to do something similar to what GNUnet does, which is described
in [3].

It is also blatantly obvious which nodes have which keys. Although I do
not see this as a major problem, especially if a solution for the first
problem is found, it is something to consider. I will be more than happy
to entertain solutions to this problem, provided that they don't affect
efficiency that much.

6 Implementation Details

An implementation of DistribNet is available at
http://distribnet.sourceforge.net/.

6.1 Physical Storage

Blocks are currently stored in one of three ways:

 1. Blocks smaller than a fixed threshold (currently 1K) are stored
    using Berkeley DB (version 3.3 or better).
 2. Blocks larger than the threshold are stored as files. The primary
    reason for doing this is to avoid limiting the size of the data
    store to the maximum size of a file, which is often 2 or 4 GB on
    most 32-bit systems.
 3. Blocks are not stored at all; instead they are linked to an external
    file outside of the data store, much like a symbolic link links to a
    file outside of the current directory. However, since blocks often
    only represent part of a file, the offset is also stored as part of
    the link. These links are stored in the same database that small
    blocks are stored in. Since the external file can easily be changed
    by the user, the SHA-1 hashes will be recomputed when the file
    modification date changes. If the SHA-1 hash of a block differs, all
    the links to the file will be thrown out and the file will be
    relinked. (This part is not implemented yet.)

Most of the code for the data keys can be found in data_key.cpp.

6.2 Language

DistribNet is/will be written in fairly modern C++. It will use several
external libraries; however, it will not use any C++-specific libraries.
In particular I have no plan to use any sort of abstraction library for
POSIX functionality. Instead, thin wrapper classes will be used which I
have complete control over and which will serve mainly to make the
process of using POSIX functions less tedious rather than to abstract
away the details of using them.

Bibliography

   
1   GNUNet. http://www.ovmj.org/GNUnet/ and http://www.gnu.org/software/
    GNUnet/
   
2   Freenet. http://freenet.sourceforge.net/
   
3   Krista Bennett and Christian Grothoff. ``GNUnet - anonymity for
    free''. http://gecko.cs.purdue.edu/GNUnet/papers.php3
   
4   Antony Rowstron and Peter Druschel. ``Pastry: Scalable, decentralized
    object location and routing for large-scale peer-to-peer systems''.
    http://research.microsoft.com/~antr/Pastry/pubs.htm

-------------------------------------------------------------------------

Kevin Atkinson 2003-03-17

