[p2p-hackers] Re: Byzantine Quorum Systems

Nick Szabo szabo at szabo.best.vwh.net
Wed Feb 23 01:32:05 UTC 2005


My proposal for a fully decentralized p2p directory to protect name integrity
was somewhat vague; hopefully this will clarify things a bit:

http://szabo.best.vwh.net/nameintegrity.html

> Nick, I've been reading over the papers on your website and want to know 
> your opinion of how well byzantine quorum systems work under the following 
> conditions:

I'm by no means an expert on the network conditions you describe, but
I'll do my best.

> * A majority of the nodes have a half-life of about an hour (i.e. every 
> hour half the nodes in the p2p system leave).  A large number of nodes 
> never rejoin the system again.

This is where Byzantine fault tolerance shines.  The main security and 
reliability property holds per transaction.  Byzantine fault-tolerant 
systems are vastly superior to any reputation- or history-based precaution 
in this regard.  The most common transaction for the proposed directory is 
the propagation of a new (filename, hash, owner) tuple.  

The Byzantine attacker tries to overwhelm the directory with new, 
separate-looking nodes which forge messages, corrupting the directory for 
everybody.  If the attacker can usurp more than a certain fraction (1/4 
for some kinds of quorum systems, 1/3 for traditional slow Byzantine fault 
tolerant systems) of the separate-looking nodes during a given transaction, 
they can compromise the integrity of that transaction.  The fraction 
is far smaller for non-Byzantine tolerant directories -- in a 
typical p2p system, you can't demonstrate that the directory is 
safe against even a very small number of message-forging nodes.

> * The system has high latency (i.e. it is running over the public Internet)

For the directory, we just want short tuples like (file name, file hash, 
owner name, logo, digital signature) to propagate about as fast as the 
large files they refer to.  How bad slowing down overall file+directory
propagation is a problem depends on the relative values users and 
uploaders place on propagation time, uploader reputation, and file name 
integrity.  The larger number of messages required for a Byzantine quorum 
system is probably tolerable, as it is for the supernodes in OceanStore.   

> * A majority of the nodes are NATed or firewalled, depending on other nodes 
> to relay requests.

All kinds of directories would benefit from cryptography to prevent 
substitution and replay attacks by intermediaries like these and others, 
to be sure.  This reduces the intermediaries' attack to making all the
nodes behind them simply fail, and fail-stop is a simpler problem to solve 
than Byzantine faults (message forging & the like).  If the inside nodes 
fail and outside nodes can't reroute their messages to the failed nodes 
through another NAT or firewall, the correct nodes cut the failed nodes 
out of the directory, and the integrity of the directory is not harmed.

> My understanding of byzantine quorum algorithms are that they break down 
> under the kinds of conditions found in large-scale, P2P systems, which have 
> very high node churn and high latency, described above.  They seem to be 
> focused on very stable or LAN type networks.  Is this a correct 
> assumption?  If it is, it seems that byzantine quorum algorithms need to be 
> refocused on the kinds of networks that we are dealing with today...

Actually, the original focus of the research in the 70's was even
more fixed -- it was about ensuring the reliability of computers with
multiple redundant CPUs.  The best way to prove such reliability was to make
a very open-ended assumption that CPUs would have not just statistical
errors but arbitrary, even malicious errors -- so the models became
even more pertinent the security and reliability of LANs, and more 
pertinent still to the security and reliability of wide-area distributed 
systems.  The CPUs (and LAN nodes) were also deemed to be highly 
unstable -- there was a separate fail-stop model where the idea was
to simply tolerate high numbers of failures that could (unlike Byzantine
attacks) be detected and ignored as simple failures by other nodes.
Both models are very well studied, and chances are the combination of
Byzantine and fail-stop models is well-studies as well, though I haven't 
personally researched that angle.

I heartily agree that more adaptations and improvements to an unstable
environment should be explored.  For example, a combination of fail-stop
(to deal with very large numbers of detectable failures from catastrophic
exit or blockage of many p2p nodes at once) and Byzantine fault tolerance 
(to deal with a smaller numbers of simultaneous message-forging nodes) 
could be explored.  Such exploration should not, however, come at the 
expense of losing the provable security and reliability properties 
Byzantine fault-tolerance discipline makes possible for fully decentralized 
directories.

Nick Szabo



More information about the P2p-hackers mailing list