[p2p-hackers] Oblivious information dispersal?
Paul Campbell
paul at ref.nmedia.net
Sun Aug 28 18:30:31 UTC 2005
I'm not sure what the terminology is here, but I've been thinking about the
whole concept of PIR (private information retrieval), and that got me going on
a different tangent.
As I see it, PIR looks to be somewhat useless right now. The reason is simple.
The idea is to obscure what the receiver of the file/information is asking
for, not who the receiver is. PIR does this while exchanging packets in plain
view: it is easy to identify the fact that communication is occurring but not
easy to identify what the transaction is all about.
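
To make the "visible traffic, hidden request" idea concrete, here is a minimal sketch of the classic two-server XOR construction for PIR. It is only one textbook scheme, not necessarily the variant this post has in mind, and it assumes two servers that hold identical copies of the database and do not compare notes. The names `make_queries`, `answer`, `retrieve` and the sample `DB` are made up for illustration.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(n_records: int, wanted: int):
    """Build one selection vector per server.

    Each vector on its own is uniformly random; the two differ
    only in the position of the wanted record.
    """
    q1 = [secrets.randbelow(2) for _ in range(n_records)]
    q2 = list(q1)
    q2[wanted] ^= 1
    return q1, q2


def answer(database, query) -> bytes:
    """Server side: XOR together every record the query selects."""
    result = bytes(len(database[0]))
    for record, bit in zip(database, query):
        if bit:
            result = xor_bytes(result, record)
    return result


def retrieve(database, wanted: int) -> bytes:
    """Client side: ask both servers, then XOR the two answers."""
    q1, q2 = make_queries(len(database), wanted)
    # In a real deployment each query would go to a different server.
    return xor_bytes(answer(database, q1), answer(database, q2))


# Hypothetical database of equal-length records.
DB = [b"alpha___", b"bravo___", b"charlie_", b"delta___"]
print(retrieve(DB, 2))
```

Every record selected by both queries cancels out in the final XOR, so only the wanted record survives. Either server sees a random-looking bit vector, which is why an observer can tell that a query happened but not which record it was for.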
The alternative is to cloak the actual source of the transaction while leaving
the nature of the transaction pretty clear. That's exactly what the various
proxying schemes do. Whether they are implemented via the more
bandwidth-demanding Dining Cryptographers route or simply via tunneling (onion
routing) doesn't really matter.
The issue at hand is that proxying only takes about 3 "hops" to cloak the
potential source, as I understand it, from a realistic point of view. Since
each "hop" retransmits every packet once, an N-hop system multiplies bandwidth
by N, so for instance a 3-hop system uses 3 times the bandwidth. And if we
consider the reverse path (obviously we must), then double that, for a 2N-fold
increase in bandwidth usage.
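
The arithmetic above can be written down as a tiny helper. This is just the post's own simplified cost model (one retransmission per hop, reverse path the same length as the forward path); the function name `relay_transmissions` is made up for illustration, and real systems add padding and control traffic that this ignores.

```python
def relay_transmissions(hops: int, round_trip: bool = True) -> int:
    """Transmissions needed per application packet under the simple model:
    one transmission per hop, doubled when the reply takes the same path."""
    if hops < 1:
        raise ValueError("need at least one hop")
    return 2 * hops if round_trip else hops


for n in (1, 3, 5):
    print(f"{n} hops: one-way x{relay_transmissions(n, round_trip=False)}, "
          f"round trip x{relay_transmissions(n)}")
```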
PIR also adds significant computational overhead and I just haven't seen a
similar head-on comparison which doesn't get into really ridiculous amounts of
bandwidth, although potentially it could be competitive.
Either way, the other issue is to obscure the "publisher" side of things...
the source of information. There is a "PIS" (private information storage) form
of things but I'm not sure how much it would be valid in this situation.
The goal here is to disguise the source of say a file in the database. This is
again easy to do with proxying but again adds a factor of 2N in cloaking the
path to the publisher where the endpoint of the path points to a proxy.
But even having said all that, here's where I'm going with this. Consider the
database across which the PIR or proxying scheme is operating. At some point,
we have to have a "public" part of the database...the part where you map a
query onto some piece of data. I can see a scenario where for instance a node
is accused of being an accessory to piracy...because you could easily do a
search for say "Star Wars" or "Britney Spears" and see if any of the queries
pop up your own IP address.
What I'm interested in is much further along. How can we successfully hide
database references "in plain sight"? In other words, so that say the Britney
Spears file (or pointer information) is distributed out over multiple nodes
such that even if you can determine that some fraction of a bit contains a
reference to the "bad seed", the local node database does not contain a
substantial amount of information in and of itself. Thus, even if you did
determine that a piece of the offending data is somehow present, it is
pointless to delete it since redundancy guarantees that enough of the
information is still out there that your contribution does little to nothing.
And on top of that, that plausible deniability is still maintained at some
meaningful level.
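
One existing technique with roughly these properties is threshold secret sharing. The sketch below uses Shamir's scheme as a stand-in, which is an assumption on my part rather than something proposed in this post: a pointer is split into n shares, any k of them rebuild it, and fewer than k reveal nothing about it. The names `split` and `combine`, the choice of prime, and the sample pointer string are all illustrative.

```python
import secrets

PRIME = 2**521 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret: int, k: int, n: int):
    """Split `secret` into n shares such that any k of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical pointer record, spread over 6 nodes with threshold 3.
pointer = int.from_bytes(b"node-17:/files/example", "big")
shares = split(pointer, k=3, n=6)
print(combine(shares[:3]) == pointer)
print(combine([shares[1], shares[4], shares[5]]) == pointer)
```

With k=3 and n=6, three nodes can delete their shares and the pointer is still recoverable from the rest, which is the "your contribution does little to nothing" property. A node holding one share holds a number that is consistent with every possible pointer.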
In my mind, this could potentially do two things. First, it would be yet
another route to do PIR...users could directly make queries but since the
nature of the query is spread out over a dozen or more nodes, and the bits
by themselves are fractional, it's still not possible to tell computationally
what the goal of the query is. Second, it makes plausible deniability very
easy to achieve...the database itself can't really tell what's in its own
data since it is mixed in with lots of other unrelated data and it would be
necessary to have ALL the keys to the mixed data to determine what is there.
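
The "need ALL the keys" property can be illustrated with a plain XOR split, sketched below under the assumption that the "keys" are random pads of the same length as the data. This is a generic construction rather than a description of any existing system, and the names `split_n_of_n` and `join` are made up.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_n_of_n(data: bytes, n: int):
    """Return n pieces whose XOR is `data`; any n-1 of them look random."""
    if n < 2:
        raise ValueError("need at least two pieces")
    pads = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    last = reduce(xor_bytes, pads, data)  # data XOR pad1 XOR ... XOR pad(n-1)
    return pads + [last]


def join(pieces) -> bytes:
    """XOR every piece together to recover the original data."""
    return reduce(xor_bytes, pieces)


pieces = split_n_of_n(b"example record", 5)
print(join(pieces))
```

Note the trade-off against the redundancy goal: here losing any single piece destroys the data, so a real design would have to combine this kind of mixing with a threshold or erasure-coding layer.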
This really only leaves an issue of disguising the tracks of the publishers
since they are the only ones who can potentially be aware of what data is
present in a given location at a given time when they mix the new information
into the system...and they can be covered again with proxying although I hope
that the query volume outstrips the publishing volume.
Anyways...is there such an animal? Does what I'm trying to say make sense?