[p2p-hackers] SHA1 broken?

Gordon Mohr ( at Bitzi) gojomo at bitzi.com
Thu Feb 17 04:12:18 UTC 2005


Serguei Osokine wrote:
> On Wednesday, February 16, 2005 Gordon Mohr wrote:
> 
>>MD5 should not be used for content identification, given the 
>>ability to create content pairs with the same MD5, with one 
>>version being (and appearing and acquiring a reputation for 
>>being) innocuous, and the other version malicious.
> 
> 
> 	Right. So let's go and try to find something with the same
> MD5 as this letter of mine, shall we? :-)

I can't -- but you could have made a collision, very easily, if
you composed your initial message with the intent of also composing
an MD5 twin at the same time.

That means for content identification MD5 is fatally flawed. For
any file whose contents I think I know and trust, perhaps based
on analysis and history of the file, there could be another
dangerous file with the same MD5. MD5 cannot be used to distinguish
between the two, but that's the whole point of using a secure
hash for content identification.

Dan Kaminsky runs over a number of potential attacks that
are relevant to P2P -- see:

   http://paketto.doxpara.com

Don't be fooled by the title of his analysis, "MD5 to be considered
harmful someday" -- the attacks mentioned are possible now, and
could trick people and software in subtle ways different from
other threats to P2P nets.

Here's another example from the cryptography list that convinced
a doubter that the attacks on MD5 were of more than purely
theoretical interest: two long binary strings with the same MD5,
one a prime number, one not:

   http://lists.virus.org/cryptography-0412/msg00102.html

Consider source code or executables which work fine with the
primes, s-boxes, and other initialization values initially
examined -- but have exploitable flaws when those values are
perturbed in a manner that leaves the MD5 the same. You need
to use a different, stronger content check to prevent such
mischief -- making the use of MD5 redundant and even dangerous
for the false sense of security it gives.
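
As a rough illustration of such a check, here is a minimal Python
sketch that compares content against a previously published digest.
The function name matches_digest and the choice of SHA-256 are
assumptions made for the example, not something proposed in this
thread; any hash without known collision attacks would fill the
same role.

```python
import hashlib
import hmac


def matches_digest(data: bytes, expected_hex: str,
                   algorithm: str = "sha256") -> bool:
    """Return True if data hashes to expected_hex under the named algorithm."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # Hex digests from hashlib are lowercase; normalize the expected value.
    return hmac.compare_digest(actual, expected_hex.lower())


# The digest is published for the version that was actually reviewed.
trusted = b"contents reviewed and trusted"
published = hashlib.sha256(trusted).hexdigest()

print(matches_digest(trusted, published))
print(matches_digest(b"perturbed contents", published))
```

The check is only as good as the hash behind it: run with
algorithm="md5", the same code would accept either member of a
deliberately constructed colliding pair.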

> 	For any practical purpose that I can imagine in a content
> identification field, MD5 is just fine. And SHA-1 is even more
> fine. 

If you can't imagine exploits, perhaps it's just a failure of
your imagination. Prudent engineering would assume some attackers
have better imaginations than you, when it comes to exploiting
hashes that don't work as originally intended.

> There are plenty more simple ways to attack the CDN nets
> than MD5 collisions. Way more simple. And abandoning MD5 for
> SHA1, then SHA1 for Tiger, and then abandoning Tiger for some
> newer hash when some researcher finds that it is really twenty
> bits weaker than you thought - it is all just a huge waste of
> development effort, as far as I'm concerned.

Depends on the kinds of attacks you're worried about. There
are more simple ways to disrupt P2P nets, sure. But are there
more simple ways to trick conscientious, hash-checking users
into running malware?

And since when did the ease of other attacks become an excuse
for ignoring more complicated and subtle (and thus perhaps
more valuable) attacks?

If you need a secure hash's properties in your software, you
should use an uncompromised secure hash. (Results as early as
1996 suggested MD5 should not be used in applications where
collision-resistance is important.)

If you're stuck with a legacy hash, fine, analyze the situation
and if you're confident the weakness has no effect on current
usage, rationalize using it a while longer. But get ready for
the potential need to switch hashes quickly in the presence of
further discoveries. Or better yet: design with the idea in mind
that no hash function lives forever.
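
One way to keep a later switch cheap is to record the algorithm
name alongside every digest, so an application can retire a hash
without changing its identifier format. A minimal sketch, assuming
a hypothetical "algorithm:hexdigest" identifier and a hypothetical
TRUSTED policy set (neither is an existing Bitzi or P2P format):

```python
import hashlib

# Hypothetical policy: the algorithms this application still trusts.
# Retiring a hash means removing its name from this set.
TRUSTED = {"sha256", "sha512"}


def make_id(data: bytes, algorithm: str = "sha256") -> str:
    """Build an identifier of the form 'algorithm:hexdigest'."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"


def verify_id(data: bytes, content_id: str, trusted=TRUSTED) -> bool:
    """Accept data only if its identifier uses a trusted algorithm and matches."""
    algorithm, _, digest = content_id.partition(":")
    if algorithm not in trusted:
        # Identifiers made with a retired hash are rejected outright.
        return False
    return hashlib.new(algorithm, data).hexdigest() == digest.lower()


data = b"example"
current = make_id(data)           # uses the default algorithm
legacy = make_id(data, "sha1")    # an identifier made with an older hash

print(verify_id(data, current))
print(verify_id(data, legacy))
```

With this layout, dropping a hash is a one-line policy change
rather than a format migration, though old identifiers still have
to be reissued under a trusted algorithm.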

- Gordon @ Bitzi


