please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices))

Gordon Mohr gojomo at usa.net
Wed Sep 19 11:55:01 UTC 2001


Zooko writes:
> > But Base64 introduces case-sensitivity. Especially if
> > you ever use identifier fragments as a shorthand, this
> > introduces situations where they "bleed together" --
> > in human perception, in filesystems, in search-routines.
> 
> Hm.
> 
> Since mojoids include 160-bit SHA1 hashes, they are collision free *even* if
> you base64 encode them and then merge all the upper/lowercase!  (That isn't
> obvious, but once upon a time I convinced myself of it.  I can find the message
> in old mojonation-devel archives if you like.)

I'd prefer a pointer to the MojoID definition document!

> Hm.  I'm pretty sure that using fragments as shorthand opens the door to
> collisions all by itself, and that the upper/lowercase issue doesn't contribute
> significantly to the risk of spoofing.
> 
> Can you give me an example of this "bleed together" problem, excluding using
> fragments?

With a filesystem or file-management program which ignores
or normalizes casing, Base64 names can suffer damage. 

This might then cause you to not find real matches (on a
case-sensitive basis). Or, it might tempt you to use 
case-insensitive searches, and then you've lost ~21 bits from 
your secure hash (as Hal Finney mentioned). 

Alternatively, if you wanted to rely on legacy case-insensitive
full-text search to find file identifiers, you'd be 
introducing a step where your identifiers are 21 bits weaker.

(Try, for example, Googling for SMF2Y24TI7Y3CVER8NJKT7CAFGR9FS7Z.
Googling for a Base64 identifier would introduce this case-insensitive step.)
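
As a rough check on the "~21 bits" figure, here is a minimal sketch of the arithmetic. It assumes the Base64 characters of a 160-bit hash are uniformly distributed, that every letter loses exactly one bit when case is ignored, and that the trailing partial character can be neglected. The function name is made up for illustration.

```python
# Rough estimate of the bits lost when a Base64-encoded hash is
# compared case-insensitively. Assumes uniformly random characters.

def expected_bits_lost_to_case_folding(hash_bits=160):
    bits_per_char = 6      # Base64 carries 6 bits per character
    alphabet_size = 64     # A-Z, a-z, 0-9 and two symbols
    letters = 52           # only A-Z and a-z are affected by case folding

    # Characters that carry a full 6 bits (the trailing partial one is ignored).
    full_chars = hash_bits // bits_per_char

    # A character is a letter with probability 52/64, and folding its
    # case discards one bit of information at that position.
    return full_chars * letters / alphabet_size


lost = expected_bits_lost_to_case_folding(160)
print("expected bits lost:", lost)
print("effective bits left:", 160 - lost)
```

Under these assumptions a case-folded SHA1 identifier is left with a little under 139 effective bits on average.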

These problems are further aggravated if you ever find it 
useful to use fragments.  We already have 34 versions of
'Uptown Girl' in the Bitzi database. Given human perception,
I think it's easier for people to say or think things like:

  "The 'PYME' version is complete, the '43N6' version is
   truncated."

...than to say...

  "The 'bB/e' version is complete, the 'B+vb' version is
   truncated."

> > Also, Base64 introduces 2 characters that can present
> > problems in URLs and filenames: '/' and '+'.
> 
> I should have specified that we translate `+' and `/' to `-' and `_'
> respectively.

MojoNation and Freenet should get together and use the same
"Base64v2".
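
For reference, the substitution Zooko describes ('+' to '-', '/' to '_') is the same one Python's standard library performs in base64.urlsafe_b64encode. The sketch below only illustrates that mapping. Whether MojoNation or Freenet handle padding the same way is not stated here, and the helper name is hypothetical.

```python
import base64

def to_url_safe(data):
    """Encode bytes as Base64, then swap '+' for '-' and '/' for '_'."""
    standard = base64.b64encode(data).decode("ascii")
    return standard.replace("+", "-").replace("/", "_")


sample = bytes([0xFB, 0xFF])   # chosen so the standard encoding has '+' and '/'
print(base64.b64encode(sample).decode("ascii"))
print(to_url_safe(sample))

# The standard library's URL-safe encoder applies the same substitution.
assert to_url_safe(sample) == base64.urlsafe_b64encode(sample).decode("ascii")
```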

> > In contrast, Base32 is robust across case isomorphisms,
> > safe for URLs and filesystems, and results in full-length
> > and fragment identifiers which are typically recognized
> > as unbroken units by legacy text-search mechanisms.
> 
> I guess we just differ in our value judgements here.  I value shorter ids for
> cut-and-paste purposes more than I value absence of "break" characters.
> Indeed, I can't really think of a motivating example for caring about "break"
> characters.  Could you please suggest one?

Again, Googling for identifiers. Other full-text searches for
fragments. Searching for the Base32 fragment 'B6THNJ' is always
a single word; searching for the Base64 fragment 'aS+w/e' might
be interpreted as 'as w e' and perhaps ignored completely.
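
To make the "break character" point concrete, here is a small sketch of a naive tokenizer that splits on anything other than ASCII letters and digits and then lowercases the result. It is a made-up stand-in, not how Google or any particular search engine actually tokenizes text.

```python
import re

def naive_tokens(text):
    """Split on runs of non-alphanumeric characters and lowercase each piece."""
    pieces = re.split(r"[^A-Za-z0-9]+", text)
    return [p.lower() for p in pieces if p]


print(naive_tokens("B6THNJ"))   # the Base32 fragment stays one word
print(naive_tokens("aS+w/e"))   # the Base64 fragment splits into short words
```

With a tokenizer like this, the Base64 fragment turns into three very short tokens, which a stop-word list could easily discard.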

> Hm.  I can't find a base-32 encoder in Python.  Could someone who favors
> base-32, and thus presumably has an encoder handy, show the base-32 version of
> 40-byte, 30-byte, and 20-byte strings?  Thanks!

20b -> 32 chars: 3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD
30b -> 48 chars: 3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD8EJ2KEDCV3WQMMPF
40b -> 64 chars: 3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD8EJ2KEDCV3WQMMPFWFJW6DCVPKXMZQIZ
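
The lengths above can be reproduced with the base64 module in current Python's standard library, as in the sketch below. Note the assumption: Python's b32encode uses the A-Z, 2-7 alphabet, while the samples above appear to use a different alphabet (they contain '8' and '9'), so only the lengths are comparable, not the characters. Unpadded Base64 lengths are shown alongside for comparison.

```python
import base64
import os

def encoded_lengths(n_bytes):
    """Return (base32_length, unpadded_base64_length) for n_bytes of random data."""
    data = os.urandom(n_bytes)
    b32 = base64.b32encode(data).decode("ascii").rstrip("=")
    b64 = base64.b64encode(data).decode("ascii").rstrip("=")
    return len(b32), len(b64)


for n in (20, 30, 40):
    b32_len, b64_len = encoded_lengths(n)
    print(n, "bytes ->", b32_len, "chars in Base32,", b64_len, "chars in Base64")
```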

- Gojomo



