please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices))

Zooko zooko at zooko.com
Wed Sep 19 11:01:02 UTC 2001


> But Base64 introduces case-sensitivity. Especially if
> you ever use identifier fragments as a shorthand, this
> introduces situations where they "bleed together" --
> in human perception, in filesystems, in search-routines.

Hm.

Since mojoids include 160-bit SHA1 hashes, they are collision-free *even* if
you base64-encode them and then fold upper- and lowercase letters together!
(That isn't obvious, but once upon a time I convinced myself of it.  I can find
the message in old mojonation-devel archives if you like.)
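
A rough sanity check of that claim, as a sketch only.  It assumes the id is
just the 160-bit hash, that hash bits are uniformly random, and that the
base-64 alphabet is the URL-safe one (`-' and `_' in place of `+' and `/').
The helper name collision_bits is made up for this illustration.  It estimates
how many bits of collision resistance survive when case is ignored:

```python
import math
import string

# URL-safe base-64 alphabet (assumption: '-' and '_' replace '+' and '/').
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"

def collision_bits(symbols):
    """Return -log2 of the probability that two symbols drawn uniformly
    and independently from `symbols` are equal after case folding."""
    counts = {}
    for s in symbols:
        key = s.lower()
        counts[key] = counts.get(key, 0) + 1
    n = len(symbols)
    p_equal = sum((c / n) ** 2 for c in counts.values())
    return -math.log2(p_equal)

# 160 bits fill 26 complete 6-bit characters, with 4 bits left over.
full_chars, leftover_bits = divmod(160, 6)

# The final character holds the leftover bits in its high positions and
# zeros in the low positions, so only every 4th symbol can occur there.
last_symbols = [ALPHABET[i << (6 - leftover_bits)]
                for i in range(2 ** leftover_bits)]

total_bits = (full_chars * collision_bits(ALPHABET)
              + collision_bits(last_symbols))
print("collision resistance after case folding: %.1f bits" % total_bits)
```

Under those assumptions the result is between 137 and 138 bits, against 160
for the case-sensitive form, so case folding costs something but leaves chance
collisions very unlikely.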

Hm.  I'm pretty sure that using fragments as shorthand opens the door to
collisions all by itself, and that the upper/lowercase issue doesn't contribute
significantly to the risk of spoofing.

Can you give me an example of this "bleed together" problem, excluding using
fragments?


> Also, Base64 introduces 2 characters that can present
> problems in URLs and filenames: '/' and '+'.

I should have specified that we translate `+' and `/' to `-' and `_'
respectively.
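
A minimal sketch of that translation.  Later versions of Python's standard
library provide base64.urlsafe_b64encode, which performs the same substitution.
The input string here is an arbitrary placeholder, not a real mojoid:

```python
import base64
import hashlib

# Placeholder input; any 20-byte SHA1 digest works for the illustration.
digest = hashlib.sha1(b"example").digest()

# Standard base-64, then the manual translation of '+' and '/'.
standard = base64.b64encode(digest).decode("ascii")
translated = standard.replace("+", "-").replace("/", "_")

# The library's URL-safe variant should give the same result.
url_safe = base64.urlsafe_b64encode(digest).decode("ascii")

print(standard)
print(translated)
assert translated == url_safe
```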


> In contrast, Base32 is robust across case isomorphisms,
> safe for URLs and filesystems, and results in full-length
> and fragment identifiers which are typically recognized
> as unbroken units by legacy text-search mechanisms.

I guess we just differ in our value judgements here.  I value shorter ids for
cut-and-paste purposes more than I value absence of "break" characters.
Indeed, I can't really think of a motivating example for caring about "break"
characters.  Could you please suggest one?


> > A mojoid in base-32 would look like this:
> > 
> > http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1
> 
> That looks like Hexadecimal to me; the chance that a 70-digit Base32
> number would contain no letters G-Z is infinitesimal.

Ahem.  <blush>

Okay, that was hexadecimal.  The standard Python libraries offer hex and
base-64, and in my haste I mistook hex for base-32.

Hm.  I can't find a base-32 encoder in Python.  Could someone who favors
base-32, and thus presumably has an encoder handy, show the base-32 version of
40-byte, 30-byte, and 20-byte strings?  Thanks!
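
For comparison, a sketch using base64.b32encode, which later versions of
Python's standard library do provide.  It uses the RFC 4648 alphabet (A-Z and
2-7), which may not be the alphabet the base-32 proponents in this thread have
in mind.  The 40-byte input is the hex string from the URL quoted above; the
30-byte and 20-byte inputs are simply its prefixes, chosen for illustration:

```python
import base64

# The 40-byte value from the quoted URL (80 hex digits).
HEX_ID = ("1b17864eeb6c68294c9b2db0324a2b773401f0da"
          "0537d82626c24a7850e15ef2d6c4265dcd5e85f1")
raw40 = bytes.fromhex(HEX_ID)

def encodings(data):
    """Return (hex, base-32, URL-safe base-64) forms of `data`,
    with any trailing '=' padding stripped."""
    hex_form = data.hex()
    b32_form = base64.b32encode(data).decode("ascii").rstrip("=")
    b64_form = base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")
    return hex_form, b32_form, b64_form

lengths = {}
for size in (40, 30, 20):
    forms = encodings(raw40[:size])
    lengths[size] = tuple(len(f) for f in forms)
    print("%d bytes:" % size)
    for name, form in zip(("hex", "base-32", "base-64"), forms):
        print("  %-8s %3d chars  %s" % (name, len(form), form))
```

For a 20-byte id this gives 40 characters in hex, 32 in base-32, and 27 in
base-64.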


Regards,

Zooko



