please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices))
Zooko
zooko at zooko.com
Wed Sep 19 11:01:02 UTC 2001
> But Base64 introduces case-sensitivity. Especially if
> you ever use identifier fragments as a shorthand, this
> introduces situations where they "bleed together" --
> in human perception, in filesystems, in search-routines.
Hm.
Since mojoids include 160-bit SHA-1 hashes, they remain collision-free *even* if
you base-64 encode them and then fold upper- and lowercase together! (That isn't
obvious, but once upon a time I convinced myself of it. I can find the message
in old mojonation-devel archives if you like.)
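A rough back-of-the-envelope check of that claim, as a Python sketch. It does not reproduce the argument from the mojonation-devel archives. It assumes the hash output is uniformly random, treats each base-64 character as independent, and ignores the final partial character of a 160-bit value. Under those assumptions a case-folded id still carries well over 128 bits against accidental collisions.

```python
import math

HASH_BITS = 160          # SHA-1 output size
BITS_PER_CHAR = 6        # base-64 carries 6 bits per character

# After case folding, each of the 26 letters stands for two of the 64
# original symbols; the 10 digits and 2 punctuation symbols stand for one.
probs = [2 / 64] * 26 + [1 / 64] * 12

# Chance that two independent random characters fold to the same symbol,
# and the matching "collision entropy" in bits per character.
p_collide = sum(p * p for p in probs)
bits_per_folded_char = -math.log2(p_collide)

# Only count the characters that carry a full 6 bits (assumption: the
# trailing 4-bit character is ignored, which understates the total).
full_chars = HASH_BITS // BITS_PER_CHAR
effective_bits = full_chars * bits_per_folded_char

print(f"{bits_per_folded_char:.2f} bits per folded character")
print(f"about {effective_bits:.0f} effective bits over {full_chars} characters")
```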
Hm. I'm pretty sure that using fragments as shorthand opens the door to
collisions all by itself, and that the upper/lowercase issue doesn't contribute
significantly to the risk of spoofing.
Can you give me an example of this "bleed together" problem, excluding using
fragments?
> Also, Base64 introduces 2 characters that can present
> problems in URLs and filenames: '/' and '+'.
I should have specified that we translate `+' and `/' to `-' and `_'
respectively.
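A minimal sketch of that translation in Python. The input bytes are a made-up placeholder, and dropping the trailing `=` padding is an assumption for illustration, not something stated above. The `urlsafe_b64encode` function in the standard `base64` module uses the same `-` and `_` substitution.

```python
import base64
import hashlib

# Placeholder input; any 20-byte SHA-1 digest would do.
digest = hashlib.sha1(b"example block contents").digest()

standard = base64.b64encode(digest).decode("ascii")

# Translate the two troublesome characters by hand ...
translated = standard.replace("+", "-").replace("/", "_")

# ... which matches the standard library's URL-safe alphabet.
url_safe = base64.urlsafe_b64encode(digest).decode("ascii")
assert translated == url_safe

# Assumption: padding is stripped, leaving 27 characters for 20 bytes.
print(url_safe.rstrip("="))
```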
> In contrast, Base32 is robust across case isomorphisms,
> safe for URLs and filesystems, and results in full-length
> and fragment identifiers which are typically recognized
> as unbroken units by legacy text-search mechanisms.
I guess we just differ in our value judgements here. I value shorter ids for
cut-and-paste purposes more than I value absence of "break" characters.
Indeed, I can't really think of a motivating example for caring about "break"
characters. Could you please suggest one?
> > A mojoid in base-32 would look like this:
> >
> > http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1
>
> That looks like Hexadecimal to me; the chance that a 70-digit Base32
> number would contain no letters G-Z is infinitesimal.
Ahem. <blush>
Okay, that was hexadecimal. The standard Python libraries offer hex and
base-64, and in my haste I mistook hex for base-32.
Hm. I can't find a base-32 encoder in Python. Could someone who favors
base-32, and thus presumably has an encoder handy, show the base-32 version of
40-byte, 30-byte, and 20-byte strings? Thanks!
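For comparison, a sketch using the `base64` module from later Python releases, which does include `b32encode` (the A-Z, 2-7 alphabet). Random bytes stand in for real ids here, and lowercasing the base-32 output and stripping `=` padding are presentation choices assumed for illustration. The helper name `encodings` is hypothetical.

```python
import base64
import os


def encodings(raw: bytes) -> tuple[str, str, str]:
    """Return (hex, base-32, URL-safe base-64) forms of raw, without padding."""
    b32 = base64.b32encode(raw).decode("ascii").rstrip("=").lower()
    b64 = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return raw.hex(), b32, b64


for n in (40, 30, 20):
    # Random stand-in for an n-byte identifier.
    hex_id, b32_id, b64_id = encodings(os.urandom(n))
    print(f"{n} bytes: hex {len(hex_id)}, base-32 {len(b32_id)}, "
          f"base-64 {len(b64_id)} characters")
    print("  base-32:", b32_id)
    print("  base-64:", b64_id)
```

Under these assumptions a 20-byte string comes out at 40 characters in hex, 32 in base-32, and 27 in base-64.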
Regards,
Zooko