Base85 encoding

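It’s 85 because log_85(2^32) ≈ 4.993 and log_84(2^32) ≈ 5.006: five characters from an alphabet of 85 can represent any 32-bit word, but five characters from an alphabet of 84 cannot.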


(If you’re not comfortable with logarithms, see an alternate explanation in the footnote [1].)

Now Base85 is different from the other bases I’ve written about because it only works on 4 bytes at a time.

That is, if you have a number larger than 4 bytes, you break it into words of 4 bytes and convert each word to Base 85.
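To make the word-splitting step concrete, here is a minimal Python sketch. The helper name `words_from_bytes` is made up for illustration, and two things are assumptions on my part rather than anything stated above: the bytes of each word are read in big-endian order, and the input length is a multiple of 4 (padding a trailing partial word is not handled).

```python
def words_from_bytes(data: bytes) -> list[int]:
    """Split data into 32-bit words, one per 4 bytes.

    Assumptions (not from the text): big-endian byte order, and a
    length that is a multiple of 4.
    """
    if len(data) % 4 != 0:
        raise ValueError("this sketch assumes a length that is a multiple of 4")
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


# Eight bytes become two words, each converted separately.
for word in words_from_bytes(bytes.fromhex("089255d9deadbeef")):
    print(hex(word))
```

Each word is then converted on its own, which is why the result for long inputs is not a single base conversion of one large number.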

Character set

The 95 printable ASCII characters are 32 through 126.

Base85 uses characters 33 (‘!’) through 117 (‘u’).

ASCII character 32 is a space, so it makes sense you’d want to avoid that one.

Since Base85 uses a consecutive range of characters, you can first convert a number to a pure mathematical radix 85 form, then add 33 to each number to find its Base85 character.

Example

Suppose we start with the word 0x89255d9, equal to 143807961 in decimal.

143807961 = 2×85^4 + 64×85^3 + 14×85^2 + 18×85 + 31

and so the radix 85 representation is (2, 64, 14, 18, 31).

Adding 33 to each we find that the ASCII values of the characters in the Base85 representation are (35, 97, 47, 51, 64), or (‘#’, ‘a’, ‘/’, ‘3’, ‘@’) and so #a/3@ is the Base85 encoding of 0x89255d9.
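The calculation above can be checked with a few lines of Python. This is a sketch, not a full encoder: `encode_word` is a name I made up, it handles exactly one 32-bit word, and it ignores the z shortcut and any padding rules. As a cross-check it also calls `base64.a85encode` from the Python standard library on the same four bytes, assuming big-endian byte order.

```python
import base64


def encode_word(word: int) -> str:
    """Encode one 32-bit word as five Base85 characters."""
    if not 0 <= word < 2**32:
        raise ValueError("word must fit in 32 bits")
    digits = []
    for _ in range(5):
        word, remainder = divmod(word, 85)  # peel off radix-85 digits, least significant first
        digits.append(remainder)
    digits.reverse()                        # most significant digit first
    return "".join(chr(d + 33) for d in digits)  # shift each digit into the range '!'..'u'


print(encode_word(0x89255D9))                             # #a/3@
print(base64.a85encode((0x89255D9).to_bytes(4, "big")))   # b'#a/3@'
```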

Z85

The Z85 encoding method is also based on a radix 85 representation, but it chose to use a different subset of the 95 printable characters.

Compared to Base85, Z85 adds seven characters

    v w x y z { }

and removes seven characters

    ` \ " ' _ , ;

to make the encoding work more easily with programming languages.

For example, you can quote Z85 strings with single or double quotes because neither kind of quote is a valid Z85 character.

And you don’t have to worry about escape sequences since the backslash character is not part of a Z85 representation.
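One way to see the difference between the two character sets is to compare them as sets. In the sketch below the Base85 set is built from the range 33 through 117 described earlier. The Z85 alphabet string is written out as I understand it from the Z85 specification, so treat it as an assumption and check it against the spec before relying on it.

```python
# Base85 characters: ASCII codes 33 ('!') through 117 ('u').
BASE85 = {chr(code) for code in range(33, 118)}

# Z85 alphabet, assumed here to match the published Z85 specification.
Z85 = set(
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#"
)

added = sorted(Z85 - BASE85)    # in Z85 but not in Base85
removed = sorted(BASE85 - Z85)  # in Base85 but not in Z85

print(len(BASE85), len(Z85))
print("added:  ", " ".join(added))
print("removed:", " ".join(removed))
```

Both sets have 85 elements, and the two differences have seven elements each, including both quote characters and the backslash on the removed side.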

Gotchas

There are a couple things that could trip someone up with Base85.

First of all, Base85 only works on 32-bit words, as noted above.

For larger numbers it’s not a base conversion in the usual mathematical sense.

Second, the letter z can be used to denote a word consisting of all zeros.

Since such words come up disproportionately often, this is a handy shortcut, though it means you can’t just divide characters into groups of 5 when converting back to binary.
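Here is a rough sketch of what the z shortcut means for decoding. The function name `decode` is hypothetical, and the sketch assumes big-endian words and input made only of complete five-character groups and z characters. Real decoders also deal with whitespace, delimiters, and partial final groups, none of which is handled here.

```python
def decode(text: str) -> bytes:
    """Decode Base85 text made of full 5-character groups and 'z' shortcuts."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "z":
            # A lone 'z' stands for a whole word of zeros and uses one character.
            out += b"\x00\x00\x00\x00"
            i += 1
            continue
        group = text[i:i + 5]
        if len(group) != 5:
            raise ValueError("this sketch only handles complete groups")
        word = 0
        for ch in group:
            digit = ord(ch) - 33
            if not 0 <= digit < 85:
                raise ValueError(f"invalid Base85 character: {ch!r}")
            word = word * 85 + digit
        if word >= 2**32:
            raise ValueError("group does not fit in 32 bits")
        out += word.to_bytes(4, "big")  # big-endian is an assumption of this sketch
        i += 5
    return bytes(out)


# 11 characters, but three words: the group boundaries are not at multiples of 5.
print(decode("#a/3@z#a/3@").hex())  # 089255d900000000089255d9
```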

Related posts

Base32 and Base64
Base58

[1] 95^4 = 81450625 < 2^32 = 4294967296, so four characters from an alphabet of 95 elements is not enough to represent 2^32 possibilities.

So we need at least five characters.

85^5 = 4437053125 > 2^32, so five characters is enough, and in fact it’s enough for them to come from an alphabet of size 85.

But 84^5 = 4182119424 < 2^32, so an alphabet of 84 characters isn’t enough to represent 32 bits with five characters.
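The arithmetic in this footnote is easy to confirm with exact integer arithmetic in Python. The variable names below are just for illustration.

```python
import math

WORDS = 2**32  # number of distinct 32-bit words

print(95**4, 95**4 < WORDS)    # 81450625 True: four characters are too few
print(85**5, 85**5 >= WORDS)   # 4437053125 True: five characters from 85 suffice
print(84**5, 84**5 < WORDS)    # 4182119424 True: five characters from 84 fall short

# Smallest alphabet size for which five characters cover every 32-bit word.
smallest = min(b for b in range(2, 96) if b**5 >= WORDS)
print(smallest)                # 85

# The same fact in terms of logarithms, rounded to three decimals.
print(round(math.log(WORDS, 85), 3), round(math.log(WORDS, 84), 3))  # 4.993 5.006
```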

