My Welsh Blog

Base64 Encoding in Delphi

Programming

One of the things that's come up several times in my Delphi experience is the need to Base 64 encode and decode data. Sadly, there's no built-in function to do this, but as Base-64 is a very straightforward encoding, it's a pretty trivial task to write your own.

Base-64 is a means of encoding binary data using the 62 characters comprising the lowercase alphabet, the uppercase alphabet and the numbers 0-9, as well as the characters + and / for a total of 64 different characters, plus = used for padding.
Why 64? Because the number 63 is the highest number that can be represented in 6 bits. 63 allows for 64 positions in a zero-indexed array, and as we'll see in a minute, the use of 6 bits is crucial to how Base-64 works.

The basic principle is this: take three 8-bit values and turn them into four 6-bit values. These values are used to index an array of the character set mentioned above, and the characters then form the output string.

For example, take the string "Cat". Assuming an ASCII encoding, this comprises the values 67, 97, and 116, the values for uppercase C, lowercase A and lowercase T, respectively. These are represented in binary thus:

01000011 01100001 01110100

Divided into groups of 6:

010000 110110 000101 110100

These become the numbers

16, 54, 5, and 52

And if we use these to index into the following array (zero-indexed):

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

We get the characters:

Q2F0

Decoding is just finding the index of the characters, converting those values to binary, grouping the bits in 8s and converting back to numerical values.

Earlier we said we were converting groups of three 8-bit bytes into groups of four 6-bit bytes, but what if the input data isn't divisible by four? Simple, we pad any partial groups with zeros and use the padding character = for any output characters that are missing entirely.

So, if our input was "Ca" instead of "Cat" we'd have input bytes of:

01000011 01100001

Grouped into sixes, we finish off the last group with zeros:

010000 110110 000100

Convert to the character sequence as before, and append the padding character:

Q2E=

Simple, right?

So why do we need the padding character? Even without it, we'd be able to unambiguously decode back to our original data. But what if we concatenated several base-64 values one after the other? If we couldn't guarantee that each one was a multiple of four we could find ourselves in the situation where the last bits of one string are concatenated to the first bits of another string and produce a string that is different to the input from that point on. The padding characters remove that possibility.

So how do we do this in Delphi?

Well, first it's important to remember that Base-64 can be used to encode **any** binary data, not just character strings, so it makes sense that the input for our encoding function is an array of bytes. If we do happend to want to encode a string, we can get the appropriately encoded byte values back from the string using the TEncoding class.
Then we loop through the string taking three characters at a time, keeping track of how many bits are left to read, and converting them to 6-bit bytes.

So far:

function Base64EncodeBytes(inputstring : TBytes) : string; const base64set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'; // Our array index into the input string // and the number of characters remaining var i, rem: integer; // The three characters of the input // we're currently processing i1, i2, i3 : byte; // The four array indexes // for our output characters n1, n2, n3, n4 : byte; begin rem := Length(inputstring); i := 0; while rem > 0 do begin i1 := 0; i2 := 0; i3 := 0; if rem > 0 then begin i1 := inputstring[i]; inc(i); end; if rem > 1 then begin i2 := inputstring[i]; inc(i); end; if rem > 2 then begin i3 := inputstring[i]; inc(i); end;

So that's the 8-bit bytes gathered. Now we need to split these into groups of 6. The way I decided to do this was to make one long number by left shifting the byte values and then right shifting each group of 6 into the lowest 6-bit position to get the new byte values.

// Form a long number from which // to take our groups of six outputNo := (i1 shl 16) + (i2 shl 8) + i3; // Shift the six left most bits // over to the right by 18 places, // so the occupy the six least-significant bits n1 := outputNo shr 18; n2 := (outputNo shr 12) and 63; n3 := (outputNo shr 6) and 63; n4 := (outputNo) and 63;

The fact our variables n1 - n4 are bytes means that the higher order bits we don't need are automatically truncated. All we do then is ensure that bits 7 and 8 are also both 0 by anding with 63 (binary 00111111) and we have our array indexes.
All that remains is to get the correct output characters, add any padding based on the number of remaining input characters and close off our loop.

if rem < 2 then result := result + Base64Set[n1 + 1] + Base64Set[n2 + 1] + '==' else if rem < 3 then result := result + Base64Set[n1 + 1] + Base64Set[n2 + 1] + Base64Set[n3 + 1] + '=' else result := result + Base64Set[n1 + 1] + Base64Set[n2 + 1] + Base64Set[n3 + 1] + Base64Set[n4 + 1]; rem := length(inputstring) - i; end; end;

Et voilá!

Posted on 2023-01-15

Next: Barkout (or Yet Another Arkanoid Clone) »

« Previous: Cybersource REST API with Microforms

Comments