Base64 is a binary-to-text encoding scheme that represents binary data as a string of American Standard Code for Information Interchange (ASCII) characters. It’s designed to carry binary data across channels that only reliably handle text, and it transforms any form of data into a string of plain text.
This is helpful because a tremendous amount of data is transferred over network channels every second of every day.
What Is Base64 Encoding?
To better understand base64, we first need to understand data and channels.
As we know, communicating data from one location to another requires some sort of pathway or medium. These pathways or mediums are called communication channels.
What type of data is transferred along these communication channels?
Today, we can transfer any format of data across the globe, and that data can be in the form of text, binary large object (BLOB) or character large object (CLOB) format. When that data transfers through a communication medium, it’s chopped into chunks called packets, and each packet contains data in binary format (0101000101001). Each packet then moves through the network in a series of hops.
Before diving into the base64 algorithm, let’s talk about BLOB and CLOB.
An Introduction to BLOB and CLOB
BLOB stands for binary large object. Whenever we transfer image, audio or video data over the network, that data is classified as BLOB data. CLOB, which stands for character large object, includes any kind of text, XML or character data transferred over the network. Now, let’s dive into base64 encoding.
As we stated earlier, base64 is a binary-to-text encoding scheme that represents data in ASCII string format and then carries that data across channels. If your data is made up of 8-bit bytes and your channel only handles 7-bit characters, then you won’t be able to transfer those data files unchanged. This is where base64 encoding comes in. But, what does base64 mean?
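To make the 7-bit point concrete, here is a minimal sketch using Python's standard `base64` module. The sample input (every possible byte value) is an arbitrary choice for illustration. Whatever bytes go in, every byte of the encoded output is below 128, so it fits in 7 bits.

```python
import base64

# Arbitrary sample: all 256 possible byte values, including those >= 128
# that a 7-bit channel could not carry unchanged.
raw = bytes(range(256))

encoded = base64.b64encode(raw)

# Check that every byte of the encoded output fits in 7 bits.
high_bit_free = all(b < 128 for b in encoded)

print(high_bit_free)            # True
print(len(raw), len(encoded))   # 256 344
```

The price is size: every three input bytes become four output characters, so the encoded form is about one third larger.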
First, let’s discuss the meaning of base64.
base64 = base+64
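Base64 is considered a radix-64 representation. Each character carries 6 bits (2⁶ = 64 possible values), and all 64 characters are printable ASCII, so the encoded data survives channels that only handle text. But, the question is, why? Since we could also write a base65 or base78 encoding, why do we only use 64? Because 64 is a power of two, each character maps to exactly 6 bits, so three 8-bit bytes (24 bits) fit evenly into four characters.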
The base64 alphabet contains 64 characters to encode any string.
It contains:
- 10 digits, i.e., 0, 1, 2, …, 9.
- 26 uppercase letters, i.e., A, B, C, …, Z.
- 26 lowercase letters, i.e., a, b, c, …, z.
- Two special characters, i.e., + and /. These depend on the base64 variant, not on your OS: the URL-safe variant uses - and _ instead.
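The list above gives the contents of the alphabet, but the encoding also depends on the order. As a small sketch, the standard index table (as defined in RFC 4648) can be built in Python like this. The name `ALPHABET` is just a label chosen for this example.

```python
import string

# Standard base64 alphabet in index order:
# A-Z -> 0..25, a-z -> 26..51, 0-9 -> 52..61, "+" -> 62, "/" -> 63
ALPHABET = (
    string.ascii_uppercase
    + string.ascii_lowercase
    + string.digits
    + "+/"
)

print(len(ALPHABET))                                    # 64
print(ALPHABET[0], ALPHABET[26], ALPHABET[52])          # A a 0
print(ALPHABET.index("+"), ALPHABET.index("/"))         # 62 63
```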
How Base64 Works
Let’s start with an example: encoding the string “THS” into base64 format.
In this example, we encode “THS” by running base64 <<< THS on a Linux CLI. Here, <<< is the shell’s here-string operator, which feeds the string to the base64 command as its input. We can see that we’re getting an encoded output: VEhTCg==. Now, let’s dive into the step-by-step procedure of getting this encoded output.
The steps followed by the base64 algorithm include:
- Count the number of characters in the string.
- If the count isn’t a multiple of three, the last group will be incomplete, and the encoded output will be padded with a special character, i.e., “=”, to make up for it.
- Convert each character to its ASCII code.
- Convert each ASCII code to binary format, 8 bits each.
- Divide the binary data into chunks of 6 bits each, filling the last chunk with zero bits if it falls short.
- Convert each 6-bit chunk to a decimal number.
- Using the base64 index table, map each decimal number to its character.
- Finally, we get the encoded version of our input string.
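The steps above can be sketched in a few lines of Python. This is an illustrative from-scratch version, not how production libraries implement it. The function name `encode` and the constant `ALPHABET` are names chosen for this example, and the standard library's `base64.b64encode` is used only to cross-check the result.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode(data: bytes) -> str:
    # Write every byte as 8 bits and join them into one long bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Fill the last chunk with zero bits so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # Split into 6-bit chunks, read each as a number, look it up in the table.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with "=" up to a multiple of four characters.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)

print(encode(b"THS"))                                         # VEhT
print(encode(b"THS") == base64.b64encode(b"THS").decode())    # True
```

Note that the padding is applied to the output, not the input: the input bytes are never changed, and the “=” characters only signal how many bytes the final group held.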
At the beginning of this section, we started with the string “THS.” Now, we’re going to encode this string by hand, following the above algorithm steps.
Base64 Encoding Steps
Let’s begin.
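The number of characters in THS is three, so no padding is needed. Now, let’s convert this string to ASCII codes: T is 84, H is 72 and S is 83.
Next, we’ll convert each ASCII code into an 8-bit binary number: 01010100, 01001000 and 01010011.
Then, we’ll divide those 24 bits into 6-bit chunks: 010101, 000100, 100001 and 010011. In decimal, these are 21, 4, 33 and 19.
From there, we use the base64 index table to look up the character for each decimal number: V, E, h and T.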
Finally, we receive the encoded output of base64 as VEhT.
But wait. In the above example, why did we get VEhTCg==?
This is because the base64 <<< THS command passes a trailing newline character to base64 along with the string, so the newline is encoded too. That’s where the Cg== at the end comes from.
If we run the echo -n THS | base64 command on the CLI, the -n flag suppresses the newline, and we get VEhT, the same output we just worked out by hand.
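The effect of the trailing newline can be reproduced outside the shell. The sketch below uses Python's standard `base64` module, and the comments about shell behavior assume bash.

```python
import base64

# What `base64 <<< THS` sees in bash: the string plus a trailing newline.
with_newline = base64.b64encode(b"THS\n").decode("ascii")

# What `echo -n THS | base64` sees: just the three characters.
without_newline = base64.b64encode(b"THS").decode("ascii")

print(with_newline)      # VEhTCg==
print(without_newline)   # VEhT
```

With the newline, the input is four bytes: one full group of three (VEhT) plus one leftover byte, which encodes to Cg followed by two padding characters.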
What If Our Input String Isn’t a Multiple of 3?
What do we do if our string is not a multiple of three? According to the first step of the algorithm, base64 will count the number of characters in a given input string. If it’s not a multiple of three, the encoded output is padded with “=” characters: one if two characters are left over after the last full group of three, and two if only one is left over.
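As a quick sketch of this rule, the number of “=” characters can be computed from the input length alone. The helper `padding_count` and the sample inputs are made up for this illustration, and the standard `base64` module shows the actual encodings.

```python
import base64

def padding_count(n_bytes: int) -> int:
    """Number of '=' characters appended for an input of n_bytes bytes."""
    return (3 - n_bytes % 3) % 3

# Arbitrary sample inputs of length 3, 2 and 1.
for sample in (b"abc", b"ab", b"a"):
    encoded = base64.b64encode(sample).decode("ascii")
    print(len(sample), encoded, padding_count(len(sample)))
# 3 YWJj 0
# 2 YWI= 1
# 1 YQ== 2
```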
Let’s look at one more example to prove it.
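Suppose we want to encode “abraAbra.”
This string has eight characters, which is not a multiple of three. Two characters are left over after the full groups of three, so we have to pad one “=” at the end. But why?
The “=” padding appears when the last group of binary data doesn’t fill a whole number of 6-bit chunks. The leftover bits are filled with zeros to complete the final chunk, and “=” is appended so that the output length stays a multiple of four characters.
Let’s convert the characters to ASCII codes, and then to 8-bit binary.
Now, divide the bits into 6-bit chunks, filling the last chunk with zero bits.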
After converting binary to decimal, let’s use the base64 index table to encode our final output. Again, refer to the base64 index table to find the character for each index value.
Finally, we’ll return the encoded output, which will be: YWJyYUFicmE=
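To see where the single “=” comes from, the sketch below traces only the leftover characters at the end of “abraAbra” in Python. The variable names are chosen for this example, and the standard `base64` module confirms the full result.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

data = b"abraAbra"                            # 8 bytes: two full groups + 2 left over
tail = data[len(data) - len(data) % 3:]       # the leftover bytes, b"ra"

bits = "".join(f"{b:08b}" for b in tail)      # 16 bits: 0111001001100001
bits += "0" * (-len(bits) % 6)                # zero-fill to 18 bits
chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
indexes = [int(chunk, 2) for chunk in chunks]

# Three characters from the table, plus one "=" to reach four.
tail_encoded = "".join(ALPHABET[i] for i in indexes) + "="

print(chunks)         # ['011100', '100110', '000100']
print(indexes)        # [28, 38, 4]
print(tail_encoded)   # cmE=

full = base64.b64encode(data).decode("ascii")
print(full)           # YWJyYUFicmE=
```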
You can also try this in your CLI.
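If your output ends in K instead of “=” (YWJyYUFicmEK), this doesn’t depend on your system. It happens when the command passes a trailing newline, as base64 <<< abraAbra does: the newline makes the input nine characters long, which is a multiple of three, so no padding is needed, and the last three characters encode to cmEK. You can confirm what the newline alone encodes to by running echo | base64 on your CLI, which prints Cg==.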
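The trailing K can be reproduced by encoding the string with and without a newline. This is a small check with Python's standard `base64` module, and the variable names are chosen for this example.

```python
import base64

# Eight bytes: one "=" of padding is needed.
plain = base64.b64encode(b"abraAbra").decode("ascii")

# Nine bytes (with trailing newline): a multiple of three, so no padding.
with_newline = base64.b64encode(b"abraAbra\n").decode("ascii")

# A newline on its own, which is what `echo | base64` encodes.
newline_only = base64.b64encode(b"\n").decode("ascii")

print(plain)          # YWJyYUFicmE=
print(with_newline)   # YWJyYUFicmEK
print(newline_only)   # Cg==
```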
And that’s base64 encoding. I hope this explanation makes sense to you.