Base64 Encoding: What Is It? How Does It Work?

Base64 is a binary-to-text encoding scheme that represents binary data as an ASCII string, which is essential for carrying binary data across channels built for text.

Written by Akshay Kumar
Published on Dec. 16, 2022

Base64 is a binary-to-text encoding scheme that represents binary data in an American Standard Code for Information Interchange (ASCII) string format. It’s designed to carry binary data across channels that only reliably support text: it takes any form of data and transforms it into a string of plain text characters.

This is helpful because a tremendous amount of data is transferred over network channels every second, and some of those channels were built to carry text rather than raw binary.

What Is Base64 Encoding?

Base64 encoding is a binary-to-text encoding scheme that represents binary data as an ASCII string, allowing binary data to be carried across channels that are designed for text.

To better understand base64, we first need to understand data and channels.

As we know, communicating data from one location to another requires some sort of pathway or medium. These pathways or mediums are called communication channels.

What type of data is transferred along these communication channels?

Today, we can transfer any format of data across the globe, and that data can be plain text, a binary large object (BLOB) or a character large object (CLOB). When that data transfers through a communication medium, it’s chopped into chunks called packets, and each packet contains data in binary format (0101000101001). Each packet then moves through the network in a series of hops.

Before diving into the base64 algorithm, let’s talk about BLOB and CLOB.

An Introduction to BLOB and CLOB 

BLOB stands for binary large object. Whenever we transfer image, audio or video data over the network, that data is classified as BLOB data. CLOB, which stands for character large object, includes any kind of text, XML or character data transferred over the network. Now, let’s dive into base64 encoding.

As we stated earlier, base64 is a binary-to-text encoding scheme that represents data in ASCII string format so that it can be carried across channels. If your data is made up of 8-bit bytes (2⁸ = 256 possible values) and your channel only handles 7-bit bytes (2⁷ = 128 possible values), the channel can’t carry that data intact. This is where base64 encoding comes in. But, what does base64 mean?
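To make the 8-bit versus 7-bit problem concrete, here is a minimal Python sketch using the standard library’s base64 module. The sample bytes are made up for illustration; the point is only that some raw bytes need the eighth bit, while every byte of the base64 output fits in seven.

```python
import base64

# Arbitrary sample bytes (an assumption for illustration). Three of them are
# larger than 127, so a channel that keeps only 7 bits per byte would mangle them.
raw = bytes([0x00, 0x7F, 0x80, 0xC3, 0xFF])

encoded = base64.b64encode(raw)

# Bytes above 127 need the eighth bit.
needs_eighth_bit = [b for b in raw if b > 127]

# Every byte of the base64 output is below 128, so it fits in 7 bits.
fits_in_seven_bits = all(b < 128 for b in encoded)

print(needs_eighth_bit)    # [128, 195, 255]
print(fits_in_seven_bits)  # True
```

Decoding the output with base64.b64decode returns the original bytes, so nothing is lost by taking the detour through text.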

First, let’s discuss the meaning of base64.

base64 = base+64

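Base64 is considered a radix-64 representation. Each character carries 6 bits (2⁶ = 64 possible values), and every character it uses is a printable ASCII character. But, the question is, why? Since we could also define a base65 or base78 encoding, why do we only use 64? Because 64 is a power of two, each character maps to exactly 6 bits, so the bit stream can be cut into whole 6-bit chunks.

Base64 encoding uses an alphabet of 64 characters to encode any data.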

It contains:

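  • 26 uppercase letters, i.e., A,B,C,D,…,Z (index values 0 to 25).
  • 26 lowercase letters, i.e., a,b,c,d,…,z (index values 26 to 51).
  • 10 numeric values, i.e., 0,1,2,3,…,9 (index values 52 to 61).
  • Two special characters, i.e., + and / (index values 62 and 63). The URL-safe variant of base64 uses - and _ instead.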
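The index table can be rebuilt in a couple of lines. The sketch below is one way to do it in Python; the name ALPHABET and the two sample bytes are our own choices for illustration, picked so that the two special characters actually appear in the output.

```python
import base64
import string

# Standard base64 alphabet in index order: A-Z, a-z, 0-9, then "+" and "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(ALPHABET))               # 64
print(ALPHABET[0], ALPHABET[63])   # A /

# Two sample bytes chosen so the output uses index values 62 and 63.
sample = bytes([0xFB, 0xFF])
standard = base64.b64encode(sample)
url_safe = base64.urlsafe_b64encode(sample)

print(standard)   # b'+/8='
print(url_safe)   # b'-_8='
```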

How Base64 Works

Let’s start with an example: encoding the string “THS” into base64.

Encoding THS string into base64. | Screenshot: Akshay Kumar

In the above example, we’re encoding the string “THS” into base64 format by passing it to the base64 command with the shell’s here-string operator (<<<) on a Linux CLI. We can see that we’re getting an encoded output: VEhTCg==. Now, let’s dive into the step-by-step procedure of getting this encoded output.

The steps followed by the base64 algorithm include:

  • Take the input as a sequence of bytes. For plain text, each byte is the ASCII code of one character.
  • Write each byte in binary format, 8 bits each, and join the bits into one long sequence.
  • Divide the binary data into chunks of 6 bits each. If the last chunk is short, fill it with zeros on the right.
  • Convert each 6-bit chunk to a decimal number between 0 and 63.
  • Using the base64 index table, convert each decimal number to its character.
  • If the number of input bytes is not a multiple of three, pad the output with the special character “=” until its length is a multiple of four.
  • Finally, we will get the encoded version of our input string.
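These steps translate almost line for line into code. What follows is a hand-rolled sketch for learning purposes, not how a production library implements base64; the function name encode_base64 and the constant ALPHABET are hypothetical names chosen for this example.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_base64(data: bytes) -> str:
    """Encode bytes as base64 by following the steps one at a time."""
    # Write each byte as 8 bits and join them into one bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-fill the tail so the bit string splits evenly into 6-bit chunks.
    bits += "0" * (-len(bits) % 6)
    # Convert each 6-bit chunk to a number and look it up in the index table.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with "=" so its length is a multiple of four.
    padding = "=" * (-len(chars) % 4)
    return "".join(chars) + padding


print(encode_base64(b"THS"))  # VEhT
```

In real code, Python’s built-in base64.b64encode does the same job and should be preferred; the sketch is only meant to make each step visible.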

Earlier, we started with the string “THS.” Now, we’re going to encode this string by following the above algorithm steps.

A video on the basics of base64 encoding. | Video: Connor Ashcroft

 

Base64 Encoding Steps

Let’s begin. 

The number of characters in THS is three, which is a multiple of three, so no padding will be needed. Now, let’s look up the ASCII code of each character: T is 84, H is 72 and S is 83.

“THS” in ASCII format. | Screenshot: Akshay Kumar

Next, we’ll convert each ASCII code into an 8-bit binary number.

After converting ASCII numbers to binary (0,1) format. | Akshay Kumar

Then, we’ll divide the above binary into 6-bit chunks.

Dividing it into 6-bit chunks. | Screenshot: Akshay Kumar

From there, we convert each 6-bit chunk to a decimal number and use the base64 index table to find the character that corresponds to each number.

Final base64 encoded value. | Screenshot: Akshay Kumar

Finally, we receive the encoded output of base64 as VEhT.
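If you’d like to see every intermediate value of this walk-through in one place, the short Python sketch below prints them. The variable names are our own; the values follow directly from the ASCII table and the base64 index table.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

text = "THS"

# ASCII code of each character.
codes = [ord(ch) for ch in text]
# Each code written as an 8-bit binary number.
octets = [f"{code:08b}" for code in codes]
# All 24 bits joined, then cut into 6-bit chunks.
bit_string = "".join(octets)
sextets = [bit_string[i:i + 6] for i in range(0, len(bit_string), 6)]
# Each chunk as a decimal number, then looked up in the index table.
indexes = [int(chunk, 2) for chunk in sextets]
encoded = "".join(ALPHABET[i] for i in indexes)

print(codes)    # [84, 72, 83]
print(octets)   # ['01010100', '01001000', '01010011']
print(sextets)  # ['010101', '000100', '100001', '010011']
print(indexes)  # [21, 4, 33, 19]
print(encoded)  # VEhT
```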

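But wait. In the above example, why did we get VEhTCg==?

This is because the here-string operator in the base64 <<< THS command appends a newline character to the input, so the command actually encodes four bytes: “THS” followed by a newline. The newline is what produces the extra Cg== at the end of the encoded output.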

Running base64 <<< THS command on the CLI. | Screenshot: Akshay Kumar

If we run the echo -n THS | base64 command on the CLI, where -n tells echo not to add a trailing newline, we’ll see VEhT, the same output we worked out by hand.
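The effect of the trailing newline is easy to reproduce outside the shell. This small Python sketch encodes the same three letters with and without a newline byte; the comments note which CLI command each line corresponds to.

```python
import base64

# Same input as: echo -n THS | base64
without_newline = base64.b64encode(b"THS")

# Same input as: base64 <<< THS (the here-string appends "\n")
with_newline = base64.b64encode(b"THS\n")

print(without_newline)  # b'VEhT'
print(with_newline)     # b'VEhTCg=='
```

The first four characters are identical in both outputs because the first three bytes are identical. The extra Cg== is the fourth byte, the newline, encoded as a group of its own.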

What If Our Input String Isn’t a Multiple of 3?

What do we do if our string is not a multiple of three? Base64 processes the input in groups of three bytes, and each group becomes four output characters. If the input length isn’t a multiple of three, the last group is short: one leftover byte is encoded as two characters followed by “==”, and two leftover bytes are encoded as three characters followed by a single “=”.
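A quick way to check the padding rule is to encode inputs of one, two and three bytes and count the “=” characters. The sketch below does that with Python’s base64 module; the sample strings and the dictionary name padding_by_input are arbitrary choices for this example.

```python
import base64

padding_by_input = {}
for text in ("a", "ab", "abr"):
    encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
    # Count how many "=" characters sit at the end of the output.
    padding = len(encoded) - len(encoded.rstrip("="))
    padding_by_input[text] = (encoded, padding)

print(padding_by_input)
# {'a': ('YQ==', 2), 'ab': ('YWI=', 1), 'abr': ('YWJy', 0)}
```

In every case the output is four characters long, which is what the padding is there to guarantee.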

Let’s look at one more example to prove it.

Suppose we want to encode “abraAbra.”

Encoding abraAbra into binary data. | Screenshot: Akshay Kumar

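This string has eight characters, which is not a multiple of three: two full groups of three leave two characters over. So, we have to pad with one “=” at the end. But why?

The “=” is added as padding when the final group of binary data doesn’t fill all four 6-bit chunks. The two leftover bytes hold 16 bits, which are zero-filled to 18 bits to make three 6-bit chunks, and the “=” stands in for the missing fourth chunk.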

Let’s convert ASCII numbers to 8-bit binary first.

Dividing the data into 8-bit blocks. | Screenshot: Akshay Kumar

Now, divide the 8-bit blocks into 6-bit chunks and add padding.

Returning the encoded output. | Screenshot: Akshay Kumar

After converting each 6-bit chunk from binary to decimal, let’s use the base64 index table to look up the character for each value and build our final output.

Returning the encoded output. | Screenshot: Akshay Kumar

Finally, we’ll return the encoded output, which will be: YWJyYUFicmE=
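The interesting part of this example is the last, incomplete group. The Python sketch below isolates the leftover bytes of “abraAbra” and shows how their bits are zero-filled into 6-bit chunks; the variable names are our own.

```python
import base64

data = "abraAbra".encode("ascii")

# Eight bytes make two full groups of three, with two bytes left over.
full_groups, leftover = divmod(len(data), 3)
tail = data[full_groups * 3:]

# The leftover bytes hold 16 bits, zero-filled to 18 bits for three chunks.
tail_bits = "".join(f"{byte:08b}" for byte in tail)
padded_bits = tail_bits + "0" * (-len(tail_bits) % 6)
sextets = [padded_bits[i:i + 6] for i in range(0, len(padded_bits), 6)]

encoded = base64.b64encode(data).decode("ascii")

print(full_groups, leftover)  # 2 2
print(tail)                   # b'ra'
print(sextets)                # ['011100', '100110', '000100']
print(encoded)                # YWJyYUFicmE=
```

The three chunks are 28, 38 and 4 in decimal, which the index table maps to c, m and E, and the single “=” fills the fourth position.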

You can also try this in your CLI.

Trying base64 on your CLI. | Screenshot: Akshay Kumar

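In the above output, the last character is K in place of “=”. This isn’t a system-dependent special character: the command added a trailing newline to the input, which made it nine bytes long, an exact multiple of three. No padding is needed, and the newline’s bits end up in the final character, K. You can see how a newline on its own is encoded by running echo | base64 on your CLI, which outputs Cg==.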
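Decoding is the simplest way to confirm what was actually encoded. This Python sketch decodes the version that ends in K, the version that ends in “=”, and the output of echo | base64, so the hidden newline becomes visible.

```python
import base64

decoded_with_k = base64.b64decode("YWJyYUFicmEK")
decoded_with_pad = base64.b64decode("YWJyYUFicmE=")
newline_only = base64.b64decode("Cg==")

print(decoded_with_k)    # b'abraAbra\n'
print(decoded_with_pad)  # b'abraAbra'
print(newline_only)      # b'\n'
```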

And that’s base64 encoding. I hope this explanation makes sense to you.
