Database Normalization vs Denormalization

What is database normalization? How about denormalization? Do you understand the differences between the two? These are all concepts that anyone who administers a database should understand. Database normalization impacts how data is stored and accessed, and having a good grasp of these principles can protect your users from creating errors in your database. In this article, we’ll delve into each topic so you can manage your databases as efficiently as possible.

Database Normalization vs. Denormalization: What’s the Difference?

Database normalization is a method of organizing data inside of a relational database based on normal form. This principle is one of the foundations of relational databases, which connect information from different tables via a key, or a unique identifier.
Database denormalization is the condition where all the data lives in one table instead of separate ones.

More From Adam ThomasSprint Scrum Cycle in Agile Development

What Is Database Normalization?

Database normalization is a method of organizing data inside of a relational database based on normal form. Computer scientist Edgar F. Codd proposed normal form as a way to reduce anomalies that can happen when injecting data into a database. This principle is one of the foundations of relational databases, which connect information from different tables via a key, or a unique identifier.

By using normal form as a method of organization, you can reduce the possibility of creating anomalies in the database. The three anomalies — insertion, update, and deletion — can ruin the integrity of a database.

Types of Database Anomalies

Insertion anomaly occurs when data is inserted into a table data that doesn’t correspond to the structure of the table. For example, if a table had columns for name and address, and somehow, someone added a name, address, and phone number, the phone number would be the insertion anomaly.
Update anomaly occurs when data isn’t updated in all areas. For example, when you update something, say an address, in one place but not others.
Deletion anomaly occurs when data is deleted in a way that means more data has to get deleted. For example, you might want to delete one item but then have to delete everything since it’s all in one row.

You can achieve this organizational pattern by breaking the data into atomic elements, which are elements that cannot be broken down into smaller parts. Inside a database, this can mean repeating lines to ensure that one row in the database focuses on one object.

Here is an example. This is a denormalized set of data.

A denormalized data set — Image: Screenshot

This is this same data broken down into atomic elements

A normalized data set — Image: Screenshot

As you can see, the data is organized to create one row per class. We’re on the path to normalization because the table’s atomic elements represent first normal form (1NF). 1NF notes that each box in the table only represents one piece of information.

Next, we’ll move this into the second normal form (2NF). 2NF involves separating the information into a table with a unique key. For this purpose, we will use the assistant principals since they represent multiple students.

A data set in normal form — Image: Screenshot

We then pull that information out of this table since it will be referenced.

The next step is to fully normalize the table in third normal form (3NF). 3NF decomposes any transitive functional dependencies, that is, removing anything that can be referenced into its own table. This creates tables like so.

What Are the Pros and Cons of Normalization?

Database normalization helps reduce data redundancy by putting data in unique tables to be referenced. Reducing the redundancy also creates a more compact data set. Compactness helps clarity for reading purposes and makes them easier to edit because it’s easier to see the classes above in a separate table instead of one large one.

A normalized table also minimizes the chance of getting a null variable, since separate tables make creating queries easier to debug. For example, the size of the class is much easier to query with separate tables instead of one mass of data.

A normalized table makes it easier to add data to your tables since you’re only adding one form of it. For example, if you want to add a class, you can just add a class. You don’t have to worry about the other data, avoiding the chance for an insertion error. Further, since the data is separated, you can extend without impacting it.

These benefits come at the cost of speed since multiple tables have to be read instead of one. This issue isn’t a big deal with small databases. In larger ones, however, especially those whose queries are tied to service level agreements (SLAs), this can create serious problems.

What Is Database Denormalization?

Database denormalization is the condition where all the data lives in one table instead of separate ones. Essentially, you take the tables we created above and join them back into the original state like so:

Consider this similar to code minimization, where the code is harder to read or edit but runs faster since everything is crunched together.

What Are the Pros and Cons of Denormalization?

Denormalization also has benefits. First, accessing the data is much quicker since a query doesn’t need to search in multiple tables and join information. Generating reports is also much faster.

The table is easier to manage if you aren’t adding data since all the information is functionally in one place. Likewise, your data is prone to fewer errors since everything is in one place.

On the other hand, a denormalized database is subject to far more risk of anomalies since the data isn’t separated into different tables like it is in a normalized database.

More on NormalizationWhat Is Database Normalization?

Differences Between the Two

Normalized databases are focused on reducing redundant information. They make better use of space since the data is organized. They also offer high integrity by minimizing the chance of the anomalies thanks to separating the tables. Finally, normalization is useful for databases where change happens often, such as social media profiles.

Denormalized databases primarily offer data retrieval speed and faster query execution. They’re also cheaper to maintain. Denormalization is useful for information that is often queried but rarely updated, such as tax information.