Relational databases are made up of tables, where each table contains rows and columns, and each row has a unique ID known as the primary key.
Let’s imagine a data set containing 100 customers with their names, addresses and favorite product. Using a relational model, one table may contain a list of products with four columns to store the primary key, product review, product name and product ID. Finally, another table can have 100 rows for storing all 100 customers and four columns for primary key, customer name, customer address and product ID. We can use the value for product ID to build a relation between these two tables. We can use Structured Query Language (SQL) to further query our tables and get more information about products such as product name or product review.
How Does a Relational Database Work?
Relational databases are based on a relational data model, which allows data practitioners to efficiently query information from multiple tables linked through common attributes.
In order to query information from a relational database, it is common practice to use SQL, the primary point of contact with a relational database. Some vendors create their own implementations of SQL such as MySQL, PostgreSQL and Oracle SQL. These implementations may vary in their syntax or other features from SQL. For example, PostgreSQL offers transaction-safe truncation, which allows users to recover deleted data if anything goes wrong with the transaction surrounding the truncate action; MySQL does not allow this feature.
Why Are Relational Databases Important?
The main feature of a relational database is the capacity to join data from different tables in order to create new and meaningful insights. Let’s say an e-commerce company has a relational database with three tables containing data for products, customers and sales. The product table has a list of products containing columns for product ID and inventory quantity. The customer table has a list of all customers with contact information and customer ID. The sales table tells us which customer ID and what product IDs were sold and at what price. With SQL, it’s simple for a data analyst to query these tables and find useful insights such as top-selling products, top-purchasing customers or which day of the week sees the most sales.
What Are the Advantages of Relational Databases?
Relational databases use primary and foreign keys to interrelate tables. This means we can have a single source of truth for our data. In other words, we can avoid duplicate data and have confidence in the accuracy of our results. Without repeated records, modifying or deleting data is simple since all linked records in other tables will show the change. Let’s return to our e-commerce company: A customer needs to change their email address or their name. If we update the customer table, all other insights — such as the top-purchasing customer or the name of the customer on the sales tables — will automatically reflect the new customer data.
What Are the Disadvantages of Relational Databases?
Cost can be a major drawback because some relational databases can be quite expensive. For example, a Microsoft SQL server enterprise can be many times more expensive than the standard version.
What’s more, because relational databases can join data from many tables with a single query, slow performance can put relational databases at a disadvantage. Performance depends on many different factors such as how many JOIN statements we use, whether or not we index our database properly, or whether we use asterisks to select columns rather than select only the fields we need. There are plenty of tips to improve your relational database’s performance according to the use case. In the end, improving your SQL queries’ performance becomes easier with experience.
Let’s look at an example. Below you’ll find a “slow” query that we’ll make more efficient. Let’s imagine we need three fields from a table for the first 15 days of August 2022:
SELECT * FROM SAMPLE_TABLE WHERE DATE_TIMESTAMP > "2022-08-01" ORDER BY DATE_TIMESTAMP ASC
SELECT FIELD_01, FIELD_02, DATE_TIMESTAMP FROM SAMPLE_TABLE WHERE DATE_TIMESTAMP BETWEEN "2022-08-01" AND '2022-08-15' ORDER BY DATE_TIMESTAMP ASC
The original query uses
SELECT *, which selects all the fields in the table while the improved query selects three specific fields from the table. Also, the original query has a
WHERE clause that delimits the date from the first of August 2022, but does not delimit the search until August 15. This means that the original query will take longer to run as we add more data to the table.
What Are the Alternatives to Relational Databases?
There are more alternatives to store our data that offer certain advantages depending on the use case. For unstructured data, a non-relational (or NoSQL) database can handle large volumes of data at high speeds. Another alternative to relational databases include a data lake (if the correct use case presents itself). We can use a data lake to store data at any scale, even if we haven’t yet defined the purpose of the data. In a data lake, data can be structured, unstructured or even in raw files.
If you require relational data and you’ll need to perform data analytics, a data warehouse might be the best option. Data warehouses tend to be cloud-based. Although a data warehouse uses a relational model, it’s meant for use in data analytics and therefore has certain differences from relational databases. For example, in a data warehouse, it’s more common to capture historic data over current data. What’s more, data updates are scheduled rather than created whenever a transaction occurs. Finally, data warehouses are built from collecting and transforming data from multiple data sources for analytics purposes.
Relational Database Examples
Some examples of popular relational databases include Microsoft SQL Server, Oracle Database, MySQL, Amazon Relational Database Service (RDS), PostgreSQL, Azure SQL database and many others. RDS and Azure SQL Database are cloud-based, whereas Microsoft SQL server, MySQL and PostsgreSQL are meant to be on-premises. PostgreSQL and MySQL are open-source whereas Microsoft SQL server is not. Alternatively, cloud environments like Google Cloud Platform, Amazon Web Services or Microsoft Azure are also capable of supporting on-premise databases.