After writing a considerable number of SQL scripts, you’re likely to reach a performance plateau. You extract insights using the same strategies and run into the same types of errors.
Fortunately, you can improve your experience writing queries by understanding how the clauses in SQL are evaluated.
7 Stages of SQL Order of Execution
- FROM/JOIN
- WHERE
- GROUP BY
- HAVING
- SELECT
- ORDER BY
- LIMIT/OFFSET
Here, we discuss the order of execution in SQL and why it matters.
What Is SQL’s Order of Execution?
SQL queries adhere to a specific order when evaluating clauses, similar to how mathematical operations adhere to PEMDAS or BODMAS.
From the eyes of the user, queries begin from the first clause and end at the last clause. However, queries aren’t read from top to bottom when carried out.
The order in which the clauses in queries are executed is as follows.
1. FROM/JOIN
The FROM and/or JOIN clauses are executed first to determine the data of interest.
2. WHERE
The WHERE clause is executed to filter out records that do not meet the constraints.
3. GROUP BY
The GROUP BY clause is executed to group the data based on the values in one or more columns.
4. HAVING
The HAVING clause is executed to remove the created grouped records that don’t meet the constraints.
5. SELECT
The SELECT clause is executed to derive all desired columns and expressions.
6. ORDER BY
The ORDER BY clause is executed to sort the derived values in ascending or descending order.
7. LIMIT/OFFSET
Finally, the LIMIT and/or OFFSET clauses are executed to keep or skip a specified number of rows.
SQL Order of Execution Example
To illustrate the order of execution in SQL, it’s best to use an example. We have two tables named “Orders” and “Products” that detail purchases made on stationery items.
Using these tables, we want to find out who spent the second most amount of money on pens.
We can obtain the answer with the following query:
-- Find the person that spent the second most amount on pens
SELECT O.name,
SUM(O.quantity*P.price) AS total_spent
FROM Orders O
JOIN Products P
ON O.product_id = P.id
WHERE P.name = 'Pen'
GROUP BY O.name
ORDER BY total_spent DESC
LIMIT 1 OFFSET 1;
The answer is successfully outputted, but how is it derived in the first place?
As previously stated, a user might read the query starting from the SELECT clause and ending with the OFFSET clause, but SQL doesn’t read clauses from top to bottom. Instead, it derives the answer with the following steps:
- The FROM and JOIN clauses merge the “Products” and “Orders” tables to obtain the total data of interest.
- The WHERE clause removes records where pens aren’t purchased.
- The GROUP BY clause groups together records by name.
- The HAVING clause removes groups that spend more than 2,000 on pens.
- The SELECT clause derives the buyers and their corresponding amount spent on pens.
- The ORDER BY clause sorts the results based on the total amount spent on pens in descending order.
- The LIMIT and OFFSET clauses skip the first row and keep only the next record, which contains the person that spent the second most on pens.
Why SQL Order of Execution Matters
The order of execution in SQL might seem unimportant at first glance. After all, if queries are delivering the desired outputs, who cares how the clauses are evaluated?
However, users unfamiliar with the order of execution will struggle with writing more complex queries because any issues that emerge will be harder to debug. Those looking to troubleshoot errors with greater ease will benefit from understanding the order in which the clauses in SQL are read.
For example, a common mistake in SQL is incorrectly referencing column aliases.
We can highlight this mistake by using the “Orders” and “Products” tables to answer another question: Out of those who have spent less than $50, who has spent the most amount of money on an order?
We can calculate the total money spent on an order by multiplying the price variable in the “Products” table by the quantity variable in the “Orders” table and naming this expression with the alias total_spent
.
Let’s see what answer the following query outputs:
-- purchases with more total spent lower than 50
SELECT O.name,
(P.price * O.quantity) AS total_spent
FROM Orders O
JOIN Products P
ON O.product_id = P.id
WHERE total_spent < 50
ORDER BY total_spent DESC
This throws an error. What’s going on?
A user unfamiliar with the order of execution might not see the issue with using the column alias “total_spent”
in the WHERE clause, so the query might seem valid at face value.
However, if you know the order of execution in SQL, it’s clear that since the column alias is created in the SELECT clause, which is evaluated after the WHERE clause, SQL will not contain that alias when evaluating the WHERE clause. This is what creates the error.
Now that we can identify the source of the error, we can fix it by not using the column alias in the WHERE clause’s expression.
-- purchases with more total spent lower than 50
SELECT O.name,
(P.price * O.quantity) AS total_spent
FROM Orders O
JOIN Products P
ON O.product_id = P.id
-- no using column aliases in the WHERE clause
WHERE P.price * O.quantity < 50
ORDER BY total_spent DESC
The query now runs successfully.
Note that the ORDER BY clause still uses the column alias total_spent, which is fine since this clause is evaluated after the SELECT clause.
In general, issues like these are easy to address if you understand the order of execution in SQL. However, those oblivious to it will have a hard time fixing errors that may seem benign at face value.
All in all, understanding how queries are read will benefit users looking to improve their scripts and have a more pain-free experience troubleshooting any errors.
It might seem daunting to remember this order at first, but the order of execution will be easy to learn by heart after just a little experience, just like PEMDAS or BODMAS