SQL Tuning Techniques
This article introduces some fundamental strategies to improve the speed and efficiency of SQL queries.
Utilizing Indexes
Indexing is a crucial concept in SQL, as it can significantly improve the performance of database queries.
Improving SQL Performance by Creating Indexes
Creating an index on the columns used in the WHERE clause can change a full table scan into an index scan, thereby improving query speed. However, it's worth noting that indexing tends to be beneficial only when the filter matches a relatively small subset of the rows.
-- Before optimization
SELECT * FROM books WHERE author = 'Hemingway';
-- After optimization
CREATE INDEX idx_books_author ON books(author);
SELECT * FROM books WHERE author = 'Hemingway';
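To confirm that the index is actually used, you can inspect the query plan. Below is a minimal sketch using SQLite through Python's sqlite3 module (an assumption chosen only to keep the example self-contained; plan output and syntax vary by RDBMS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany("INSERT INTO books (author, title) VALUES (?, ?)",
                 [("Hemingway", "The Old Man and the Sea"), ("Orwell", "1984")])

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether SQLite scans the table or searches an index.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM books WHERE author = 'Hemingway'"
before = plan(query)  # full table scan
conn.execute("CREATE INDEX idx_books_author ON books(author)")
after = plan(query)   # search using idx_books_author
print(before)
print(after)
```

Before the index is created, the plan should report a scan of the table; afterwards it should report a search using idx_books_author.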
Creating Compound Indexes
A compound index includes multiple columns. It is particularly effective when you filter on multiple columns in your WHERE clause. But remember, the order of columns in the index matters: the index can generally be used only when the query filters on its leading column(s).
-- Before optimization
SELECT * FROM students WHERE firstName = 'John' AND lastName = 'Doe';
-- After optimization
CREATE INDEX idx_students_first_last_name ON students(firstName, lastName);
SELECT * FROM students WHERE firstName = 'John' AND lastName = 'Doe';
Creating Indexes for Each Column Instead of Compound Indexes
In certain cases, you might want to create separate indexes for each column rather than a single compound index. This can be beneficial when you often filter on the columns individually, or combine them with OR, where a single compound index cannot serve both conditions.
-- Before optimization
SELECT * FROM orders WHERE customerID = 123 OR productID = 456;
-- After optimization
CREATE INDEX idx_orders_customerID ON orders(customerID);
CREATE INDEX idx_orders_productID ON orders(productID);
SELECT * FROM orders WHERE customerID = 123 OR productID = 456;
Creating Indexes for Columns used in ORDER BY and GROUP BY
Columns used in ORDER BY and GROUP BY clauses can benefit from indexing, which can speed up these operations. If multiple columns are involved, consider creating a compound index.
-- Before optimization
SELECT category, COUNT(*) FROM products GROUP BY category;
-- After optimization
CREATE INDEX idx_products_category ON products(category);
SELECT category, COUNT(*) FROM products GROUP BY category;
-- Before optimization
SELECT * FROM customers ORDER BY lastName, firstName;
-- After optimization
CREATE INDEX idx_customers_last_first_name ON customers(lastName, firstName);
SELECT * FROM customers ORDER BY lastName, firstName;
Avoiding Meaningless Sorting
Database query performance can be significantly affected by the inclusion of unnecessary sorting operations. These operations, including ORDER BY and GROUP BY clauses, can cause processing time to balloon, affecting overall performance.
Avoiding Unnecessary ORDER BY and GROUP BY
ORDER BY and GROUP BY operations consume a significant amount of processing time, so they should only be used when absolutely necessary. Moreover, when they are employed, they should be applied after the dataset has been filtered as much as possible.
-- Before optimization
SELECT * FROM users ORDER BY name;
-- After optimization
SELECT * FROM users WHERE active = 1 ORDER BY name;
In the optimized version, the ORDER BY clause is applied after filtering out inactive users, which reduces the number of records to be sorted.
Speeding up MAX, MIN Calculation with Indexes
Aggregates such as MIN and MAX can be optimized using indexes. When the column involved is indexed, the database can read the smallest or largest value directly from the ends of the index instead of scanning the whole table.
-- Before optimization
SELECT MAX(score), MIN(score) FROM students;
-- After optimization
CREATE INDEX idx_students_score ON students(score);
SELECT MAX(score), MIN(score) FROM students;
Using EXISTS instead of DISTINCT
The DISTINCT clause, used to eliminate duplicate rows, can be a costly operation because it typically requires sorting or hashing the data. When the duplicates arise only from a join that merely checks for the existence of related rows, EXISTS can be used as an alternative, avoiding the duplicates in the first place.
-- Before optimization
SELECT DISTINCT c.course_id FROM courses AS c INNER JOIN enrollments AS e ON c.course_id = e.course_id;
-- After optimization
SELECT c.course_id FROM courses AS c WHERE EXISTS (SELECT 1 FROM enrollments AS e WHERE e.course_id = c.course_id);
Using UNION ALL instead of UNION
UNION operations can be resource-intensive because they sort the combined result to eliminate duplicate records. If you know the combined rows are unique, or duplicates are acceptable, UNION ALL can be used instead; it simply concatenates the results without sorting or de-duplicating.
-- Before optimization
SELECT name FROM employees UNION SELECT name FROM managers;
-- After optimization
SELECT name FROM employees UNION ALL SELECT name FROM managers;
Avoiding Repetitive Processing
In SQL tuning, the key principle to embrace is to minimize repetitive processing as much as possible. This can be accomplished by reducing the number of table accesses, preferring JOINs over subqueries within SELECT, and avoiding unnecessary repeated filtering.
Reducing the Number of Table Accesses
A fundamental part of SQL tuning involves decreasing the frequency of table accesses. When several pieces of information can be checked against a table in a single pass, it is best to combine them into one query instead of scanning the table repeatedly.
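For instance, two separate counting queries against the same table can be collapsed into a single scan using conditional aggregation. The sketch below uses SQLite via Python and a hypothetical accounts table purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO accounts (status) VALUES (?)",
                 [("active",), ("active",), ("suspended",)])

# Two separate queries: the table is accessed twice.
active = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE status = 'active'").fetchone()[0]
suspended = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE status = 'suspended'").fetchone()[0]

# One query: both counts are computed in a single pass with CASE expressions.
row = conn.execute("""
    SELECT
        SUM(CASE WHEN status = 'active'    THEN 1 ELSE 0 END),
        SUM(CASE WHEN status = 'suspended' THEN 1 ELSE 0 END)
    FROM accounts
""").fetchone()
print(row)
```

Both approaches return the same counts, but the second touches the table only once.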
Use JOIN Instead of Subqueries within SELECT
When we use a subquery in the select list of a SELECT statement, the subquery is executed for each row, which can increase processing time. Instead, using a JOIN for a simple association can produce the same results with less processing.
Consider the following code, where a subquery is executed within the SELECT statement for each employee row to get the name of the department.
SELECT
emp.*, (SELECT dp.name FROM departments AS dp WHERE dp.id = emp.department_id) AS department_name
FROM employees AS emp;
By using a LEFT JOIN, we can achieve the same result without executing the subquery for every row, as shown in the modified code below.
SELECT
emp.*, dp.name AS department_name
FROM employees AS emp LEFT JOIN departments AS dp
ON emp.department_id = dp.id;
Avoid Unnecessary Repeated Filtering
Evaluating a WHERE clause also contributes to processing time. Reducing the number of times the same filter is applied can help eliminate unnecessary processing.
Let's take a look at the following example, where we filter twice with the same WHERE condition while joining the data.
SELECT o1.*, o2.daily_summary
FROM orders AS o1
LEFT JOIN (
SELECT order_date, SUM(order_amount * order_price) AS daily_summary
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY order_date
) AS o2
ON o1.order_date = o2.order_date
WHERE o1.order_date BETWEEN '2023-01-01' AND '2023-01-31';
By using window functions, we can reduce it to a single filter and a single aggregation as shown in the modified code below.
SELECT
*, SUM(o1.order_amount * o1.order_price) OVER (PARTITION BY o1.order_date) AS daily_summary
FROM orders AS o1
WHERE o1.order_date BETWEEN '2023-01-01' AND '2023-01-31';
Reducing the Number of SQL Executions
When optimizing SQL performance, reducing the number of SQL executions can make your queries significantly more efficient, because each execution incurs a round trip between the application server and the database server. Let's explore some techniques to reduce the number of SQL executions.
Reducing the Number of INSERT Commands
When inserting multiple records into a table, consider using a multi-row INSERT, which inserts all the records in a single SQL execution.
For example, if you want to insert two records into a projects
table, you might typically do something like this:
INSERT INTO projects(name, start_date, end_date) VALUES ('Database','2023-02-01','2023-08-31');
INSERT INTO projects(name, start_date, end_date) VALUES ('WebApp','2023-04-05','2023-12-23');
However, these two separate insertions can be combined into one operation as follows:
INSERT INTO
projects(name, start_date, end_date)
VALUES
('Database','2023-02-01','2023-08-31'), ('WebApp','2023-04-05','2023-12-23');
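As a runnable sketch (using SQLite via Python here, purely for illustration), the multi-row form produces exactly the same rows while reaching the server in one statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, start_date TEXT, end_date TEXT)")

# One INSERT statement carrying multiple rows: a single SQL execution.
conn.execute("""
    INSERT INTO projects (name, start_date, end_date) VALUES
        ('Database', '2023-02-01', '2023-08-31'),
        ('WebApp',   '2023-04-05', '2023-12-23')
""")

rows = conn.execute("SELECT name FROM projects ORDER BY name").fetchall()
print(rows)
```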
Removing Index Before Bulk INSERT and Adding It Afterwards
When inserting records into a table that has an index, the system must also update the index, which can be a time-consuming operation. When inserting tens of thousands of records or more, it's worth considering removing the index before the insert operation, and then re-creating it afterwards.
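A sketch of the pattern, again using SQLite via Python for illustration (the readings table and row count are hypothetical; measure both approaches on your own data before committing to this):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
conn.execute("CREATE INDEX idx_readings_sensor ON readings(sensor_id)")

# Drop the index, load the data in bulk, then rebuild the index once at the end,
# instead of updating it on every inserted row.
conn.execute("DROP INDEX idx_readings_sensor")
conn.executemany("INSERT INTO readings (sensor_id, value) VALUES (?, ?)",
                 [(i % 10, float(i)) for i in range(10_000)])
conn.execute("CREATE INDEX idx_readings_sensor ON readings(sensor_id)")

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)
```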
Using LIMIT for Query Execution
If you just want to check a few records in a table, you can use the LIMIT clause to reduce the number of rows fetched and hence the processing time. However, note that aggregate functions such as COUNT(*) will not be sped up by adding a LIMIT clause, because the aggregate is computed before the limit is applied.
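The sketch below (SQLite via Python, with a hypothetical logs table) shows both points: LIMIT caps the rows fetched, but leaves COUNT(*) unaffected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany("INSERT INTO logs (message) VALUES (?)",
                 [(f"event {i}",) for i in range(1000)])

# Fetch only the first five rows instead of all 1000.
sample = conn.execute("SELECT message FROM logs LIMIT 5").fetchall()

# LIMIT does not help aggregates: COUNT(*) still counts every row,
# and the single-row result is then "limited" to no effect.
total = conn.execute("SELECT COUNT(*) FROM logs LIMIT 5").fetchone()[0]
print(len(sample), total)
```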
Using TRUNCATE Instead of DELETE for Total Record Removal
When you need to remove all records from a table, the DELETE operation can be slow because it logs the deletion of each individual row. If you do not need those logs and want to clear the table completely, consider using the TRUNCATE command, which is typically much faster because it deallocates the table's storage instead of deleting rows one by one.