Hello guys, its no secret that SQL is one of the must have skill in this data driven world and its also one of those skills which is equally useful for programmers, developers, database admins, data scientists, data analyst, business analyst, quality analyst and even project managers. I first learned SQL 20 years back when I was in college and that time we are using Oracle database. I still remember my first query about "SELECT * FROM EMP", EMP was one of the tables which was quite popular among learners and most of the Oracle DB instance on labs have that.
Since then I have come a long way in my SQL journey and in last 20 years I have used MySQL, SQL Server, and PostgreSQL along with Oracle. In this 20 years, I learned a lot about databases, tables, indexes, queries, transactions, replications, and stored procedure.
I have shared many of those lessons on my articles in this blog but many of you are asking for SQL Query best practices for a long time, so here we are.
Earlier, I have shared tips to write complex SQL queries and 5 tips for running SQL queries in live production databases and in this article, I am going to share 17 SQL best practices developers can follow to write better and faster SQL queries. If you are working in SQL for few years, you may know most of them but if find anything new and interesting, feel free to let me know in comments.
17 SQL Query Best Practices for Programmers and Developers
This list contains best practices which will help you to write correct and robust SQL queries. Some best practices will also help you to improver performance of your existing SQL queries.1) Format your SQL queries
Nothing is worse than reading a complex SQL query in one line without any formatting. By carefully writing and formatting your SQL query you can improve its readability, which will help others to understand and modify your query. If you are not using any style or formatter, consider using this instant SQL formatter (https://www.dpriver.com/pp/sqlformat.htm).
2) Don't use SELECT * in your queries.
This is one of the oldest SQL best practices, often suggested by senior developers. Benefit is lies on common sense, * will select all columns which may require more disk I/O and can take more bandwidth when server sends the data to client.
If you just need Id or Name then just mentioning them in your SQL query will improve performance. If those columns are part of index than Query optimizer will directly load from there.
Bad
SELECT * FROM Employee;
Good
SELECT emp_id, emp_name FROM Employee;
Here are few more reasons of not using SELECT * in your queries for further learning.
3) Always use a column list in your INSERT statements.
It's a safety first best practices. When you just use INSERT statements without column list than it uses the column order defined in the database. This mean, your SQL query may work in one environment e.g. UAT but fail in other e.g. real environment because column order may be different in those two databases.
Bad
INSERT INTO Employee values (1, "John", 31, 5000, 2); //rely on column order of table
Good
INSERT INTO Employee(id, name, age, salary, dept_id)
values (1, "John", 31, 5000, 2); // doesn't rely on column order of table
4) Always use table aliases when your SQL statement involves more than one source.
This makes your query more readable and also avoid ambiguity if two tables contains any column with same name. This one is also my favorite SQL Server tips as when you use alias the SQL Server management studio will help you with SQL query completion or code completion.
Bad
SELECT id, name, age, salary, Depatment.id, Department.name
FROM Employee JOIN Department
ON Employee.dept_id=Department.id
WHERE Employee.salary> 6000
Good
SELECT e.id, e.name, e.age, e.salary, d.id, d.name
FROM Employee e JOIN Department d ON e.dept_id=d.id
WHERE e.salary > 5000;
5) Use the more readable ANSI-Standard JOIN clauses instead of the old style joins.
Old style of joins are confusing and risky as they increase the likelihood of cross joins. You should always mention JOIN keyword if you are joining tables in SQL.
Bad
SELECT e.id, e.name, e.age, d.id, p.name
FROM Employee e, Department d, Project p
WHERE e.dept_id = d.id and e.project_id = p.id;
Good
SELECT e.id, e.name, e.age, d.id, p.name
FROM Employee e
INNER JOIN Department d ON e.dept_id = d.id
INNER JOIN Projects p ON e.project_id = p.id
6) Do not use column numbers in the ORDER BY clause.
Using column name instead of numbers make your SQL query more readable, which is worth gold and that's why I highly recommend this SQL best practice to anyone who write SQL queries.
Bad
SELECT id, name, age FROM Employee ORDER BY 2;
Good
SELECT id, name, age FROM Employee ORDER BY name;
7) Pay attention to order of columns in case of composite index
Choosing the right order for columns in a composite index (index with multiple columns) can increase the applicability of index. This is also a common index based interview question and knowing how to write query so that right index is selected is a skill.
Bad
CREATE INDEX IdxDepartment ON (dept_id, id);
//this index will not be used for queries where only emp_id is specified
Good
CREATE INDEX IdxDepartment ON (id, dept_id);
// this will be used even if just emp_id is specified
8) Avoid co-related subquery if you can
If you ask me, Correlated subqueries are slow because it executed by every row returned by OUTER query, avoiding this can improve query performance. You should avoid co-related subquery as much as possible and use alternative ways like temporary tables or other techniques to achieve same result.
9) Don't create too many Indexes
Too many indices are not good because they slow down INSERT and UPDATE operation. Strike a balance between good number of indices for frequently used search criterion and avoid going to too many region to keep the INERT and UPDATE fast.
10) Prefer Set based operation over Cursor
Cursors are iterative operation and Database are not designed for that. Any iterative approach will be slower than equivalent SET based approach. Remember SET doesn't have any order.
11) Understand data and table structure
You cannot write efficient SQL queries without understanding your data, schema and indexes. Having a good knowledge of data will ensure you will use right columns in WHERE clause to filter unnecessary data.
12) Update statistics
Regularly updating statistics for your table improves the Query optimizer performance because it can choose the right index based upon selectivity. I have noticed this multiple time while working on my last project where we have many batch job which insert data into database.
After those job finishes many times our queries become slower. In order to solve that problem, we added update statistics at the end of those jobs to update the statistics and this significantly improved performance of our application and SQL queries.
13) Create Selective Index
A highly selective index is one which filters 99% of rows e.g. a primary key, which filters all the rows except one you want. Whenever you create index, check how much selectivity it provides. Index should provide at least 95% of selectivity.
14) Do not prefix your stored procedure names with "sp_abc"
sp_ is prefix reserved for system stored procedure, instead you can use "usp_" for user stored procedure. This is also a standard prefix for application specific stored procedures and distinguish from system specific stored procedures.
15) Drop unused index
This best practice is sub-part of the best practice we discuss earlier about avoid creating too many index. If you already have more than 2 index, consider dropping index which is not used. This will improve performance of your insert and update queries.
16) Always define a primary key
A table without primary key is not very useful. First Primary key creates clustered index which specify how to store data in disk, second primary key enforce NOT NULL and UNIQUE constraint which improves data quality and makes it easy to select the data.
17) Use integrity constraints like NOT NULL or CHECK to improve the data quality.
While inserting or updating data, you can put constraints on columns as NOT NULL or CHECK to do some validation before data is inserted into tables. This can help you to maintain data quality.
Also, here is a nice slide to remember the most important SQL query practices:
That's all in this list of SQL query best practices for every developers. In this list I have shared best practice related to table structure, writing SQL query, performance improvement, indexes as well as bad practice which you can avoid while working with tables and database.
While I have created this list of SQL best practice for developers because I am a developer and these are things I have learned from my own experience, they are equally useful for any Junior Database administrator or anyone who want to become DBA.
I expert senior and experience DBAs to know all these but they can also read to revise key concepts. I also request them to share few more best practices so that we all can learn from their experiences. Let us know if you follow any other practice which has helped you to write better queries and its not included in this list.
Great article Javin, that's why I am a big fan of you. Thanks
ReplyDelete