Saturday, August 5, 2017

Difference between Clustered Index and Non Clustered Index in SQL Server - Database Interview Question

In SQL Server database there are mainly two types of  indexes, Clustered index, and Non-Clustered index and the difference between Clustered and Non Clustered index is very important from SQL performance perspective. It is also one of the most common SQL Interview question, similar to the difference between truncate and delete,  primary key or unique key or  correlated vs non-correlated subquery. For those, who are not aware of benefits of Index or why we use an index in the database, they help in making your SELECT query faster. A query with an index is sometimes 100 times faster than a query without an index, of course depending upon how big your table is, but, you must index on columns which are frequently used in WHERE clause of the SELECT query, or which forms a major criterion for searching in the database. For example in Employee database, EmployeeId or  EmployeeName are common conditions to find an Employee in the database. 

As I said, there can be either clustered index or non clustered index in database, former is used to decide how data is physically stored in disk and that's why there can be only one clustered index on any table. 

In this article, we will explore more about both of this indexes and learn some key difference between clustered and non clustered index from interview and performance perspective. 

2 words on Indexes in SQL

Difference between clustered and nonclustered index in SQLContinuing from first paragraph, Index are used to make search faster in SQL. They are mostly maintained as balanced tree (BST), where tree traversal gives you performance in order of log(N). In case of clustered index, data is present in leaf node, so when we run a particular query, which uses clustered index, we can directly find the data by tree traversal. Query optimizer is a component of database, which decides whether to use an index or not to execute a SELECT query, or if use index then which one. You can even see, which index is used for executing your query by looking at query plan, a FULL TABLE SCAN means no index is used and every row of table is scanned by database to find data, on the other hand INDEX UNIQUE SCAN or INDEX RANGE SCAN suggest use of Index  for finding data. By the Index also has there own disadvantage as they make INSERT and UPDATE query slower and they also need space. A careful use of index is the best way to go. 

Clustered vs Non Clustered Index in SQL

Now we have some idea about what is Index in database and how they work, it's time to look some key differences between clustered and non clustered index in SQL Server, which is mostly true for other database as well e.g. Oracle or MySQL.

1) One of the main difference between clustered and non clustered index in SQL Server is that, one table can only have one clustered Index but It can have many non clustered index, approximately 250. This limitation comes from the fact clustered index is used to determines how data is stored physically in table. You should be very careful while choosing clustered index and should use columns which can be queried in range e.g. select * from Employee where EMP_ID > 20 and EMP_ID < 50. Since clustered index stores data in cluster, related data are stored together and it's easy for database to retrieve all data in one shot. This further reduces lots of disk IO which is very expensive operation. Clustered Index is also very good on finding unique values in a table e.g. queries like select * from Employee where EMP_ID=40; can be very fast if EMP_ID has clustered index  on it.

2) Another key difference between Clustered Index and Non Clustered Index in database is that many relational database including SQL Server  by default creates clustered index on PRIMARY KEY constraint, if there is no clustered index exists in database and a nonclustered index is not specified while declaring PRIMARY KEY constraint.

3) One more difference between them is that, clustered index contains data i..e rows in there leaf node, as Index is represented as BST, while nonclustered index contains pointer to data (address or rows) in there leaf node, which means one more extra step to get the data. 

4) By the way there is a misconception that we can only define clustered index with one column, which is not true. You can create clustered index with multiple columns, known as composite index. For example in Employee table, a composite index on firstname and lastname can be a good clustered index, because most of the query uses this as criterion. Though you should try to minimize number of columns in clustered index for better performance in SQL Server. On related not, while declaring composite index, pay some attention to the order of columns in index, that can decide which statement will use index and which will not. In fact this is one of the mostly asked question as, does order of columns in composite index matters. 

Last but not the least, pay some attention while creating clustered  and non clustered index in database. Create clustered index for columns which contains unique values, are accessed sequentially, used in range queries and return large result set. Avoid creating clustered index on columns, which are update frequently because that would lead rearrangement of rows on disk level, a potentially slow operation. 

That's all on difference between clustered and nonclustered index in SQL Server database. Remember that, it's possible to create clustered index on non PRIMARY KEY column and PRIMARY KEY constraint only  creates a clustered index, if there is not already in database and a nonclustered index is not provided. Key difference is that, clustered index decides physical sorting or order of data in disk.

Further Learning
Introduction to SQL
The Complete SQL Bootcamp
SQL for Newbs: Data Analysis for Beginners


Anonymous said...

When you say that in clustered indices, leaves of BST contain data, are those data whole records of a table, or just values of column(s) we've built clustered index on?

Gautam said...

@Anonymous, SQL Server stores data in pages with size of 8KB. Leaf node actually contains these pages in case of clustered index, but in case of nonclustered index they contain address of these pages. Columns which are used to build indexes also known as key are stored in different data structure.

Anonymous said...

Thanks for the great article! Very helpful!

Post a Comment