Saturday, September 12, 2020

Difference between row_number(), rank() and dense_rank() window functions in SQL

All three are ranking functions in SQL, which Microsoft SQL Server groups with its window functions. The difference between rank(), dense_rank(), and row_number() shows up only when there are ties in the ranking, i.e. duplicate values in the ordering column. For example, if you are ranking employees by their salaries, what should be the rank of two employees with the same salary? It depends on which ranking function you use: row_number, rank, or dense_rank.

The row_number() function always generates a unique number for each row, even with duplicate records: if the ORDER BY clause cannot distinguish between two rows, it still gives them different numbers. Which of the tied rows comes first is not guaranteed. In our example, the two employees Shane and Rick have the same salary and get row numbers 4 and 5, but that order is arbitrary; if you run the query again, Shane might come 5th.

The rank() and dense_rank() functions give the same rank to rows that cannot be distinguished by the ORDER BY clause. The difference is that dense_rank() always generates a contiguous sequence of ranks like (1,2,3,...), whereas rank() leaves a gap after two or more rows share a rank (think "Olympic Games": if two athletes win the gold medal, there is no second place, only third). You can also see the Querying Microsoft SQL Server course on Udemy to learn more about how rank and dense_rank handle ties.

All these functions behave the same way in Microsoft SQL Server and Oracle, at least at a high level, so if you have used them in MSSQL, you can also use them on Oracle 11g or other versions.

By the way, if you are new to Microsoft SQL Server and T-SQL, then I also suggest you join a comprehensive course to learn SQL Server fundamentals and how to work with T-SQL. If you need a recommendation, I suggest you go through the Microsoft SQL for Beginners online course by Brewster Knowlton on Udemy. It's a great course to start with T-SQL and SQL queries in SQL Server.





SQL to build schema

Here is the SQL to create a temporary table and insert some data into it for demonstration purposes:

IF OBJECT_ID( 'tempdb..#Employee' ) IS NOT NULL
DROP TABLE #Employee;

CREATE TABLE #Employee (name varchar(10), salary int);

INSERT INTO #Employee VALUES ('Rick', 3000);
INSERT INTO #Employee VALUES ('John', 4000);
INSERT INTO #Employee VALUES ('Shane', 3000);
INSERT INTO #Employee VALUES ('Peter', 5000);
INSERT INTO #Employee VALUES ('Jackob', 7000);
INSERT INTO #Employee VALUES ('Sid', 1000);

You can see that we have included two employees with the same salary, Shane and Rick, to demonstrate the difference between the row_number, rank, and dense_rank window functions in SQL Server, which only becomes visible when there are ties in the ranking.
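Before ranking anything, it can help to confirm where the ties are. The following is a minimal sketch that assumes the #Employee table created above; the column alias employees is just an illustrative name.

```sql
-- List every salary that occurs more than once in #Employee.
-- These are the rows where the three ranking functions will disagree.
SELECT salary, COUNT(*) AS employees
FROM #Employee
GROUP BY salary
HAVING COUNT(*) > 1;
```

With the sample data this returns a single row, salary 3000 with a count of 2, which corresponds to Shane and Rick.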

If you want to learn more about ranking functions in SQL Server, I highly recommend the 70-461, 761: Querying Microsoft SQL Server with Transact-SQL course on Udemy. It's a great course to learn SQL Server in depth and also become a certified SQL Server DBA.





ROW_NUMBER() Example

The row_number() function always generates a unique value for each row, even if the rows have the same value and the ORDER BY clause cannot distinguish between them. That's why it is often used to solve problems like finding the second-highest or nth-highest salary.

In the following example, we have two employees with the same salary, and even though we have generated row numbers over the salary column, it produces different row numbers for those two employees.

select e.*, row_number() over (order by salary desc) row_number from #Employee e
result:
name    salary  row_number
Jackob  7000    1
Peter   5000    2
John    4000    3
Shane   3000    4
Rick    3000    5
Sid     1000    6

You can see in this example that we have numbered employees by salary and each of them has a unique number even when salaries are the same, e.g. Shane and Rick both earn 3000 but got the unique numbers 4 and 5. It's worth knowing that in the case of a tie the order is not guaranteed: the database may number the tied rows in any order, and that order can change between runs. See the Oracle Analytic Functions In-Depth & Advanced Oracle SQL course on Udemy to learn more about when to use the row_number() function in Oracle database.
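If you need the same numbering on every run, one common approach is to add a tie-breaker column to the ORDER BY clause. Here is a minimal sketch against the same #Employee table; using name as the tie-breaker and row_num as the alias are choices made for this illustration, not something the original query does.

```sql
-- Break salary ties by name so the numbering is repeatable.
-- The ordering is fully determined here only because no two
-- employees share both salary and name.
SELECT e.*,
       ROW_NUMBER() OVER (ORDER BY salary DESC, name ASC) AS row_num
FROM #Employee e;
```

With this ordering Rick sorts before Shane, so Rick gets 4 and Shane gets 5 each time the query runs. In a real table you would usually break ties with a unique key instead of a name.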




RANK() Example

The rank() function assigns the same rank to rows with the same value, i.e. rows that are not distinguishable by ORDER BY. The next different value does not get the immediately following number; there is a gap. For example, if the 4th and 5th employees have the same salary, they both get rank 4, and the 6th employee, who has a different salary, gets rank 6.

Here is the example to clarify the point:

select e.*, rank() over (order by salary desc) rank from #Employee e
result:
name    salary  rank
Jackob  7000    1
Peter   5000    2
John    4000    3
Shane   3000    4
Rick    3000    4
Sid     1000    6

You can see that both Shane and Rick got the same rank, 4, but Sid got rank 6 instead of 5. That is because rank() counts every row that sorts ahead of the current one: five employees earn more than Sid, so his rank is 6. If you want to learn more about window functions, you can also check out The Complete SQL + Databases Bootcamp: Zero to Mastery [2020] course by ZTM Academy and Andrei, one of my favorite instructors. This is a comprehensive course that covers everything you need to know in SQL.
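One way to see where the gap comes from is to compute the same number by hand: a row's rank is one more than the number of rows with a strictly higher salary. The sketch below, which assumes the #Employee table from earlier and uses illustrative aliases (rnk, rnk_by_count, h), puts both side by side.

```sql
-- rnk uses the window function; rnk_by_count rebuilds it with a
-- correlated subquery that counts the rows with a higher salary.
SELECT e.name,
       e.salary,
       RANK() OVER (ORDER BY e.salary DESC) AS rnk,
       (SELECT COUNT(*) + 1
        FROM #Employee h
        WHERE h.salary > e.salary) AS rnk_by_count
FROM #Employee e
ORDER BY e.salary DESC;
```

Both columns match for every row of the sample data. The subquery is only there to explain the numbering; for real queries the window function is the clearer choice.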


Btw, you would need a ZTM membership to watch this course, which costs around $29 per month but also provides access to many engaging and useful courses like his Python course and JavaScript interview course. If you are a constant learner like me, then I suggest you go for the membership rather than buying a single course. You will not only save money, it also makes learning easier, as you don't need to buy a course every time you want to learn something new.


DENSE_RANK() Example

The dense_rank() function is similar to the rank() window function, i.e. the same values are assigned the same rank, but the next different value gets a rank that is just one more than the previous rank. For example, if the 4th and 5th employees have the same salary, they both get rank 4, but the 6th employee, who has a different salary, gets rank 5 instead of the rank 6 that rank() would assign. There is no gap in the ranking with dense_rank(), as shown in the following example:

select e.*, dense_rank() over (order by salary desc) dense_rank from #Employee e
result:
name    salary  dense_rank
Jackob  7000    1
Peter   5000    2
John    4000    3
Shane   3000    4
Rick    3000    4
Sid     1000    5

You can see that both Shane and Rick have the same rank, 4, but Sid now has rank 5 instead of the 6 he got in the earlier example with the rank() function. Btw, if you are serious about mastering SQL, I strongly suggest reading Joe Celko's SQL for Smarties, one of the more advanced books on SQL.
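The dense rank can also be rebuilt by hand: it equals the number of distinct salaries that are greater than or equal to the row's own salary. The following sketch assumes the same #Employee table; the aliases drnk, drnk_by_count, and h are names chosen for this illustration.

```sql
-- drnk uses the window function; drnk_by_count rebuilds it by
-- counting the distinct salaries at or above the current one.
SELECT e.name,
       e.salary,
       DENSE_RANK() OVER (ORDER BY e.salary DESC) AS drnk,
       (SELECT COUNT(DISTINCT h.salary)
        FROM #Employee h
        WHERE h.salary >= e.salary) AS drnk_by_count
FROM #Employee e
ORDER BY e.salary DESC;
```

Because duplicates are counted once, Sid sees five distinct salaries (7000, 5000, 4000, 3000, 1000) and gets 5, which is why dense_rank() never leaves a gap.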




Difference between row_number vs rank vs dense_rank

As I said, the difference between rank, row_number, and dense_rank is visible only when there are duplicate records. Since in all our examples we are ranking records by salary, you will notice the difference between these three ranking functions wherever two records have the same salary.

The row_number function gives consecutive numbers regardless of duplicates, while rank and dense_rank give the same rank to duplicates. With rank, the next rank follows the row position, so you will see a jump; dense_rank has no gaps in its rankings.

-- the difference between row_number(), rank(), and dense_rank()
-- is only visible when there are duplicates.
-- row_number gives consecutive numbers even with duplicates;
-- rank and dense_rank give duplicates the same rank, but rank
-- jumps afterwards while dense_rank does not

select e.*,
row_number() over (order by salary desc) row_number, 
rank() over (order by salary desc) rank,
dense_rank() over (order by salary desc) as dense_rank 
from #Employee e

And here is the output, which clearly shows the difference in the ranking generated by the rank() and dense_rank() functions. This should clear up any doubt about the rank, dense_rank, and row_number functions, but if you want to learn more, check out the 70-461, 761: Querying Microsoft SQL Server with Transact-SQL course on Udemy.

name    salary  row_number  rank  dense_rank
Jackob  7000    1           1     1
Peter   5000    2           2     2
John    4000    3           3     3
Shane   3000    4           4     4
Rick    3000    5           4     4
Sid     1000    6           6     5

You can see that the employees Shane and Rick have the same salary of 3000, so their ranking is the same under both rank() and dense_rank(). The next rank, however, is 6 with rank(), which follows the row position, and 5 with dense_rank(). The row_number() function does not treat ties specially and always gives a unique number to each record.

Btw, I ran all three SQL queries on Oracle 11g R2 and Oracle 12c and got the same result. So both Oracle and SQL Server support these functions, and they behave identically.
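Note that the #Employee temp table syntax and the OBJECT_ID check are specific to SQL Server, so the setup script needs a small change before it runs on Oracle. The post does not show how the table was created there; the sketch below is one assumed way to do it, using a regular table named Employee and the short aliases rn, rnk, and drnk, all of which are illustrative choices.

```sql
-- Oracle sketch: a regular table stands in for the #Employee temp table.
CREATE TABLE Employee (name VARCHAR2(10), salary NUMBER);

INSERT INTO Employee VALUES ('Rick', 3000);
INSERT INTO Employee VALUES ('John', 4000);
INSERT INTO Employee VALUES ('Shane', 3000);
INSERT INTO Employee VALUES ('Peter', 5000);
INSERT INTO Employee VALUES ('Jackob', 7000);
INSERT INTO Employee VALUES ('Sid', 1000);

-- Same three ranking functions, same window definition.
SELECT e.*,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM Employee e;
```

The window function calls themselves are unchanged between the two databases; only the table setup differs.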


That's all about the difference between the ROW_NUMBER(), RANK(), and DENSE_RANK() functions in SQL Server. As I said, the difference boils down to what happens on ties. In the case of a tie, ROW_NUMBER() still gives unique row numbers, while RANK() gives the tied rows the same rank, but the next different rank will not be in sequence; there will be a gap.

In the case of dense_rank, both rows in the tie will have the same rank and there will be no gap. The next different rank will be in sequence.
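The gap-free numbering is what makes dense_rank() convenient for "nth highest salary" style questions when ties are possible. Here is a minimal sketch, assuming the #Employee table from this post; the CTE name ranked and the alias drnk are illustrative, and 4 is just an example value of n.

```sql
-- Employees who earn the 4th-highest distinct salary.
WITH ranked AS (
    SELECT e.name,
           e.salary,
           DENSE_RANK() OVER (ORDER BY e.salary DESC) AS drnk
    FROM #Employee e
)
SELECT name, salary
FROM ranked
WHERE drnk = 4;
```

With the sample data this returns both Shane and Rick with a salary of 3000. Filtering on a ROW_NUMBER() value of 4 instead would return only one of them, and which one is not guaranteed.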

Further Learning
Introduction to SQL
The Complete SQL Bootcamp
SQL for Newbs: Data Analysis for Beginners


Other SQL tutorials and Articles you may find interesting
  • 21 Frequently asked SQL Queries from Interviews (read here)
  • The difference between WHERE and HAVING clause in SQL? (answer)
  • What is the difference between UNION and UNION ALL in SQL? (answer)
  • 5 Websites to learn SQL for FREE (websites)
  • What is the difference between LEFT and RIGHT OUTER JOIN in SQL? (answer)
  • 5 Free Courses to Learn MySQL database (courses)
  • What is the difference between View and Materialized View in Oracle? (answer)
  • My favorite courses to learn SQL and Database (courses)
  • What is the difference between self and equijoin in SQL? (answer)
  • 5 Free Courses to learn Database and SQL (courses)
  • The difference between TRUNCATE and DELETE in SQL? (answer)
  • 5 Books to Learn SQL Better (books)
  • How to join more than two tables in a single query (article)
  • 10 Free Courses to learn Oracle and SQL Server (courses)
  • Top 5 Courses to learn MySQL for Beginners (courses)
  • What is the difference between Primary and Foreign keys in a table? (answer)
  • How to find all customers who have never ordered? (solution)
  • How to remove duplicate rows from a table in SQL? (solution)
  • What is the difference between close and deallocate a cursor? (answer)
  • Top 5 Courses to learn Microsoft SQL Server in-depth (courses)
  • 10 Free Courses to learn Oracle and Microsoft SQL Server (free courses)

Thanks for reading this article. Let me know how you write SQL queries: which style do you use, or do you have your own?

P. S. - If you are looking for a free course to start learning SQL and database basics, then I suggest you go through the Introduction to Databases and SQL Querying course on Udemy. It's completely free; all you have to do is create a Udemy account and you can access the whole course.
