Thursday, May 11, 2023

How to Find Duplicate values in a Table? SQL GROUP BY and Having Example| Leetcode Solution

Hello guys, if you are wondering how to find duplicate values in a table then don't worry, there are many ways to find duplicate rows or values from a given table. For example, you can use the GROUP BY and HAVING clause in SQL with count function to find all the rows which has same values for a particular column and then filter out rows where count is just one, I mean unique values. This way you can find all the duplicate from a given table. Using group by you can create groups and if your group has more than 1 element it means it's kind of duplicate. For example, you need to write a SQL query to find all duplicate emails in a table named Person. This is a popular SQL Query interview question as well as a Leetcode problem. You can see that email a@b.com is a duplicate email as it appears twice in the table. 

You need to write a query to find all duplicate values, I mean emails in this case. 

+----+---------+
| Id | Email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+

For example, your query should return the following for the above table:

+---------+
| Email   |
+---------+
| a@b.com |
+---------+

Note: All emails are in lowercase.



SQL query to find duplicate values in a Column - Solution

Here are three ways to solve this problem in SQL query, first by using group by clause, second by using self-join, and then third by using subquery with exists clause.  While I agree that this problem can be solved in a different way, but it is also a perfect example of how you can use the SQL GROUP BY and HAVING clause.

But, if you are new to the SQL world, it's better to start with a comprehensive SQL course like The Complete SQL Bootcamp course by Jose Portilla on Udemy. That will help you to learn SQL better and quicker, and these kinds of articles will also make more sense once you have some SQL knowledge under your belt.

SQL GROUP BY and HAVING Example - Write SQL Query to find Duplicate Emails - LeetCode Solution



1. Finding Duplicate elements By using GROUP BY

The simplest solution to this problem is by using the GROUP BY and HAVING Clause. Use GROUP BY to group the result set on email, this will bring all duplicate emails in one group, now if the count for a particular email is greater than 1 it means it is a duplicate email. Here is the SQL query to find duplicate emails :

# Write your MySQL query statement below
SELECT Email FROM Person 
GROUP BY Email 
HAVING COUNT(Email) > 1

This is also my accepted answer on LeetCode. You can see by using the count function you can count a number of elements in the group and if your group contains more than 1 row then it's a duplicate value that you want to print. 




2. Finding Duplicate values in a column By using Self Join

By the way, there are a couple of more ways to solve this problem, one is by using Self Join. If you remember, In Self Join we join two instances of the same table to compare one record to another. Now if an email from one record in the first instance of the table is equal to the email of another record in the second table it means the email is duplicate. Here is the SQL query using Self Join

# Write your MySQL query statement below
SELECT DISTINCT a.Email FROM Person a 
JOIN  Person b ON a.Email = b. Email 
WHERE a.Id != b.Id 

Remember to use the keyword distinct here because it will print the duplicate email as many times it appears in the table. This is also an accepted solution in Leetcode. If you want to learn more about how join works and how to use left and right out join, I suggest you join an SQL course that works with real data like SQL for Newbs: Data Analysis for Beginners on Udemy.

How to find Duplicate values in SQL




3. Finding duplicate emails By using Sub-query with EXISTS:

You can even solve this problem using a correlated subquery. In a correlated subquery, the inner query is executed for each record in the outer query. So one email is compared to the rest of the email in the same table using a correlated subquery and EXISTS clause in SQL as shown below. 

Here is the solution query :

SELECT DISTINCT p1.Email
FROM Person p1
WHERE EXISTS(
    SELECT *
    FROM Person p2
    WHERE p2.Email = p1.Email
    AND p2.Id != p1.Id
)

If you still need more guidance then joining a comprehensive SQL course like The Complete SQL Bootcamp course by Jose Portilla on Udemy is also a great way to understand how to find duplicate elements in a table using GROUP BY  with HAVING clause.

best course to learn SQL for beginners


That's all about how to find duplicates in SQL using the GROUP BY and HAVING clause. I have also shown you how you can solve this problem using Self-join and a subquery with the EXISTS clause as well. Once you get familiar with the pattern you can solve many such problems. If you want to learn more check out the following resources.


Other related SQL queries, Interview questions, and articles:
  • How to join three tables in one single SQL query (solution)
  • Write a SQL query to find all table names on a database in MySQL (solution)
  • 5 Courses to learn Database and SQL Better (courses)
  • The real difference between WHERE and HAVING clause in SQL? (answer)
  • Top 5 Courses to learn Microsoft SQL Server in-depth (courses)
  • How to migrate SQL queries from Oracle to SQL Server? (answer)
  • Top 5 Websites to learn SQL online for FREE? (resource)
  • What is the difference between UNION and UNION ALL in SQL? (answer)
  • Write a SQL query to copy or backup a table in MySQL (solution)
  • How do you find the duplicate rows in a table on a database? (solution)
  • 5 Advanced SQL books to level up your SQL skills (books)
  • 5 Courses to learn Oracle and Microsoft SQL Server database (courses)
  • Difference between Self and Equi Join in SQL? (answer)
  • 4 Free Books to learn Microsoft SQL Server database (books)
  • Top 5 Websites to learn SQL online for FREE? (websites)
  • 5 Free Courses to learn Database and SQL (free courses)
  • 5 Free Courses to learn Oracle and SQL Server? (courses)
  • Top 5 Courses to learn MySQL Database for Beginners (Courses)
  • What is the difference between View and Materialized View in Database? (answer)
  • Difference between clustered and non-clustered indexes in SQL? (answer)
  • Difference between Primary and Candidate key in the table? (answer)
  • 5 Free Courses to learn T-SQL and SQL Server for Beginners (Courses)
  • Difference between the Unique and Primary keys in the table? (answer)
  • Top 5 Courses to learn PostgreSQL in-depth (courses)
  • 5 Free Courses to learn T-SQL and SQL Server for Beginners (Courses)
  • Difference between Primary and Foreign keys in the table? (answer)

Thanks for reading this article, if you like this SQL Group by HAVING tutorial and how to find duplicate values for a particular column then please share it with your friends and colleagues. If you have any questions or feedback, then please drop a note.

P.S. - If you are interested in learning Database and SQL and looking for some free resources to start your journey, then you can also take a look at this list of Free SQL Courses for Beginners to kick-start your learning.

No comments:

Post a Comment