Thursday, August 5, 2021

How to write SubQuery in SQL? Correlated vs Noncorrelated Sub Query Example

Hello guys, if you are struggling to write a sub query in SQL and need some guidance then you have come to the right place. In this article, I will not only teach you what is subquery, how it works, and which problem it solves but also teach you how to write both correlated and non-correlated sub-queries with real-world examples. SubQuery in SQL is a query inside another query. Some time to get particular information from a database you may need to fire two separate SQL queries, subQuery is a way to combine or join them in a single query. SQL query which is on the inner part of the main query is called the inner query while the outer part of the main query is called the outer query.

For example in below SQL query

SELECT name FROM City WHERE pincode IN (SELECT pincode FROM pin WHERE zone='west')

section not highlighted is OUTER query while section highlighted with grey is INNER query.

In this SQL tutorial, we will see both Correlated and non-correlated sub-query and their examples, some differences between correlated and noncorrelated subqueries, and finally, subquery vs join which is a classic debatable topic in SQL.

By the way, this SQL tutorial is next in series of SQL and database articles in Javarevisited like truncate vs delete and 10 examples of  SELECT queries. If you are new here then you may find those examples interesting.

If you are new to the SQL world, it's better to start with an online SQL course. That will help you to learn SQL better and quicker, and these kinds of articles will also make more sense once you have some SQL knowledge under your belt.

SubQuery Rules in SQL

Like any other concept in SQL, subquery also has some rules and you can only embed one query inside another by following rules :
1. subquery can be used in the insert statement.
2. subquery can be used in the select statement as a column.
3. subquery should always return either a scalar value if used with where clause or value from a column if used with IN or NOT IN clause.



Before going to understand the non-correlated and correlated subquery, let’s see the table and data which we are going to use in this example. Until you have an understanding of how the table looks like and what kind of data it stores it's a little difficult to understand queries. In this subquery example, we will use two tables Stock and Market. Stock holds different stocks and the Market holds all stock exchanges in the world.

mysql> select * from stock;
+---------+-------------------------+--------------------+
| RIC     | COMPANY                 | LISTED_ON_EXCHANGE |
+---------+-------------------------+--------------------+
| 6758.T  | Sony                    | T                  |
| GOOG.O  | Google Inc              | O                  |
| GS.N    | Goldman Sachs Group Inc | N                  |
| INDIGO  | INDIGO Airlines         | NULL               |
| INFY.BO | InfoSys                 | BO                 |
| VOD.L   | Vodafone Group PLC      | L                  |
+---------+-------------------------+--------------------+
6 rows in set (0.00 sec)

mysql> select  from Market;
+------+-------------------------+---------------+
| RIC  | NAME                    | COUNTRY       |
+------+-------------------------+---------------+
| T    | Tokyo Stock Exchange    | Japan         |
| O    | NASDAQ                  | United States |
| N    | New York Stock Exchange | United States |
| BO   | Bombay Stock Exchange   | India         |
+------+-------------------------+---------------+
4 rows in set (0.00 sec)


Noncorrelated subquery in SQL

There are two kinds of subquery in SQL one is called non-correlated and the other is called a correlated subquery. In a non-correlated subquery, the inner query doesn't depend on the outer query and can run as a stand-alone query. Subquery used along with IN or NOT IN SQL clause is a good example of Noncorrelated subquery in SQL. Let's a noncorrelated subquery example to understand it better.


Difference between correlated and noncorrelated suquery


NonCorrelated Subquery Example:

Let’s see the query  “Find all stocks from Japan”, If we analyze this query we know that stock names are stored in the Stock table while Country name is stored in the Market table, so we need to fire two queries first to get RIC for the Japanese market and then all stocks which are listed on that Market. we can combine these two queries into one SQL query by using subquery as shown in the below example:

mysql> SELECT COMPANY FROM Stock
       WHERE LISTED_ON_EXCHANGE = (SELECT RIC FROM Market WHERE COUNTRY='Japan');
+---------+
| COMPANY |
+---------+
| Sony    |
+---------+
1 row IN SET (0.02 sec)

Here part which is inside bracket is called inner query or subquery. As you see in this example of subquery, inner query can run alone and its not depended on outer query and that's why its called NonCorrelated query.



NonCorrelated Subquery Example with IN Clause SQL

NonCorrelated subquery are used along-with IN and NOT IN clause. here is an example of subquery with IN clause in SQL.
SQL query: Find all stocks from United States and India

mysql> SELECT COMPANY FROM Stock WHERE LISTED_ON_EXCHANGE IN (SELECT RIC FROM Market WHERE COUNTRY='United States' OR COUNTRY= 'INDIA');
+-------------------------+
| COMPANY                 |
+-------------------------+
| Google Inc              |
| Goldman Sachs GROUP Inc |
| InfoSys                 |
+-------------------------+

When Subquery is used along-with IN or NOT IN Clause it returns result from one column instead of Scaler value.



Correlated SubQuery in SQL

Correlated subqueries are the one in which inner query or subquery reference outer query. Outer query needs to be executed before inner query. One of the most common example of correlated subquery is using keywords exits and not exits. An important point to note is that correlated subqueries are slower queries and one should avoid it as much as possible.

Example of Correlated Subquery in SQL

Here is an example of Correlated subquery “Return all markets which has at least one stock listed on it.”

mysql> SELECT m.NAME FROM Market m WHERE m.RIC = (SELECT s.LISTED_ON_EXCHANGE FROM Stock s WHERE s.LISTED_ON_EXCHANGE=m.RIC);

+-------------------------+
| NAME                    |
+-------------------------+
| Tokyo Stock Exchange    |
| NASDAQ                  |
| New York Stock Exchange |
| Bombay Stock Exchange   |
+-------------------------+
4 rows IN SET (0.00 sec)

Here inner query will execute for every Market as RIC will be changed for every market.



Difference between Correlated and NonCorrelated Subquery

Now we have seen correlated and noncorrelated subqueries and their example it much easier to understand the difference between correlated vs noncorrelated queries. By the way, this is also one of the popular SQL interview questions, and its good to know few differences:

1.In the case of correlated subquery inner query depends on the outer query while in the case of a noncorrelated query inner query or subquery doesn't depend on outer query and is run on its own.

2.In the case of the correlated subquery, the outer query is executed before the inner query or subquery while in the case of the NonCorrelated subquery inner query executes before the outer query.

3.Correlated Sub-queries are slower than non-correlated subqueries and should be avoided in favor of SQL joins.

4.Common example of the correlated subquery is using exits and not exists keyword while non-correlated query mostly uses IN or NOT IN keywords.

SubQuery vs Join in SQL

Any information which you retrieve from the database using subquery can be retrieved by using different types of joins also. SQL is flexible and it provides different ways of doing the same thing. Some people find SQL Joins confusing and subquery specially noncorrelated more intuitive but in terms of performance SQL Joins are more efficient than subqueries.

Important points about SubQuery in DBMS

1.Almost whatever you want to do with subquery can also be done using join, it is just a matter of choice subquery seems more intuitive to many users.

2.Subquery normally returns a scaler value as a result or result from one column if used along with IN Clause.

3.You can use subqueries in four places: subquery as a column in select clause,

4.In the case of correlated subquery outer query gets processed before the inner query.

That's all about subquery in SQL. It's an important concept to learn and understand, as both correlated and non-correlated subquery is essential to solve SQL query-related problems. They are not just important from the SQL interview point of view but also from the Data Analysis point of view. If you want to become a programmer or Data scientist then you should learn and master SQL concepts like correlated subqueries. 


Other SQL and Database related articles from Javarevisited :

11 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Sorry but it's not correct. EXISTS clauses on statements are faster than the noncorrelates that you are refering to.

    ReplyDelete
  3. It is possible to use subqueries that return multiple values per row. Example of a valid query:
    SELECT name FROM City WHERE (pincode,timezone) IN (SELECT pincode, timezone FROM pin WHERE zone='west');

    ReplyDelete
  4. Use "=" , when subquery return single value on (single column)
    Use "IN or NOT IN" clause, when subquery return multiple value on (single column)
    Use "EXISTS or NOT" EXISTS clause, when subquery return multiple value on (multiple column)

    ReplyDelete
  5. "One of the most common example of correlated subquery is using keywords exits and not exits" In these lines it should be EXISTS and NOT EXISTS, you miss one of the "S" there.

    ReplyDelete
  6. One of the best example of correlated sub query is to to write sql query find the second maximum number in a table e.g. finding second largest salary in employee table. Look at below subquery, its a correlated because inner query is evaluated every time a new row is processed by outer query.

    SELECT salary FROM EMP e1 WHERE 2 = (SELECT COUNT(DISTINCT salary) FROM EMP e2 WHERE e2.salary >= e1.salary )

    You can use this query to find second, third, fourth or nth highest salary, just change the number after WHERE clause, e.g. for third highest salary put WHERE 3 = (result of correlated subquery).

    How does this query will work?

    When first row will be processed by outer query, inner query will find how many rows exists where salary is greater than current salary, if it was maxium then only correlated sub query will return 1, if the current salry is second largest then inner query will return 2 and if the current rows's salary is third largest then inner query will return 3.

    Why this is an example of correlated query?
    because inner query references outer query for comparing salary, here is the inner query portion :

    SELECT COUNT(DISTINCT salary) FROM EMP e2 WHERE e2.salary >= e1.salary

    you can see that it is using "e1.salary" which is from outer query.

    ReplyDelete
  7. This is a good interview question to judge knowledge of advanced SQL developer. Regarding, what is the difference between correlated and uncorrelated subquery, here are my 2 cents :

    1) You can execute an uncorrelated subquery independently but you cannot execute correlated subquery independently. SQL engine will complain about missing table reference for example in following non-correlated subquery, you can execute inner quey independently :

    select * from Employee where salary = (select MAX(salary) from Employee)

    here select * from Employee where salary is outer query and (select MAX(salary) from Employee) is inner query, which can be executed independtly.

    On the other hand, if you try to exeucte inner query in case of following correlated subquery, you will get missing table reference error

    SELECT MAX(Number) FROM Numbers n1 WHERE 2 = (SELECT CONT(DISTINCT Number) from Numbers WHERE n2.number>=n1.number).

    here inner query is (SELECT CONT(DISTINCT Number) from Numbers WHERE n2.number>=n1.number), which is correlated because it is reffering to table alias n1, which is part of outer query. if you try to run this query independently, you will get compile time error.

    2) Another significant differnece between these two is that uncorrelated subquery execute only once irrespective of how many rows in outer table, but correlated query execute for each reocrd processed by outer query.

    3) Due to above reason, correlated query is not very efficient compared to normal subqueries.

    4) You can use correlated queries with EXISTS and NOT EXISTS clause to search based upon value provided by outer query.

    Since, correlated sub query is not very efficient, you should use it only when you need single result. Using it on large table with so many rows will make it run slower.

    ReplyDelete
  8. Does the non-correlated subquery is same as nested query? I am looking for answer of question, what is the difference between correlated and nested subquery in SQL and someone pointed me to this URL, but I am not sure if non-correlated actually means nested query. Please suggest.

    Also can you please show me an example of how to write SQL query with EXISTS and NOT EXISTS clause? I have a table which stores information about item and stores, I need to find all the item which exists on one store but not on other e.g. all items which exists in store A but not exists in store B.

    Thanks in advance
    Gurang

    ReplyDelete
  9. This article is so much confusing regarding Correlated and Non Correlated SubQuery.

    ReplyDelete
  10. create table einfo(eid int, fname varchar(20), lname varchar(20));
    create table salary(eid int, salary number(19,4));

    insert into einfo values(1,'john','austin');
    insert into einfo values(2,'tom','moore');
    insert into einfo values(3,'larry','soone');
    insert into einfo values(4,'hank','rowe');
    insert into einfo values(5,'frank','orpe');

    insert into salary values(1, 10000);
    insert into salary values(2, 20000);
    insert into salary values(3, 30000);
    insert into salary values(4, 40000);
    insert into salary values(5, 50000);
    commit;

    select fName, lName from EINFO e where e.eid in (select s.eid from salary s where e.eid > 3);


    your example is AWFUL.
    please take my example and use it.
    MIT license.

    ReplyDelete
  11. hi. thanks for your article.
    there was a typo in `SubQuery vs Join in SQL` section in the first line and i suppose it should be '...using different types of' instead of 'different types os'.

    ReplyDelete