3 important Leetcode SQL questions (Level: medium) I solved — Part I
After solving almost all easy Leetcode problems, I dived deeper into applying logical thinking to solve SQL queries by tackling medium problems. These problems are usually a combination of 2 easy problems where we have to apply 2–3 concepts at the same time. Some of the questions were really tricky and it was hard to think of an optimized approach in the first go.
Question 1: Find the Start and End Number of Continuous Ranges
Understanding this question conceptually helps in a lot of other questions. This is a classic example of the Gaps and Islands problem where we want to find the islands.
We want to find the start and end number of continuous ranges in a table.
Input:
Logs table:
+------------+
| log_id |
+------------+
| 1 |
| 2 |
| 3 |
| 7 |
| 8 |
| 10 |
+------------+
Output:
+------------+--------------+
| start_id | end_id |
+------------+--------------+
| 1 | 3 |
| 7 | 8 |
| 10 | 10 |
+------------+--------------+
Looking at the table, it is noticeable we want to find out the minimum and maximum of contiguous ranges. The minimum value will become start_id and the maximum value will become end_id. All we need to do is group ranges with contiguous numbers. So 1,2,3 should be in the 1st group, 7 and 8 should be in the 2nd group and so on.
If we assign row numbers to each row and subtract that from log_id, we can see log_id 1,2,3 will have a difference of 0, log_id 7 and 8 will have 3 as the difference and 10 will have 4 as the difference which puts those numbers in different groups (that is exactly what we wanted).
Now, we can just use our new grouping labels to find the minimum and maximum of that group.
There are other solutions for this using JOINS, WHERE conditions but this one is simple enough to understand and efficient as well.
Question 2: Department Highest Salary
These types of questions usually involve the use of analytical window functions like RANK(), DENSE_RANK() or ROW_NUMBER() when we need to rank rows based on some ordering criteria and select the top N rows from each group. They are usually asked in product-based companies' interviews to find statistics related to orders.
This question wants us to find employees who have the highest salary in each of the departments. If there are multiple employees in a department with the same salary, we need to include all of them in the output (This is important!).
Input:
Employee table:
+----+-------+--------+--------------+
| id | name | salary | departmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Jim | 90000 | 1 |
| 3 | Henry | 80000 | 2 |
| 4 | Sam | 60000 | 2 |
| 5 | Max | 90000 | 1 |
+----+-------+--------+--------------+
Department table:
+----+-------+
| id | name |
+----+-------+
| 1 | IT |
| 2 | Sales |
+----+-------+
Output:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Jim | 90000 |
| Sales | Henry | 80000 |
| IT | Max | 90000 |
+------------+----------+--------+
RANK() gives the same rank to the rows with the same values and then skips that many ranks. DENSE_RANK() doesn't create any gaps between the ranks. ROW_NUMBER() assigns a sequential integer number to each row.
Since we want to return all employees in case of a tie, we need to use RANK or DENSE_RANK. PARTITION BY clause will define windows over each department.
Now, we can just select rows where the rank is 1. For this question, since we only want the highest salaried employee, we can also use DENSE_RANK.
We can approach this question using GROUP BY as well. First, we will find the highest salary for each department using GROUP BY and then JOIN that salary and department_id to the employee table to find the employee name.
Question 3: Number of Calls Between Two Persons
This is an interesting question that has a very simple solution.
The question asks us to report the number of calls and the total call duration between each pair of distinct persons (person1, person2)
where person1 < person2
.
It is important to note that person1 < person2
in the final output.
Input:
Calls table:
+---------+-------+----------+
| from_id | to_id | duration |
+---------+-------+----------+
| 1 | 2 | 59 |
| 2 | 1 | 11 |
| 1 | 3 | 20 |
| 3 | 4 | 100 |
| 3 | 4 | 200 |
| 3 | 4 | 200 |
| 4 | 3 | 499 |
+---------+-------+----------+
Output:
+---------+---------+------------+----------------+
| person1 | person2 | call_count | total_duration |
+---------+---------+------------+----------------+
| 1 | 2 | 2 | 70 |
| 1 | 3 | 1 | 20 |
| 3 | 4 | 4 | 999 |
+---------+---------+------------+----------------+
In the input table, we have a call from 1 to 2 and a call from 2 to 1. Those are 2 calls between 1 and 2 for a total of 70 minutes. If we look at the second row in the input table, if we can somehow bring 1 in from_id and 2 in to_id, then it is a simple GROUP BY problem.
CASE WHEN takes in values, checks them against a condition and THEN outputs values into a new column based on if it satisfies the condition.
In the first row, from_id < to_id is true, therefore from_id will be picked as person1 and to_id will be picked as person2. In the second row, from_id < to_id is false, hence to_id is picked as person1 and from_id is picked as person2 and similarly for other rows.
Once we have got this, we can use GROUP BY and aggregate functions to get the desired output.
Thank you for reading. Please post in the comments if you have an alternate/better solution for these problems.