Pagination in LEFT JOINed queries are not impossible
Using LEFT JOIN in SQL means you may get duplicate instances of a column value across rows, which makes pagination difficult. But does it have to be?
TL;DR: By using the DENSE_RANK()
window function, we can get the correct number of users on a page.
Let's consider the following data:
Users table
id | user_name |
---|---|
1 | John Smith |
2 | Jane Doe |
3 | Bob Johnson |
4 | Alice Brown |
5 | Charlie Wilson |
Courses table
id | name |
---|---|
1 | Introduction to SQL |
2 | Advanced Database Design |
3 | Data Analytics |
4 | Web Development |
5 | Machine Learning |
User_Courses table
user_id | course_id |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 4 |
3 | 1 |
3 | 2 |
3 | 3 |
3 | 5 |
4 | 4 |
We want to be able to retrieve the user<>course pair in a paginated manner.
Let's start with the most basic query:
SELECT
u.id as user_id,
u.user_name,
c.id as course_id,
c.name as course_name
FROM users u
LEFT JOIN User_Courses uc ON u.id = uc.user_id
LEFT JOIN Courses c ON uc.course_id = c.id
We're using a LEFT JOIN to ensure that we get all users, even if they don't have any courses.
This will return the following data:
The duplicate user_id values are due to the fact that a user can have multiple courses, but this can be a problem when we want to paginate the results.
Let's say we want to paginate the results by 2 rows per page. Whether we do it with a LIMIT OFFSET
query:
SELECT
u.id as user_id,
u.user_name,
c.id as course_id,
c.name as course_name
FROM users u
LEFT JOIN User_Courses uc ON u.id = uc.user_id
LEFT JOIN Courses c ON uc.course_id = c.id
LIMIT 2 OFFSET 0
or a query using the ROW_NUMBER()
window function:
SELECT * FROM (
SELECT
u.id as user_id,
u.user_name,
c.id as course_id,
c.name as course_name,
ROW_NUMBER() OVER (
ORDER BY
u.id DESC
) As RowNum
FROM users u
LEFT JOIN User_Courses uc ON u.id = uc.user_id
LEFT JOIN Courses c ON uc.course_id = c.id
) as t
WHERE
RowNum > 0
AND RowNum <= 2
As you can see, we're getting duplicate user_id values, meaning that when grouped by user_id, we'll only get 1 user, when we want 2 users on a page.
Using DENSE_RANK() to fix the issue
To fix this, we can use the DENSE_RANK()
window function to get the correct number of users on a page with query:
SELECT
DISTINCT user_id,
user_name,
course_id,
name,
RankNum
FROM (
SELECT
u.id as user_id,
u.user_name,
c.id as course_id,
c.name,
DENSE_RANK() OVER (
ORDER BY
u.id ASC
) As RankNum
FROM users u
JOIN user_courses uc on uc.user_id = u.id
JOIN courses c on c.id = uc.course_id
) AS ConstrainedResult
WHERE
RankNum > 0
AND RankNum <= 2;
which will return the following result:
This is the correct expected result showing 2 users on a page.
Performance Considerations of DENSE_RANK()
On queries working on large tables, you want to be mindful of the performance of DENSE_RANK()
. In such cases, I've found it useful to make two queries:
- Retrieve only the user_id values
SELECT
DISTINCT user_id
FROM (
SELECT
u.id as user_id,
DENSE_RANK() OVER (
ORDER BY
u.id ASC
) As RankNum
FROM users u
JOIN user_courses uc on uc.user_id = u.id
JOIN courses c on c.id = uc.course_id
) AS ConstrainedResult
WHERE
RankNum > 0
AND RankNum <= 2;
- Expand those user_id values with all the fields I need
SELECT
u.id as user_id,
u.user_name,
c.id as course_id,
c.name
FROM users u
JOIN user_courses uc on uc.user_id = u.id
JOIN courses c on c.id = uc.course_id
WHERE u.id IN (
@ids_from_first_query
)
Or combine the two queries into a single query:
SELECT
u.id as user_id,
u.user_name,
c.id as course_id,
c.name
FROM users u
JOIN user_courses uc on uc.user_id = u.id
JOIN courses c on c.id = uc.course_id
WHERE u.id IN (
SELECT
DISTINCT user_id
FROM (
SELECT
u.id as user_id,
DENSE_RANK() OVER (
ORDER BY
u.id ASC
) As RankNum
FROM users u
JOIN user_courses uc on uc.user_id = u.id
JOIN courses c on c.id = uc.course_id
) AS ConstrainedResult
WHERE
RankNum > 0
AND RankNum <= 2
)
This will return the following result:
Conclusion
Using LEFT JOIN in SQL means you may get duplicate instances of a column value across rows, which makes pagination difficult. But does it have to be?
By using the DENSE_RANK()
window function, we can get the correct number of users on a page.
Do you know any alternative solutions to this problem? I'd love to hear them in the comments below.
I've placed all SQL queries in a GitHub Gist below. Feel free to paste the queries in SQLite Online to test them out.