mykeels.com

Pagination in LEFT JOINed queries are not impossible

Using LEFT JOIN in SQL means you may get duplicate instances of a column value across rows, which makes pagination difficult. But does it have to be?

Pagination in LEFT JOINed queries are not impossible

Using LEFT JOIN in SQL means you may get duplicate instances of a column value across rows, which makes pagination difficult. But does it have to be?

TL;DR: By using the DENSE_RANK() window function, we can get the correct number of users on a page.

Cover image

Let's consider the following data:

Users table

iduser_name
1John Smith
2Jane Doe
3Bob Johnson
4Alice Brown
5Charlie Wilson

Courses table

idname
1Introduction to SQL
2Advanced Database Design
3Data Analytics
4Web Development
5Machine Learning

User_Courses table

user_idcourse_id
11
12
13
21
24
31
32
33
35
44

We want to be able to retrieve the user<>course pair in a paginated manner.

Let's start with the most basic query:

SELECT
  u.id as user_id,
  u.user_name,
  c.id as course_id,
  c.name as course_name
FROM users u
LEFT JOIN User_Courses uc ON u.id = uc.user_id
LEFT JOIN Courses c ON uc.course_id = c.id

We're using a LEFT JOIN to ensure that we get all users, even if they don't have any courses.

This will return the following data:

Basic query result

The duplicate user_id values are due to the fact that a user can have multiple courses, but this can be a problem when we want to paginate the results.

Let's say we want to paginate the results by 2 rows per page. Whether we do it with a LIMIT OFFSET query:

SELECT
  u.id as user_id,
  u.user_name,
  c.id as course_id,
  c.name as course_name
FROM users u
LEFT JOIN User_Courses uc ON u.id = uc.user_id
LEFT JOIN Courses c ON uc.course_id = c.id
LIMIT 2 OFFSET 0

or a query using the ROW_NUMBER() window function:

SELECT * FROM (
    SELECT
    u.id as user_id,
    u.user_name,
    c.id as course_id,
    c.name as course_name,
    ROW_NUMBER() OVER (
        ORDER BY
            u.id DESC
    ) As RowNum
    FROM users u
    LEFT JOIN User_Courses uc ON u.id = uc.user_id
    LEFT JOIN Courses c ON uc.course_id = c.id
) as t
WHERE
    RowNum > 0
    AND RowNum <= 2

Pagination result

As you can see, we're getting duplicate user_id values, meaning that when grouped by user_id, we'll only get 1 user, when we want 2 users on a page.

Using DENSE_RANK() to fix the issue

To fix this, we can use the DENSE_RANK() window function to get the correct number of users on a page with query:

SELECT
  DISTINCT user_id,
  user_name,
  course_id,
  name,
  RankNum
FROM (
  SELECT
    u.id as user_id,
    u.user_name,
    c.id as course_id,
    c.name,
    DENSE_RANK() OVER (
        ORDER BY
            u.id ASC
    ) As RankNum
  FROM users u
  JOIN user_courses uc on uc.user_id = u.id
  JOIN courses c on c.id = uc.course_id
) AS ConstrainedResult
WHERE
    RankNum > 0
    AND RankNum <= 2;

which will return the following result:

Pagination with DENSE_RANK result

This is the correct expected result showing 2 users on a page.

Performance Considerations of DENSE_RANK()

On queries working on large tables, you want to be mindful of the performance of DENSE_RANK(). In such cases, I've found it useful to make two queries:

  1. Retrieve only the user_id values
SELECT
  DISTINCT user_id
FROM (
  SELECT
    u.id as user_id,
    DENSE_RANK() OVER (
        ORDER BY
            u.id ASC
    ) As RankNum
  FROM users u
  JOIN user_courses uc on uc.user_id = u.id
  JOIN courses c on c.id = uc.course_id
) AS ConstrainedResult
WHERE
    RankNum > 0
    AND RankNum <= 2;
  1. Expand those user_id values with all the fields I need
SELECT
  u.id as user_id,
  u.user_name,
  c.id as course_id,
  c.name
FROM users u
JOIN user_courses uc on uc.user_id = u.id
JOIN courses c on c.id = uc.course_id
WHERE u.id IN (
    @ids_from_first_query
)

Or combine the two queries into a single query:

SELECT
  u.id as user_id,
  u.user_name,
  c.id as course_id,
  c.name
FROM users u
JOIN user_courses uc on uc.user_id = u.id
JOIN courses c on c.id = uc.course_id
WHERE u.id IN (
    SELECT
        DISTINCT user_id
    FROM (
        SELECT
            u.id as user_id,
            DENSE_RANK() OVER (
                ORDER BY
                    u.id ASC
            ) As RankNum
        FROM users u
        JOIN user_courses uc on uc.user_id = u.id
        JOIN courses c on c.id = uc.course_id
    ) AS ConstrainedResult
    WHERE
        RankNum > 0
        AND RankNum <= 2
)

This will return the following result:

Pagination with performant query result

Conclusion

Using LEFT JOIN in SQL means you may get duplicate instances of a column value across rows, which makes pagination difficult. But does it have to be?

By using the DENSE_RANK() window function, we can get the correct number of users on a page.

Do you know any alternative solutions to this problem? I'd love to hear them in the comments below.

I've placed all SQL queries in a GitHub Gist below. Feel free to paste the queries in SQLite Online to test them out.

Tags