-
To retrieve only unique combinations of the selected attributes: You can omit rows, where all selected columns are equal with the
DISTINCT
statement. -
To retrieve the group wise maximum of certain columns: You can keep only one record for each group with the
DISTINCT ON
statement, to omit equal rows within each specified group.
Use case
You have a query where you want only one record for a set of specifically ordered attributes.
How to use?
Let's say we look at the example how to query only the latest post for each user:
SELECT DISTINCT ON (user_id) posts.*
FROM posts
ORDER BY user_id, published_at DESC;
In Ruby and ActiveRecord it is just:
Post.select('DISTINCT ON (user_id) *').order(:user_id, published_at: :desc)
How does it work?
Internally this will produce unique groups on user id and take only the latest published post for each group.
As pseudo sql this could also be written as:
SELECT *
FROM (
SELECT user_id, posts.*
FROM posts
ORDER BY user_id, published_at
) p
GROUP BY user_id
Be aware that even though you can find a lot of examples like the above on the web, it may or may not be valid sql Show archive.org snapshot ! The reason for this is, that many of the modern sql optimizers will not take the ordering of subqueries into account for the actual execution.
Limitations
Performance
If performance is critical because you are dealing with large records for each group, you have to consider something else like using group by
with multicolumn indices. This
stackoverflow answer
Show archive.org snapshot
directs also to some alternatives.
Ordering
When you have to order additionally by another column as the one you have set for DISTINCT ON
, you have to make sure that the arguments in ORDER BY
have to be the same as the first arguments in any order in DISTINCT ON
and add any further ordered attributes thereafter:
SELECT DISTINCT ON (user_id, id) posts.*
FROM posts
ORDER BY user_id, id, created_at DESC;
But if you want the complete result to be ordered by completely different column you have to wrap the distinct query as a subquery Show archive.org snapshot .
In SQL it would look like
SELECT *
FROM (
SELECT DISTINCT ON (user_id) posts.*
FROM posts
ORDER BY user_id, published_at DESC
) post
ORDER BY category DESC, created_at;
The easiest options to achieve this in Rails ActiveRecord is to use "WHERE ID IN" with
User.where(id: Post.distinct_on_user.pluck(:user_id)).order(...)
Further ressources
- There are some good examples on the MySql documentation to produce queries with the same result in the article 5.6.4 The Rows Holding the Group-wise Maximum of a Certain Column Show archive.org snapshot . Most of the examples should be portable to postgres in similar fashion.