'SQL group by: select value where another column has its min/max
I want to group by one column, get both min and max of a second column, and (this is the tricky part!) get the value from a third column where the second column has its min value in the group.
Example:
MyTable:
ID TS GRP
==================
1 20 A
2 20 B
3 10 A
4 30 A
5 10 B
6 40 A
Desired result (ID should be the value from the record where TS has its minimum):
ID MIN_TS MAX_TS GRP
============================
3 10 40 A
5 10 20 B
In general, the grouping query is very easy:
SELECT <???> AS ID, MIN(TS) AS MIN_TS, MAX(TS) AS MAX_TS, GRP
FROM MyTable
GROUP BY GRP
But what about the ID part? It doesn't work this way with grouping, right? But why? And what's the best workaround?
Solution 1:[1]
Do the aggregation in a subquery, then look up the ID for each group in another subquery:
SELECT
(SELECT TOP(1) id FROM MyTable WHERE grp = agg.grp ORDER BY ts DESC) AS id,
min_ts, max_ts, grp
FROM (SELECT min(ts) AS min_ts, max(ts) AS max_ts, grp
FROM MyTable
GROUP BY grp) agg
Or use window functions:
SELECT id, min_ts, max_ts, grp
FROM (SELECT
id,
min(ts) OVER (PARTITION BY grp) min_ts,
max(ts) OVER (PARTITION BY grp) max_ts,
grp,
row_number OVER (PARTITION BY grp ORDER BY ts) rn
FROM MyTable)
WHERE rn = 1;
This query uses window functions to calculate min_ts
and max_ts
for each group, and then filters to only include the first row for each group (ordered by ts
).
Solution 2:[2]
A bit late, but for future comers...
I can propose another solution that's a bit different than the existing one, it's basically the same idea, but it's implemented in another way (and it's maybe a bit faster?).
So you can basically make all the grouping and aggregation in a sub query (using WITH
), then use INNER JOIN
between that query and your original table to get what you want, it would be something like this...
WITH values AS (
SELECT
MIN(ts) as min_ts,
MAX(ts) AS max_ts,
grp
FROM MyTable
GROUP BY grp
)
SELECT
tb.id AS id,
v.min_ts AS min_ts,
v.max_ts AS max_ts,
tb.grp AS grp
FROM MyTable tb
INNER JOIN values v ON v.grp = tb.grp AND v.min_ts = tb.ts;
This is a bit simpler, and a bit more intuitive (at least to me).
I have tested this on Postgres DB, and for the following data
id | ts | grp
----+----+-----
7 | 5 | A
3 | 10 | A
1 | 20 | A
5 | 30 | A
4 | 10 | B
2 | 20 | B
6 | 30 | B
8 | 60 | B
It gives the following results
id | min_ts | max_ts | grp
----+--------+--------+-----
7 | 5 | 30 | A
4 | 10 | 60 | B
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Amr Saber |