How to group data with SQL
How do I group my Timestream data?
A simplified version of the table looks like this:
point_delivery_number | measure_name | time | value
------------------------------------------------------------------------
AT3265345345 | "consumption" | 2021-01-02 12:00:00.00 | 0.13
AT3265345345 | "generation" | 2021-01-02 12:00:00.00 | 0.32
I want to query where point_delivery_number == xx and time = xx.
The result should be:
point_delivery_number | consumption | time | generation
----------------------------------------------------------
AT3265345345 | 0.13 | xxxxx | 0.32
Here is what I tried:
SELECT point_delivery_number, measure_name, time, measure_value::double
FROM "energy_datapoints"."energy_data"
WHERE point_delivery_number='AT234123234541243'
GROUP BY point_delivery_number, measure_name, time, measure_value::double;
The result is:
point_delivery_number | measure_name | time | value
------------------------------------------------------------------------
AT3265345345 | "generation" | 2021-01-02 12:15:00.00 | 0.123
AT3265345345 | "generation" | 2021-01-02 12:00:00.00 | 0.32
I want consumption and generation to be columns (properties) rather than values in the measure_name column.
Solution 1:[1]
You are dealing with a key/value table. For each point_delivery_number it has rows consisting of a key (measure_name) and values (time and value).
You want to get the values for two keys. One way is to select the rows for each key separately and join them:
select
  point_delivery_number,
  c.value as consumption,
  g.value as generation
from
  (select * from energy_datapoints.energy_data where measure_name = 'consumption') c
  full outer join
  (select * from energy_datapoints.energy_data where measure_name = 'generation') g
  using (point_delivery_number)
order by point_delivery_number;
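To try the join approach outside Timestream, here is a minimal sketch in Python using the standard sqlite3 module. It assumes a hypothetical in-memory table energy_data that holds the question's three sample rows. It differs from the query above in two deliberate ways. First, the subqueries are joined on time as well as point_delivery_number, on the assumption that one output row per timestamp is wanted. Second, it uses an INNER JOIN, because SQLite only supports FULL OUTER JOIN from version 3.39.0.

```python
import sqlite3

# Hypothetical in-memory stand-in for energy_datapoints.energy_data (SQLite, not Timestream).
SAMPLE_ROWS = [
    ("AT3265345345", "consumption", "2021-01-02 12:00:00.00", 0.13),
    ("AT3265345345", "generation", "2021-01-02 12:00:00.00", 0.32),
    ("AT3265345345", "generation", "2021-01-02 12:15:00.00", 0.123),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE energy_data ("
    "point_delivery_number TEXT, measure_name TEXT, time TEXT, value REAL)"
)
conn.executemany("INSERT INTO energy_data VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

# One subquery per measure name, matched on both the point and the timestamp.
# INNER JOIN stands in for FULL OUTER JOIN here, so unmatched rows are dropped.
rows = conn.execute(
    """
    SELECT c.point_delivery_number, c.time,
           c.value AS consumption, g.value AS generation
    FROM (SELECT * FROM energy_data WHERE measure_name = 'consumption') AS c
    JOIN (SELECT * FROM energy_data WHERE measure_name = 'generation') AS g
      ON c.point_delivery_number = g.point_delivery_number
     AND c.time = g.time
    ORDER BY c.point_delivery_number, c.time
    """
).fetchall()

for row in rows:
    print(row)
```

With the inner join, the 12:15 row is dropped because it only has a generation value. The FULL OUTER JOIN in the query above exists to keep unmatched rows like that one.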
Another way is conditional aggregation. You want one row per point_delivery_number, so you GROUP BY point_delivery_number. Then apply MIN or MAX to a CASE expression, so that each output column only picks up the values of the measure name in question.
select
  point_delivery_number,
  min(case when measure_name = 'consumption' then value end) as consumption,
  min(case when measure_name = 'generation' then value end) as generation
from energy_datapoints.energy_data
group by point_delivery_number
order by point_delivery_number;
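The conditional aggregation can be sketched the same way. This minimal Python/sqlite3 example uses a hypothetical in-memory energy_data table with the question's sample rows. It adds time to the GROUP BY, assuming the goal is one row per point_delivery_number and timestamp as in the desired result, and it filters on a single point through a query parameter. In Timestream the value column would be measure_value::double, as in the question's own query.

```python
import sqlite3

# Hypothetical sample table standing in for energy_datapoints.energy_data.
SAMPLE_ROWS = [
    ("AT3265345345", "consumption", "2021-01-02 12:00:00.00", 0.13),
    ("AT3265345345", "generation", "2021-01-02 12:00:00.00", 0.32),
    ("AT3265345345", "generation", "2021-01-02 12:15:00.00", 0.123),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE energy_data ("
    "point_delivery_number TEXT, measure_name TEXT, time TEXT, value REAL)"
)
conn.executemany("INSERT INTO energy_data VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

# Pivot: each CASE yields the value only for its measure name and NULL otherwise;
# MIN skips the NULLs, leaving one value per (point, time) group.
rows = conn.execute(
    """
    SELECT point_delivery_number,
           time,
           MIN(CASE WHEN measure_name = 'consumption' THEN value END) AS consumption,
           MIN(CASE WHEN measure_name = 'generation' THEN value END) AS generation
    FROM energy_data
    WHERE point_delivery_number = ?
    GROUP BY point_delivery_number, time
    ORDER BY time
    """,
    ("AT3265345345",),
).fetchall()

for row in rows:
    print(row)
```

A timestamp that only has a generation reading still produces a row, with consumption returned as None. For data like this, the aggregation behaves like the full outer join.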
Disclaimer: I don't know Amazon Timestream. The above queries are standard SQL queries that should work (exactly as written or with slight changes) in most RDBMS.
As to your own query: it looks as if you are aggregating, but you are actually just selecting single rows, because your GROUP BY clause includes all columns. GROUP BY ____ means "I want to aggregate my data to get one result row per ____". You want one result row per point_delivery_number, so GROUP BY point_delivery_number.
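The difference between the two GROUP BY clauses can be checked on the same hypothetical sample table, again with Python's sqlite3 module standing in for Timestream. Grouping by every column leaves each source row as its own group. Grouping by point_delivery_number collapses the rows into one result row per point.

```python
import sqlite3

# Hypothetical sample data taken from the question's tables.
SAMPLE_ROWS = [
    ("AT3265345345", "consumption", "2021-01-02 12:00:00.00", 0.13),
    ("AT3265345345", "generation", "2021-01-02 12:00:00.00", 0.32),
    ("AT3265345345", "generation", "2021-01-02 12:15:00.00", 0.123),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE energy_data ("
    "point_delivery_number TEXT, measure_name TEXT, time TEXT, value REAL)"
)
conn.executemany("INSERT INTO energy_data VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

# Grouping by every selected column: each distinct source row is its own group.
all_cols = conn.execute(
    "SELECT point_delivery_number, measure_name, time, value FROM energy_data "
    "GROUP BY point_delivery_number, measure_name, time, value"
).fetchall()

# Grouping by the point only: one result row per point_delivery_number.
per_point = conn.execute(
    "SELECT point_delivery_number, COUNT(*) FROM energy_data "
    "GROUP BY point_delivery_number"
).fetchall()

print(len(all_cols))  # 3
print(per_point)      # [('AT3265345345', 3)]
```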
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Thorsten Kettner |