How to group data with SQL

How do I group my Timestream data?

Simplified, the table looks like this:

point_delivery_number | measure_name  | time                   | value
------------------------------------------------------------------------
AT3265345345          | "consumption" | 2021-01-02 12:00:00.00 | 0.13
AT3265345345          | "generation"  | 2021-01-02 12:00:00.00 | 0.32

I want to query for a given delivery point and a given time (point_delivery_number = xx and time = xx).

Result should be:

point_delivery_number | consumption | time  | generation
----------------------------------------------------------
AT3265345345          |  0.13       | xxxxx | 0.32

What I tried is:

    SELECT point_delivery_number, measure_name, time, measure_value::double 
    FROM "energy_datapoints"."energy_data"
    WHERE point_delivery_number='AT234123234541243'
    GROUP BY point_delivery_number, measure_name, time, measure_value::double;

Result is:

point_delivery_number | measure_name  | time                   | value
------------------------------------------------------------------------
AT3265345345          | "generation"  | 2021-01-02 12:15:00.00 | 0.123
AT3265345345          | "generation"  | 2021-01-02 12:00:00.00 | 0.32

I want consumption and generation to be columns of their own rather than values of measure_name.



Solution 1:[1]

You are dealing with a key/value table. Per point_delivery_number it has rows with a key (measure_name) and values (time and value).

You want to get the values for two keys. One way is to select both and join them:

    select
      point_delivery_number,
      c.value as consumption,
      g.value as generation
    from
      (select * from energy_datapoints.energy_data where measure_name = 'consumption') c
    full outer join
      (select * from energy_datapoints.energy_data where measure_name = 'generation') g
        using (point_delivery_number)
    order by point_delivery_number;
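
Note that the join above matches rows on point_delivery_number only, so if a delivery point has several timestamps, each consumption row is paired with every generation row of that delivery point. A sketch that also joins on time, assuming the simplified column names from the question (time, value) rather than the real Timestream ones, could look like this:

```sql
-- Sketch only: same join as above, but rows are matched per delivery point AND timestamp.
-- Column names (time, value) are the simplified ones from the question.
select
  point_delivery_number,
  time,
  c.value as consumption,
  g.value as generation
from
  (select * from energy_datapoints.energy_data where measure_name = 'consumption') c
full outer join
  (select * from energy_datapoints.energy_data where measure_name = 'generation') g
    using (point_delivery_number, time)
order by point_delivery_number, time;
```

Because of the full outer join, a timestamp that has only one of the two measures still produces a row, with NULL in the other column.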

Another way is aggregation. You want one row per point_delivery_number, so you GROUP BY point_delivery_number. Then use MIN or MAX on a condition to only get the measure names in question.

    select
      point_delivery_number,
      min(case when measure_name = 'consumption' then value end) as consumption,
      min(case when measure_name = 'generation' then value end) as generation
    from energy_datapoints.energy_data
    group by point_delivery_number
    order by point_delivery_number;
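
If one row per delivery point and timestamp is wanted, as in the desired result shown in the question, time can be added to the GROUP BY. A minimal sketch, again assuming the simplified column names; the literals in the WHERE clause are placeholders taken from the sample data:

```sql
-- Sketch only: pivot the two measures into columns, one row per delivery point and timestamp.
-- The WHERE literals are placeholders copied from the question's sample rows.
select
  point_delivery_number,
  time,
  min(case when measure_name = 'consumption' then value end) as consumption,
  min(case when measure_name = 'generation' then value end) as generation
from energy_datapoints.energy_data
where point_delivery_number = 'AT3265345345'
  and time = timestamp '2021-01-02 12:00:00'
group by point_delivery_number, time
order by point_delivery_number, time;
```

Judging by the attempted query in the question, the value column in the real table is called measure_value::double, so value would have to be replaced accordingly.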

Disclaimer: I don't know Amazon Timestream. The above queries are standard SQL queries that should work (exactly as written or with slight changes) in most RDBMS.

As to your own query: You make it look like you are aggregating, but it seems you are simply selecting single rows, because your GROUP BY clause includes all columns. GROUP BY ____ means "I want to aggregate my data to get one result row per ____". You want one result row per point_delivery_number, so GROUP BY point_delivery_number.
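
To illustrate the point: grouping by every selected column without any aggregate function only removes duplicate rows, so the attempted query should return the same rows as this SELECT DISTINCT (table and column names copied from the question):

```sql
-- Same result as the attempted query: no aggregation, only duplicate rows are removed.
SELECT DISTINCT point_delivery_number, measure_name, time, measure_value::double
FROM "energy_datapoints"."energy_data"
WHERE point_delivery_number='AT234123234541243';
```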

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution   | Source
------------------------------
Solution 1 | Thorsten Kettner