'create a new column that contains a list of values from another column subsequent rows
I have a table like below,
and want to create a new column that contains a list of values from another column subsequent rows like below,
for copy paste: timestamp ID Value
2021-12-03 04:03:45 ID1 O
2021-12-03 04:03:46 ID1 P
2021-12-03 04:03:47 ID1 Q
2021-12-03 04:03:48 ID1 R
2021-12-03 04:03:49 ID1 NULL
2021-12-03 04:03:50 ID1 S
2021-12-03 04:03:51 ID1 T
2021-12-04 11:09:03 ID2 A
2021-12-04 11:09:04 ID2 B
2021-12-04 11:09:05 ID2 C
Solution 1:[1]
Using windowed functions and range JOIN:
WITH cte AS (
SELECT tab.*,
COALESCE(FIRST_VALUE(CASE WHEN VALUE IS NULL THEN tmp END) IGNORE NULLS
OVER(PARTITION BY ID ORDER BY TMP
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
,MAX(tmp) OVER(PARTITION BY ID)) AS next_tmp
FROM tab
)
SELECT c1.tmp, c1.id, c1.value,
LISTAGG(c2.value, ',') WITHIN GROUP(ORDER BY c2.tmp) AS list
FROM cte c1
LEFT JOIN cte c2
ON c1.ID = c2.ID
AND (c1.tmp < c2.tmp AND c2.tmp <= c1.next_tmp)
GROUP BY c1.tmp, c1.id, c1.value
ORDER BY c1.ID, c1.tmp;
Output:
How does it work:
The idea is to find first timestamp corresponding to NULL value per each ID:
SELECT tab.*,
COALESCE(FIRST_VALUE(CASE WHEN VALUE IS NULL THEN tmp END) IGNORE NULLS
OVER(PARTITION BY ID ORDER BY TMP
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, MAX(tmp) OVER(PARTITION BY ID)) AS next_tmp
FROM tab;
Output:
Solution 2:[2]
The rules are complex and it wasn't easy to code this — but a triple join, rules for nulls, and removing the first element can produce the results as desired:
with data as (
select (x[0]||' '||x[1])::timestamp ts, x[2]::string id, iff(x[3]='NULL', null, x[3])::string value
from (
select split(value, ' ') x
from table(split_to_table($$2021-12-03 04:03:45 ID1 O
2021-12-03 04:03:46 ID1 P
2021-12-03 04:03:47 ID1 Q
2021-12-03 04:03:48 ID1 R
2021-12-03 04:03:49 ID1 NULL
2021-12-03 04:03:50 ID1 S
2021-12-03 04:03:51 ID1 T
2021-12-04 11:09:03 ID2 A
2021-12-04 11:09:04 ID2 B
2021-12-04 11:09:05 ID2 C$$, '\n'))
))
select ts, id, value, iff( -- return null for null values
value is null
, null
, array_to_string(
array_slice( -- remove first element
array_agg(bvalue) within group (order by bts)
, 1, 99999)
, ',')
) list
from (
select a.*, b.ts bts, b.value bvalue
, coalesce( -- find max null after current value, or max value if none
(
select max(ts)
from data
where a.id=id
and value is null
and a.ts<ts
),
(
select max(ts)
from data
where a.id=id
)) maxts
from data a
join data b
on a.id=b.id
and a.ts<=b.ts
where maxts >= b.ts
)
group by id, ts, value
order by id, ts
Solution 3:[3]
Data that is ordered by TMP
is also ordered by ID
logically.
So, you can
- group rows first by ID;
- in each group, create a new group when the previous VALUE is null;
- in each subgroup, use the comma to join up VALUEs from the second to the last non-null VALUE to form a sequence and make it the value of the new column LIST.
A SQL set is unordered, which makes computing process very complicated.
You need to first create a marker column using the window function, perform a self-join by the marker column, and group rows and join up VALUE values to get the desired result.
A common alternative is to fetch the original data out of the database and process it in Python or SPL. SPL, the open-source Java package, is easy to be integrated into a Java program and generates much simpler code. It can get it done with only two lines of code:
A | |
---|---|
1 | =ORACLE.query("SELECT * FROM TAB ORDER BY 1") |
2 | =A1.group@o(#2).conj(~.group@i(#3[-1]==null).run(tmp=~.(#3).select(~),~=~.derive(tmp.m(#+1:).concat@c():LIST))).conj() |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Felipe Hoffa |
Solution 3 | eNca |