'create a new column that contains a list of values from another column subsequent rows

I have a table like below,

and want to create a new column that contains a list of values from another column subsequent rows like below,

for copy paste: timestamp ID Value

2021-12-03 04:03:45 ID1 O

2021-12-03 04:03:46 ID1 P

2021-12-03 04:03:47 ID1 Q

2021-12-03 04:03:48 ID1 R

2021-12-03 04:03:49 ID1 NULL

2021-12-03 04:03:50 ID1 S

2021-12-03 04:03:51 ID1 T

2021-12-04 11:09:03 ID2 A

2021-12-04 11:09:04 ID2 B

2021-12-04 11:09:05 ID2 C

sql snowflake-cloud-data-platform

Solution 1:^[1]

Using windowed functions and range JOIN:

WITH cte AS (
  SELECT tab.*, 
     COALESCE(FIRST_VALUE(CASE WHEN VALUE IS NULL THEN tmp END) IGNORE NULLS 
                OVER(PARTITION BY ID ORDER BY TMP 
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
             ,MAX(tmp) OVER(PARTITION BY ID)) AS next_tmp
  FROM tab
)
SELECT c1.tmp, c1.id, c1.value,
      LISTAGG(c2.value, ',') WITHIN GROUP(ORDER BY c2.tmp) AS list
FROM cte c1
LEFT JOIN cte c2
  ON c1.ID = c2.ID
 AND (c1.tmp < c2.tmp AND c2.tmp <= c1.next_tmp)
GROUP BY c1.tmp, c1.id, c1.value
ORDER BY c1.ID, c1.tmp;

db<>fiddle demo

Output:

How does it work:

The idea is to find first timestamp corresponding to NULL value per each ID:

SELECT tab.*, 
 COALESCE(FIRST_VALUE(CASE WHEN VALUE IS NULL THEN tmp END) IGNORE NULLS 
            OVER(PARTITION BY ID ORDER BY TMP 
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
         , MAX(tmp) OVER(PARTITION BY ID)) AS next_tmp
FROM tab;

Output:

Solution 2:^[2]

The rules are complex and it wasn't easy to code this — but a triple join, rules for nulls, and removing the first element can produce the results as desired:

with data as (
select (x[0]||' '||x[1])::timestamp ts, x[2]::string id, iff(x[3]='NULL', null, x[3])::string value
from (
select split(value, ' ') x
from table(split_to_table($$2021-12-03 04:03:45 ID1 O
2021-12-03 04:03:46 ID1 P
2021-12-03 04:03:47 ID1 Q
2021-12-03 04:03:48 ID1 R
2021-12-03 04:03:49 ID1 NULL
2021-12-03 04:03:50 ID1 S
2021-12-03 04:03:51 ID1 T
2021-12-04 11:09:03 ID2 A
2021-12-04 11:09:04 ID2 B
2021-12-04 11:09:05 ID2 C$$, '\n')) 
))


select ts, id, value, iff( -- return null for null values
    value is null
    , null
    , array_to_string(
        array_slice( -- remove first element
            array_agg(bvalue) within group (order by bts)
            , 1, 99999)
        , ',')
) list
from (
    select a.*, b.ts bts, b.value bvalue
        , coalesce( -- find max null after current value, or max value if none
        (
            select max(ts)
            from data
            where a.id=id
            and value is null
            and a.ts<ts
        ),
        (
            select max(ts)
            from data
            where a.id=id
        )) maxts
    from data a
    join data b
    on a.id=b.id
    and a.ts<=b.ts
    where maxts >= b.ts
)
group by id, ts, value
order by id, ts

Solution 3:^[3]

Data that is ordered by TMP is also ordered by ID logically.

So, you can

group rows first by ID;
in each group, create a new group when the previous VALUE is null;
in each subgroup, use the comma to join up VALUEs from the second to the last non-null VALUE to form a sequence and make it the value of the new column LIST.

A SQL set is unordered, which makes computing process very complicated.

You need to first create a marker column using the window function, perform a self-join by the marker column, and group rows and join up VALUE values to get the desired result.

A common alternative is to fetch the original data out of the database and process it in Python or SPL. SPL, the open-source Java package, is easy to be integrated into a Java program and generates much simpler code. It can get it done with only two lines of code:

	A
1	=ORACLE.query("SELECT * FROM TAB ORDER BY 1")
2	=A1.group@o(#2).conj(~.group@i(#3[-1]==null).run(tmp=~.(#3).select(~),~=~.derive(tmp.m(#+1:).concat@c():LIST))).conj()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	Felipe Hoffa
Solution 3	eNca

'create a new column that contains a list of values from another column subsequent rows

Solution 1:[1]

Solution 2:[2]

Solution 3:[3]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]

Solution 3:^[3]