How to use window functions to get the cumulative count of two columns based on the value from one column?
I'm trying to get the cumulative count of previous records for two player columns, where the player ID exists in either of the two columns in a previous record. As an example, I'm trying to output the table below, where the last two columns are calculated from the first three columns:
+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                      0 |                      0 |
|   2 |     2 |     1 |                      1 |                      1 |
|   3 |     1 |     3 |                      2 |                      0 |
|   4 |     2 |     1 |                      2 |                      3 |
+-----+-------+-------+------------------------+------------------------+
I'm trying to move away from subqueries and I've been exploring window functions. However, in this case a simple COUNT partitioned over p1_id would only count records where the player was in the same column:
SELECT
    id_,
    p1_id,
    p2_id,
    COUNT(p1_id) OVER (PARTITION BY p1_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p1_id_cumulative_count,
    COUNT(p2_id) OVER (PARTITION BY p2_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p2_id_cumulative_count
FROM
    test.example_table
ORDER BY id_;
Results in:
+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                      0 |                      0 |
|   2 |     2 |     1 |                      0 |                      0 |
|   3 |     1 |     3 |                      1 |                      0 |
|   4 |     2 |     1 |                      1 |                      1 |
+-----+-------+-------+------------------------+------------------------+
I then thought I was being clever by combining SUM and CASE to solve this issue:
SELECT
    id_,
    p1_id,
    p2_id,
    SUM(CASE WHEN (p1_id = p1_id OR p1_id = p2_id) THEN 1 ELSE 0 END) OVER (PARTITION BY p1_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p1_id_cumulative_count,
    SUM(CASE WHEN (p2_id = p1_id OR p2_id = p2_id) THEN 1 ELSE 0 END) OVER (PARTITION BY p2_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p2_id_cumulative_count
FROM
    test.example_table
ORDER BY id_;
Alas, to no avail:
+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                   NULL |                   NULL |
|   2 |     2 |     1 |                   NULL |                   NULL |
|   3 |     1 |     3 |                      1 |                   NULL |
|   4 |     2 |     1 |                      1 |                      1 |
+-----+-------+-------+------------------------+------------------------+
Anyone able to put me out of my misery?
Here's the SQL to create and populate the table:
CREATE TABLE `example_table` (
    `id_` int NOT NULL AUTO_INCREMENT,
    `p1_id` int DEFAULT NULL,
    `p2_id` int DEFAULT NULL,
    PRIMARY KEY (`id_`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `example_table` VALUES (1,1,2),(2,2,1),(3,1,3),(4,2,1);
Solution 1:[1]
I present to you 2 options. The first I don't think will work for you, because MySQL does not support PIVOT, so I wrote the 2nd option using something more generic.
Both use CTEs, which are technically subqueries, but maybe you will like them better because they're easier to decipher.
Option 1:
WITH CTE_LONGFORM
AS (
SELECT id_ AS i
,p1_id AS pid
,'p1' AS col
FROM `example_table`
UNION ALL
SELECT id_ AS i
,p2_id AS pid
,'p2' AS col
FROM `example_table`
)
,CTE_CUMSUM
AS (
SELECT *
,COUNT(PID) OVER (
PARTITION BY PID ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS COUNT_PID
FROM CTE_LONGFORM
)
SELECT I
,P1
,P2
FROM (
SELECT I
,COUNT_PID
,COL
FROM CTE_CUMSUM
)
PIVOT(SUM(COUNT_PID) FOR COL IN (
'p1'
,'p2'
)) AS p(I, P1, P2)
Option 2:
WITH CTE_LONGFORM
AS (
SELECT id_ AS i
,p1_id AS pid
,'p1' AS col
FROM `example_table`
UNION ALL
SELECT id_ AS i
,p2_id AS pid
,'p2' AS col
FROM `example_table`
)
,CTE_CUMSUM
AS (
SELECT *
,COUNT(PID) OVER (
PARTITION BY PID ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS COUNT_PID
FROM CTE_LONGFORM
)
SELECT I AS id_
,MAX(CASE WHEN COL='p1' THEN pid ELSE NULL END) AS p1_id
,MAX(CASE WHEN COL='p2' THEN pid ELSE NULL END) AS p2_id
,SUM(CASE WHEN COL='p1' THEN COUNT_PID ELSE 0 END) AS p1_id_cumulative_count
,SUM(CASE WHEN COL='p2' THEN COUNT_PID ELSE 0 END) AS p2_id_cumulative_count
FROM CTE_CUMSUM
GROUP BY 1
ORDER BY 1
I used Rasgo to translate the first one, and it suggested PIVOT since I'm currently using Snowflake. If I had a MySQL instance, it would probably suggest the 2nd.
Solution 2:[2]
You cannot use window functions directly on the original table, since PARTITION BY groups rows by the value of one column and so cannot build a window frame that covers a player appearing in either p1_id or p2_id.
One solution is to move the desired values into one column (1 row x 2 columns becomes 2 rows x 1 column). Then use a window function to count the values. Finally, use conditional aggregation or some other trick to convert 2 rows x 1 column back to 1 row x 2 columns. Like so:
with cte1 as (
select t.id_, x.pid, x.col
from example_table as t, lateral (
select p1_id, 1 union all
select p2_id, 2
) as x(pid, col)
), cte2 as (
select *, count(*) over (
partition by pid
order by id_ rows between unbounded preceding and 1 preceding
) as rcount
from cte1
)
select id_
, min(case when col = 1 then pid end) as p1_id
, min(case when col = 2 then pid end) as p2_id
, min(case when col = 1 then rcount end) as p1_rcount
, min(case when col = 2 then rcount end) as p2_rcount
from cte2
group by id_
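If you want to sanity-check the unpivot, count, and re-aggregate steps above without a MySQL server, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration under stated assumptions, not the answer's exact query: it assumes the bundled SQLite is 3.25 or newer (needed for window functions), it unpivots with a plain UNION ALL instead of LATERAL, and the names ROWS, UNPIVOT_QUERY and cumulative_counts are made up for this example.

```python
import sqlite3

# Sample data from the question: (id_, p1_id, p2_id)
ROWS = [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)]

UNPIVOT_QUERY = """
WITH long_form AS (          -- one row per (record, player) pair
    SELECT id_, p1_id AS pid, 1 AS col FROM example_table
    UNION ALL
    SELECT id_, p2_id AS pid, 2 AS col FROM example_table
), counted AS (              -- earlier appearances of the same player
    SELECT *, COUNT(*) OVER (
        PARTITION BY pid
        ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS rcount
    FROM long_form
)
SELECT id_,
       MIN(CASE WHEN col = 1 THEN pid END)    AS p1_id,
       MIN(CASE WHEN col = 2 THEN pid END)    AS p2_id,
       MIN(CASE WHEN col = 1 THEN rcount END) AS p1_rcount,
       MIN(CASE WHEN col = 2 THEN rcount END) AS p2_rcount
FROM counted
GROUP BY id_
ORDER BY id_
"""


def cumulative_counts(rows):
    """Load rows into an in-memory table and run the unpivot query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE example_table "
        "(id_ INTEGER PRIMARY KEY, p1_id INTEGER, p2_id INTEGER)"
    )
    conn.executemany("INSERT INTO example_table VALUES (?, ?, ?)", rows)
    result = conn.execute(UNPIVOT_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in cumulative_counts(ROWS):
        print(row)
```

Note that this sketch, like the query above, assumes a player never appears in both columns of the same record; in that case the two unpivoted rows would tie on id_ and their relative order inside the window would not be defined.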
Solution 3:[3]
If I understand the problem correctly, you need neither window functions nor subqueries.
You can get the cumulative counts by:
- applying a LEFT self join on the example_table with a > condition, so that each row is matched with the preceding ones
- applying a CASE to detect when each player appears in either of the two columns
- applying the SUM aggregate function
Here it is:
SELECT t1.id_,
SUM(CASE WHEN t1.p1_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p1_cumsum,
SUM(CASE WHEN t1.p2_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p2_cumsum
FROM example_table t1
LEFT JOIN example_table t2
ON t1.id_ > t2.id_
GROUP BY t1.id_
ORDER BY t1.id_;
Find the corresponding SQL Fiddle here.
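For a quick local check of the self-join, the sketch below runs the same query against the question's sample rows using Python's standard-library sqlite3 module. This is only an illustration: SQLite stands in for MySQL here, and the names SAMPLE_ROWS, SELF_JOIN_QUERY and self_join_counts are hypothetical helpers, not part of the answer.

```python
import sqlite3

# Sample data from the question: (id_, p1_id, p2_id)
SAMPLE_ROWS = [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)]

SELF_JOIN_QUERY = """
SELECT t1.id_,
       SUM(CASE WHEN t1.p1_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p1_cumsum,
       SUM(CASE WHEN t1.p2_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p2_cumsum
FROM example_table t1
LEFT JOIN example_table t2
       ON t1.id_ > t2.id_       -- pair each row with every earlier row
GROUP BY t1.id_
ORDER BY t1.id_
"""


def self_join_counts(rows):
    """Load rows into an in-memory table and run the self-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE example_table "
        "(id_ INTEGER PRIMARY KEY, p1_id INTEGER, p2_id INTEGER)"
    )
    conn.executemany("INSERT INTO example_table VALUES (?, ?, ?)", rows)
    result = conn.execute(SELF_JOIN_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in self_join_counts(SAMPLE_ROWS):
        print(row)
```

The first row has no earlier rows, so the LEFT JOIN supplies NULLs and the CASE falls through to 0. One design point to keep in mind: the join pairs every row with all earlier rows, so the number of row pairs grows quadratically with the table size, which may matter on large tables.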
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Josh |
Solution 2 | |
Solution 3 | lemon |