How to use window functions to get the cumulative count of two columns based on the value from one column?

I'm trying to get the cumulative count of previous records for two player columns, where a previous record counts if the player ID appears in either of its player columns. As an example, I'm trying to output the table below, where the last two columns are calculated from the first three columns:

+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                      0 |                      0 |
|   2 |     2 |     1 |                      1 |                      1 |
|   3 |     1 |     3 |                      2 |                      0 |
|   4 |     2 |     1 |                      2 |                      3 |
+-----+-------+-------+------------------------+------------------------+
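
To pin down the target, the rule behind the last two columns can be sketched procedurally. This is a minimal Python illustration rather than SQL, and the function name prior_appearances is hypothetical: for each row in id_ order, it reports how many earlier rows contained each player in either column.

```python
# Rows from the example table: (id_, p1_id, p2_id).
rows = [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)]


def prior_appearances(rows):
    """For each row, count the earlier rows that contain each player in either column."""
    seen = {}  # player id -> number of earlier rows containing that player
    result = []
    for id_, p1, p2 in sorted(rows):
        # Read the counts before registering the current row, so a row never counts itself.
        result.append((id_, p1, p2, seen.get(p1, 0), seen.get(p2, 0)))
        # A set ensures a row is counted once even if both columns hold the same player.
        for player in {p1, p2}:
            seen[player] = seen.get(player, 0) + 1
    return result


expected = prior_appearances(rows)
for row in expected:
    print(row)
```

The printed tuples follow the column order of the table above.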

I'm trying to move away from subqueries and I've been exploring window functions. However, in this case a simple COUNT partitioned by p1_id would only count records where the player was in the same column:

SELECT 
    id_,
    p1_id,
    p2_id,
    COUNT(p1_id) OVER (PARTITION BY p1_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p1_id_cumulative_count,
    COUNT(p2_id) OVER (PARTITION BY p2_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p2_id_cumulative_count
FROM
    test.example_table
ORDER BY id_;

Results in:

+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                      0 |                      0 |
|   2 |     2 |     1 |                      0 |                      0 |
|   3 |     1 |     3 |                      1 |                      0 |
|   4 |     2 |     1 |                      1 |                      1 |
+-----+-------+-------+------------------------+------------------------+

I then thought I was being clever by combining SUM and CASE to solve this issue:

SELECT 
    id_,
    p1_id,
    p2_id,
    SUM(CASE WHEN (p1_id = p1_id OR p1_id = p2_id) THEN 1 ELSE 0 END) OVER (PARTITION BY p1_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p1_id_cumulative_count,
    SUM(CASE WHEN (p2_id = p1_id OR p2_id = p2_id) THEN 1 ELSE 0 END) OVER (PARTITION BY p2_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p2_id_cumulative_count
FROM
    test.example_table
ORDER BY id_;

Alas to no avail:

+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 | NULL                   | NULL                   |
|   2 |     2 |     1 | NULL                   | NULL                   |
|   3 |     1 |     3 | 1                      | NULL                   |
|   4 |     2 |     1 | 1                      | 1                      |
+-----+-------+-------+------------------------+------------------------+

Anyone able to put me out of my misery?

Here's the SQL to create and populate the table:

CREATE TABLE `example_table` (
  `id_` int NOT NULL AUTO_INCREMENT,
  `p1_id` int DEFAULT NULL,
  `p2_id` int DEFAULT NULL,
  PRIMARY KEY (`id_`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `example_table` VALUES (1,1,2),(2,2,1),(3,1,3),(4,2,1);


Solution 1:[1]

Here are two options. The first will not run on MySQL, because MySQL has no PIVOT operator; the second uses conditional aggregation instead, which is more generic.

Both use CTEs, which are technically subqueries, but you may like them better because they're easier to decipher.

Option 1:

WITH CTE_LONGFORM
AS (
    SELECT id_ AS i
        ,p1_id AS pid
        ,'p1' AS col
    FROM `example_table`
    
    UNION ALL
    
    SELECT id_ AS i
        ,p2_id AS pid
        ,'p2' AS col
    FROM `example_table`
    )
    ,CTE_CUMSUM
AS (
    SELECT *
        ,COUNT(PID) OVER (
            PARTITION BY PID ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            ) AS COUNT_PID
    FROM CTE_LONGFORM
    )
SELECT I
    ,P1
    ,P2
FROM (
    SELECT I
        ,COUNT_PID
        ,COL
    FROM CTE_CUMSUM
    )
PIVOT(SUM(COUNT_PID) FOR COL IN (
            'p1'
            ,'p2'
            )) AS p(I, P1, P2)

Option 2:

WITH CTE_LONGFORM
AS (
    SELECT id_ AS i
        ,p1_id AS pid
        ,'p1' AS col
    FROM `example_table`
    
    UNION ALL
    
    SELECT id_ AS i
        ,p2_id AS pid
        ,'p2' AS col
    FROM `example_table`
    )
    ,CTE_CUMSUM
AS (
    SELECT *
        ,COUNT(PID) OVER (
            PARTITION BY PID ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            ) AS COUNT_PID
    FROM CTE_LONGFORM
    )
SELECT I AS id_
    ,MAX(CASE WHEN COL='p1' THEN pid ELSE NULL END) AS p1_id
    ,MAX(CASE WHEN COL='p2' THEN pid ELSE NULL END) AS p2_id
    ,SUM(CASE WHEN COL='p1' THEN COUNT_PID ELSE 0 END) AS p1_id_cumulative_count
    ,SUM(CASE WHEN COL='p2' THEN COUNT_PID ELSE 0 END) AS p2_id_cumulative_count
FROM CTE_CUMSUM 
GROUP BY 1 
ORDER BY 1

I used Rasgo to translate the first one, and it suggested PIVOT since I'm currently using Snowflake. If I had a MySQL instance, it would probably suggest the second.
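
As a quick check of Option 2, here is a minimal sketch that runs the same logic through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption, and it needs SQLite 3.25 or later for window functions); the identifiers are lower-cased and GROUP BY names the column instead of using its position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table (id_ INTEGER PRIMARY KEY, p1_id INTEGER, p2_id INTEGER)"
)
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)],
)

option_2 = """
WITH cte_longform AS (
    -- Stack both player columns into one column, remembering the origin.
    SELECT id_ AS i, p1_id AS pid, 'p1' AS col FROM example_table
    UNION ALL
    SELECT id_ AS i, p2_id AS pid, 'p2' AS col FROM example_table
),
cte_cumsum AS (
    -- Count the earlier appearances of each player, whichever column they came from.
    SELECT *,
           COUNT(pid) OVER (
               PARTITION BY pid ORDER BY i
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS count_pid
    FROM cte_longform
)
-- Fold the two long-form rows per id back into one wide row.
SELECT i AS id_,
       MAX(CASE WHEN col = 'p1' THEN pid END) AS p1_id,
       MAX(CASE WHEN col = 'p2' THEN pid END) AS p2_id,
       SUM(CASE WHEN col = 'p1' THEN count_pid ELSE 0 END) AS p1_id_cumulative_count,
       SUM(CASE WHEN col = 'p2' THEN count_pid ELSE 0 END) AS p2_id_cumulative_count
FROM cte_cumsum
GROUP BY i
ORDER BY i
"""

option_2_rows = conn.execute(option_2).fetchall()
for row in option_2_rows:
    print(row)
```

On the four sample rows this should reproduce the table the question asks for.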

Solution 2:[2]

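You cannot use window functions directly on this table, because PARTITION BY splits the rows on a single expression: a player who appears in p1_id in one row and in p2_id in another lands in different partitions, so no window frame covers all of that player's earlier rows.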
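
The question's SUM/CASE attempt shows this in practice. In the sketch below (Python's sqlite3 as a stand-in for MySQL, which is an assumption; SQLite 3.25+ is needed), the CASE condition compares each row's columns with themselves, so it is true for every row and the SUM is just a row count within the p1_id partition. The only difference from plain COUNT is that SUM over an empty frame yields NULL, while COUNT yields 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table (id_ INTEGER PRIMARY KEY, p1_id INTEGER, p2_id INTEGER)"
)
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)],
)

# The same window definition is used for both aggregates.
frame = (
    "PARTITION BY p1_id ORDER BY id_ "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING"
)

comparison = f"""
SELECT id_,
       -- p1_id = p1_id holds for every row in the frame, so this only counts rows.
       SUM(CASE WHEN p1_id = p1_id OR p1_id = p2_id THEN 1 ELSE 0 END)
           OVER ({frame}) AS sum_case,
       COUNT(p1_id) OVER ({frame}) AS plain_count
FROM example_table
ORDER BY id_
"""

comparison_rows = conn.execute(comparison).fetchall()
for row in comparison_rows:
    print(row)  # NULL is shown as None
```

Neither column ever sees a row where the player sat in p2_id, because those rows belong to other partitions.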

One solution is to move the desired values into one column (1 row x 2 columns becomes 2 rows x 1 column). Then use a window function to count the values. Finally, use conditional aggregation or some other trick to convert the 2 rows x 1 column back into 1 row x 2 columns. Like so:

with cte1 as (
    select t.id_, x.pid, x.col
    from example_table as t, lateral (
        select p1_id, 1 union all
        select p2_id, 2
    ) as x(pid, col)
), cte2 as (
    select *, count(*) over (
        partition by pid
        order by id_ rows between unbounded preceding and 1 preceding
    ) as rcount
    from cte1
)
select id_
     , min(case when col = 1 then pid end) as p1_id
     , min(case when col = 2 then pid end) as p2_id
     , min(case when col = 1 then rcount end) as p1_rcount
     , min(case when col = 2 then rcount end) as p2_rcount
from cte2
group by id_

Demo on db<>fiddle
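
To see what the window function works on, the intermediate long form can be inspected before it is folded back. This is a minimal sketch using Python's sqlite3 as a stand-in for MySQL (an assumption); SQLite has no LATERAL, so the unpivot step is written with UNION ALL here, which produces the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table (id_ INTEGER PRIMARY KEY, p1_id INTEGER, p2_id INTEGER)"
)
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)],
)

long_form = """
WITH cte1 AS (
    -- col records which original column each player id came from.
    SELECT id_, p1_id AS pid, 1 AS col FROM example_table
    UNION ALL
    SELECT id_, p2_id AS pid, 2 AS col FROM example_table
)
SELECT id_, pid, col,
       COUNT(*) OVER (
           PARTITION BY pid ORDER BY id_
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS rcount
FROM cte1
ORDER BY id_, col
"""

long_rows = conn.execute(long_form).fetchall()
for row in long_rows:
    print(row)  # (id_, pid, col, rcount)
```

Each id_ now has two rows, one per original column, and rcount already holds the number of earlier appearances of that player; the final GROUP BY only rearranges these values.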

Solution 3:[3]

Actually, if I have understood the problem correctly, you need neither window functions nor subqueries.

You can get the cumulative sum by:

  • applying a LEFT JOIN of example_table to itself with a > condition on id_, so that each row is matched with the preceding ones
  • applying a CASE expression to detect when each player is found in either of the two columns
  • applying the SUM aggregation function

Here it is:

SELECT t1.id_,
       SUM(CASE WHEN t1.p1_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p1_cumsum,
       SUM(CASE WHEN t1.p2_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p2_cumsum
FROM      example_table t1
LEFT JOIN example_table t2
       ON t1.id_ > t2.id_
GROUP BY t1.id_
ORDER BY t1.id_;    

Find the corresponding SQL Fiddle here.
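
The query can be tried out with the sketch below, which uses Python's sqlite3 as a stand-in for MySQL (an assumption). It also counts the joined rows before aggregation: each row is paired with every earlier row, so the intermediate result grows roughly quadratically with the table size, which is worth keeping in mind for large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table (id_ INTEGER PRIMARY KEY, p1_id INTEGER, p2_id INTEGER)"
)
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)],
)

self_join = """
SELECT t1.id_,
       SUM(CASE WHEN t1.p1_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p1_cumsum,
       SUM(CASE WHEN t1.p2_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p2_cumsum
FROM example_table t1
LEFT JOIN example_table t2
       ON t1.id_ > t2.id_
GROUP BY t1.id_
ORDER BY t1.id_
"""
cumsum_rows = conn.execute(self_join).fetchall()

# Number of joined rows before GROUP BY: one per (row, earlier row) pair,
# plus one NULL-extended row for the first id_, which has no predecessor.
joined_row_count = conn.execute(
    "SELECT COUNT(*) FROM example_table t1 "
    "LEFT JOIN example_table t2 ON t1.id_ > t2.id_"
).fetchone()[0]

for row in cumsum_rows:
    print(row)
print("joined rows before aggregation:", joined_row_count)
```

The first row has no earlier match, so its t2 columns are NULL; the IN test then evaluates to NULL, the CASE falls through to ELSE 0, and the sums come out as 0 rather than NULL.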

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Josh
Solution 2
Solution 3 lemon