'Make calculations across multiple tables based on the table suffix in Bigquery

I have a database of daily tables (with prefixes formatted as yyyymmdd) with customers info, and I need to get a 90 day timeline of 90 day ARPUs (average revenue per user). i.e. one point on the graph = revenue / number of users for the previous 90 days.

I have tried the following:

    SELECT
    ninetyd_rev/ninetyd_userid as arpu
    From(
    SELECT
    SUM(revenue) OVER (ORDER BY activity_date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW ) AS ninetyd_rev,
    COUNT(userid) OVER (ORDER BY activity_date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW ) AS ninetyd_userid,
    FROM
      `table_*`
    WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -180 DAY))
      AND FORMAT_DATE("%Y%m%d",CURRENT_DATE()))

(where activity_date corresponds to the table suffix)

but that results in the following error: Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 132% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 98% other/unattributed: 2%

I was also thinking that there might be a way to generate and array of numbers that can be looped in the interval somehow. Something like:

  generate array 0 to 89 as **variable x**
SELECT
    SUM(revenue)/COUNT(userid) AS arpu
    FROM
      `table_*`
    WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL **-x** DAY))
      AND FORMAT_DATE("%Y%m%d",CURRENT_DATE())

Yet I have not been able to achieve anything useful and not sure if that approach is possible at all. Thank you for all the tips!

Edit. The expected output would be

day arpu
d1 arpu1 (d1 and previous 89 days rev/#users)
d2 arpu2 (d2 and previous 89 days rev/#users)
dn arpun (dn and previous 89 days rev/#users)

ideally the days (d1, d2, ...dn) would be in yyyymmdd format.code

UPD. Based on the suggestion from @dikesh, I ran the query below. Yet Bigquery never finishes the calculation and does not give an output

WITH 
daily_userids AS (
SELECT activity_date as activity_date, userid as users 
FROM `table_x`
WHERE _table_suffix BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -30 DAY)) AND FORMAT_DATE("%Y%m%d",CURRENT_DATE()) 
GROUP BY 1,2),
        
daily_revenue AS (
SELECT activity_date, SUM(revenue) AS revenue 
FROM `table_x`
WHERE _table_suffix BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -30 DAY)) AND FORMAT_DATE("%Y%m%d",CURRENT_DATE()) 
GROUP BY 1 ),
        
-- Find number of users in last 30 days from the date
table_users AS (
SELECT t1.activity_date, count (DISTINCT t2.users) AS users
FROM daily_userids AS t1 JOIN daily_userids AS t2
ON t1.activity_date BETWEEN t2.activity_date AND DATE_ADD(t2.activity_date, INTERVAL 29 DAY)
GROUP BY t1.activity_date),
    
-- Find total revenue in last 30 days from the date
table_revenue AS (
SELECT activity_date, SUM(revenue) OVER (ORDER BY activity_date DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS revenue
FROM daily_revenue
)

-- FIND ARPU
SELECT activity_date, users, revenue, ROUND(revenue / users, 4) AS arpu
FROM table_users JOIN table_revenue USING (activity_date)
ORDER BY activity_date DESC


Solution 1:[1]

Here is the sample query to find ARPU of last two days from each date

WITH table1 AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2022-05-09' AS day, 100 AS revenue, 'A' AS userid),
    STRUCT(DATE '2022-05-09' AS day, 110 AS revenue, 'B' AS userid),
    STRUCT(DATE '2022-05-09' AS day, 80 AS revenue, 'C' AS userid),
    STRUCT(DATE '2022-05-08' AS day, 100 AS revenue, 'B' AS userid),
    STRUCT(DATE '2022-05-08' AS day, 110 AS revenue, 'C' AS userid),
    STRUCT(DATE '2022-05-08' AS day, 80 AS revenue, 'D' AS userid),
    STRUCT(DATE '2022-05-07' AS day, 100 AS revenue, 'C' AS userid),
    STRUCT(DATE '2022-05-07' AS day, 110 AS revenue, 'D' AS userid),
    STRUCT(DATE '2022-05-07' AS day, 80 AS revenue, 'E' AS userid),
    STRUCT(DATE '2022-05-06' AS day, 100 AS revenue, 'E' AS userid),
    STRUCT(DATE '2022-05-06' AS day, 110 AS revenue, 'C' AS userid),
    STRUCT(DATE '2022-05-06' AS day, 80 AS revenue, 'A' AS userid),
    STRUCT(DATE '2022-05-05' AS day, 100 AS revenue, 'B' AS userid),
    STRUCT(DATE '2022-05-05' AS day, 110 AS revenue, 'D' AS userid),
    STRUCT(DATE '2022-05-05' AS day, 80 AS revenue, 'C' AS userid)
  ])
),
daily_userids AS (SELECT DISTINCT day, userid from table1),
daily_revenue AS (SELECT day, SUM(revenue) AS revenue FROM table1 GROUP BY day),

-- Find number of users in last 2 days from the date
table_users AS (
  SELECT t1.day, COUNT(DISTINCT t2.userid) AS users
  FROM daily_userids AS t1 JOIN daily_userids AS t2
  ON t1.day BETWEEN t2.day AND DATE_ADD(t2.day, INTERVAL 1 DAY)
  GROUP BY t1.day
),
-- Find total revenue in last 2 days from the date
table_revenue AS (
  SELECT day, SUM(revenue) OVER (ORDER BY day DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS revenue
  FROM daily_revenue
)
-- FIND ARPU
SELECT day, users, revenue, ROUND(revenue / users, 2) AS arpu
FROM table_users JOIN table_revenue USING (day)
ORDER BY day DESC

Output:

day users revenue arpu
2022-05-09 4 580 145.0
2022-05-08 4 580 145.0
2022-05-07 4 580 145.0
2022-05-06 5 580 116.0
2022-05-05 3 290 96.67

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1