'Make calculations across multiple tables based on the table suffix in Bigquery
I have a database of daily tables (with prefixes formatted as yyyymmdd) with customers info, and I need to get a 90 day timeline of 90 day ARPUs (average revenue per user). i.e. one point on the graph = revenue / number of users for the previous 90 days.
I have tried the following:
SELECT
ninetyd_rev/ninetyd_userid as arpu
From(
SELECT
SUM(revenue) OVER (ORDER BY activity_date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW ) AS ninetyd_rev,
COUNT(userid) OVER (ORDER BY activity_date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW ) AS ninetyd_userid,
FROM
`table_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -180 DAY))
AND FORMAT_DATE("%Y%m%d",CURRENT_DATE()))
(where activity_date corresponds to the table suffix)
but that results in the following error: Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 132% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 98% other/unattributed: 2%
I was also thinking that there might be a way to generate and array of numbers that can be looped in the interval somehow. Something like:
generate array 0 to 89 as **variable x**
SELECT
SUM(revenue)/COUNT(userid) AS arpu
FROM
`table_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL **-x** DAY))
AND FORMAT_DATE("%Y%m%d",CURRENT_DATE())
Yet I have not been able to achieve anything useful and not sure if that approach is possible at all. Thank you for all the tips!
Edit. The expected output would be
day | arpu |
---|---|
d1 | arpu1 (d1 and previous 89 days rev/#users) |
d2 | arpu2 (d2 and previous 89 days rev/#users) |
dn | arpun (dn and previous 89 days rev/#users) |
ideally the days (d1, d2, ...dn) would be in yyyymmdd format.code
UPD. Based on the suggestion from @dikesh, I ran the query below. Yet Bigquery never finishes the calculation and does not give an output
WITH
daily_userids AS (
SELECT activity_date as activity_date, userid as users
FROM `table_x`
WHERE _table_suffix BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -30 DAY)) AND FORMAT_DATE("%Y%m%d",CURRENT_DATE())
GROUP BY 1,2),
daily_revenue AS (
SELECT activity_date, SUM(revenue) AS revenue
FROM `table_x`
WHERE _table_suffix BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -30 DAY)) AND FORMAT_DATE("%Y%m%d",CURRENT_DATE())
GROUP BY 1 ),
-- Find number of users in last 30 days from the date
table_users AS (
SELECT t1.activity_date, count (DISTINCT t2.users) AS users
FROM daily_userids AS t1 JOIN daily_userids AS t2
ON t1.activity_date BETWEEN t2.activity_date AND DATE_ADD(t2.activity_date, INTERVAL 29 DAY)
GROUP BY t1.activity_date),
-- Find total revenue in last 30 days from the date
table_revenue AS (
SELECT activity_date, SUM(revenue) OVER (ORDER BY activity_date DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS revenue
FROM daily_revenue
)
-- FIND ARPU
SELECT activity_date, users, revenue, ROUND(revenue / users, 4) AS arpu
FROM table_users JOIN table_revenue USING (activity_date)
ORDER BY activity_date DESC
Solution 1:[1]
Here is the sample query to find ARPU of last two days from each date
WITH table1 AS (
SELECT * FROM UNNEST([
STRUCT(DATE '2022-05-09' AS day, 100 AS revenue, 'A' AS userid),
STRUCT(DATE '2022-05-09' AS day, 110 AS revenue, 'B' AS userid),
STRUCT(DATE '2022-05-09' AS day, 80 AS revenue, 'C' AS userid),
STRUCT(DATE '2022-05-08' AS day, 100 AS revenue, 'B' AS userid),
STRUCT(DATE '2022-05-08' AS day, 110 AS revenue, 'C' AS userid),
STRUCT(DATE '2022-05-08' AS day, 80 AS revenue, 'D' AS userid),
STRUCT(DATE '2022-05-07' AS day, 100 AS revenue, 'C' AS userid),
STRUCT(DATE '2022-05-07' AS day, 110 AS revenue, 'D' AS userid),
STRUCT(DATE '2022-05-07' AS day, 80 AS revenue, 'E' AS userid),
STRUCT(DATE '2022-05-06' AS day, 100 AS revenue, 'E' AS userid),
STRUCT(DATE '2022-05-06' AS day, 110 AS revenue, 'C' AS userid),
STRUCT(DATE '2022-05-06' AS day, 80 AS revenue, 'A' AS userid),
STRUCT(DATE '2022-05-05' AS day, 100 AS revenue, 'B' AS userid),
STRUCT(DATE '2022-05-05' AS day, 110 AS revenue, 'D' AS userid),
STRUCT(DATE '2022-05-05' AS day, 80 AS revenue, 'C' AS userid)
])
),
daily_userids AS (SELECT DISTINCT day, userid from table1),
daily_revenue AS (SELECT day, SUM(revenue) AS revenue FROM table1 GROUP BY day),
-- Find number of users in last 2 days from the date
table_users AS (
SELECT t1.day, COUNT(DISTINCT t2.userid) AS users
FROM daily_userids AS t1 JOIN daily_userids AS t2
ON t1.day BETWEEN t2.day AND DATE_ADD(t2.day, INTERVAL 1 DAY)
GROUP BY t1.day
),
-- Find total revenue in last 2 days from the date
table_revenue AS (
SELECT day, SUM(revenue) OVER (ORDER BY day DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS revenue
FROM daily_revenue
)
-- FIND ARPU
SELECT day, users, revenue, ROUND(revenue / users, 2) AS arpu
FROM table_users JOIN table_revenue USING (day)
ORDER BY day DESC
Output:
day | users | revenue | arpu |
---|---|---|---|
2022-05-09 | 4 | 580 | 145.0 |
2022-05-08 | 4 | 580 | 145.0 |
2022-05-07 | 4 | 580 | 145.0 |
2022-05-06 | 5 | 580 | 116.0 |
2022-05-05 | 3 | 290 | 96.67 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |