'Pandas rolling window cumsum, with incomplete series
I have a pandas df as follows:
YEAR MONTH USERID TRX_COUNT
2020 1 1 1
2020 2 1 2
2020 3 1 1
2020 12 1 1
2021 1 1 3
2021 2 1 3
2021 3 1 4
I want to sum
the TRX_COUNT
such that, each TRX_COUNT
is the sum
of TRX_COUNTS
of the next 12 months.
So my end result would look like
YEAR MONTH USERID TRX_COUNT TRX_COUNT_SUM
2020 1 1 1 5
2020 2 1 2 7
2020 3 1 1 8
2020 12 1 1 11
2021 1 1 3 10
2021 2 1 3 7
2021 3 1 4 4
For example TRX_COUNT_SUM
for 2020/1
is 1+2+1+1=5
the count of the first 12 months.
Second entry is 7 as it is the sum of 2+1+1+3
which is 12 months from 2020/2
I do expect partials, where there is not a full year of data. These can be either summed up partially or set to zero (as I wont be using partials).
I tried various variations of cumsum
and grouping by USERID, YR, MONTH
but am running into errors with handling the time window.
Thanks!
Solution 1:[1]
Looking at the logic and expected output, you are looking for more of rolling sum than cumsum. You want to roll 12 months and sum the number of TRX_COUNT
. cumsum
would cumulatively adding up the previous calculations.
Anyway, a few things that is complicated in your dataset. 1. interval is uneven 2. You are looking for forward rolling and typical rolling is backward.
To solve this, first, I would make the interval even so that I can use regular rolling.
df['ym'] = pd.to_datetime([f'{x}/0{y}' if y < 10 else f'{x}/{y}' for x, y in zip(df.YEAR, df.MONTH)])
df = df.set_index('ym').resample('MS').first()
Then, try forward rolling. To do the forward rolling, I reverse the dataframe once and do rolling then reverse back.
df['TRX_COUNT_SUM'] = (df.iloc[::-1] # Reverse to do (backward) rolling
.rolling(12, min_periods=0)
.TRX_COUNT.sum()
.iloc[::-1]) # Reverse back to original
# remove resampled records
df = df[df.YEAR > 0]
Result.
YEAR MONTH USERID TRX_COUNT TRX_COUNT_SUM
ym
2020-01-01 2020.0 1.0 1.0 1.0 5.0
2020-02-01 2020.0 2.0 1.0 2.0 7.0
2020-03-01 2020.0 3.0 1.0 1.0 8.0
2020-12-01 2020.0 12.0 1.0 1.0 11.0
2021-01-01 2021.0 1.0 1.0 3.0 10.0
2021-02-01 2021.0 2.0 1.0 3.0 7.0
2021-03-01 2021.0 3.0 1.0 4.0 4.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |