'Rolling standard deviation excluding current rows in python

I'm trying calculate the rolling standard deviation for a groupby object which contains Invoice Date, Vendor Name and Agency Name, while excluding the current row in the calculation. So the rolling std() for each Vendor-Agency pair basically. It’s my first time asking a question here, so please let me know if additional information is required.

I already managed to do something similar with the rolling mean with the codes below.

last_3_invoices = dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME","OCP_AGNCY_NAME"])["daily_avg_amount"].rolling(3,min_periods = 1, closed="left").mean().fillna(0).reset_index()
last_30_df= dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME","OCP_AGNCY_NAME"])["daily_avg_amount"].rolling("30d",closed="left").mean().fillna(0).reset_index()

The problem is, if I try to replicate this code with .std() instead of .mean() I get no values. If i omit the "closed ="left"" statement I get values, but those include the current row, so that wouldn't work for me. I tried several ways to work with the .shift() statement instead

This could would work for some cases but it does not respect the ["VENDOR_NAME","OCP_AGNCY_NAME"] boundaries.

dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME","OCP_AGNCY_NAME"])["daily_avg_amount"].rolling(3,min_periods=1).std().shift().reset_index()

If I use shift() directly after the groupby like below it only returns the Invoice date with the values while dropping ["VENDOR_NAME","OCP_AGNCY_NAME"] from the groupby object...

dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME","OCP_AGNCY_NAME"])["daily_avg_amount"].shift().rolling(3,min_periods=1).std().reset_index()

dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME","OCP_AGNCY_NAME"])["daily_avg_amount"].apply(lambda x : x.shift(1).rolling(3,min_periods=1).std()).reset_index()
INVOICE_DATE daily_avg_amount
0 2018-04-19 NaN
1 2018-12-10 NaN
408184 2019-06-05 139.02567
408185 2019-06-19 124.92972
408186 2018-05-16 34.92400
408187 2019-04-19 135.41095
408188 2018-08-06 NaN

instead of something like this, which would be the desired end result.

VENDOR_NAME OCP_AGNCY_NAME INVOICE_DATE daily_avg_amount
0 AED PROFESSIONALS N. E. OKLA. A & M COLLEGE 2018-04-19 NaN
2 E3 DIAGNOSTIC,INC. DBA E3 GORDON STOWE/ OKLAHOMA STATE UNIVERSITY 2018-01-17 215.66757

I need to join the groupby object with the rolling values back to the main dataframe in the end, so I can't allow "VENDOR_NAME","OCP_AGNCY_NAME" to be dropped.

current python version would be

python : 3.8.8.final.0

pandas : 1.2.4

numpy : 1.20.1

I would be thankful for any kind of suggestions. I'm working with vendor data from here



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source