Rolling standard deviation excluding current rows in Python

I'm trying to calculate the rolling standard deviation for a groupby object which contains Invoice Date, Vendor Name and Agency Name, while excluding the current row from the calculation. In other words, I want a rolling std() for each Vendor-Agency pair. This is my first time asking a question here, so please let me know if additional information is required.

I have already managed to do something similar for the rolling mean with the code below:

last_3_invoices = dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME", "OCP_AGNCY_NAME"])["daily_avg_amount"].rolling(3, min_periods=1, closed="left").mean().fillna(0).reset_index()
last_30_df = dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME", "OCP_AGNCY_NAME"])["daily_avg_amount"].rolling("30d", closed="left").mean().fillna(0).reset_index()

The problem is that if I replicate this code with .std() instead of .mean(), I get no values. If I omit closed="left", I get values, but they include the current row, so that won't work for me. I also tried several approaches using .shift() instead.
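Since the rolling mean with closed="left" works, one possible workaround is to build the standard deviation from left-closed rolling sums, which behave like the mean, using the sample variance formula (Σx² − (Σx)²/n) / (n − 1) over the n earlier values in each window. This is a sketch only: it was written for a recent pandas and not checked on 1.2.4, and the toy data, the helper columns one and sq, and the output name std_30d_excl_current are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data that uses the question's column names.
df = pd.DataFrame({
    "VENDOR_NAME": ["A", "A", "A", "A", "B", "B", "B"],
    "OCP_AGNCY_NAME": ["X", "X", "X", "X", "Y", "Y", "Y"],
    "INVOICE_DATE": pd.to_datetime([
        "2018-01-01", "2018-01-05", "2018-01-10", "2018-01-20",
        "2018-01-02", "2018-01-03", "2018-03-01",
    ]),
    "daily_avg_amount": [10.0, 20.0, 40.0, 70.0, 5.0, 15.0, 25.0],
})
keys = ["VENDOR_NAME", "OCP_AGNCY_NAME"]

# Dates must be sorted within each group for a time-based window.
# Hypothetical helper columns: "one" counts rows, "sq" holds x**2.
tmp = (df.sort_values(keys + ["INVOICE_DATE"])
         .assign(one=1.0, sq=lambda d: d["daily_avg_amount"] ** 2)
         .set_index("INVOICE_DATE"))
g = tmp.groupby(keys)

def left_sum(col):
    # Sum over the previous 30 days; closed="left" excludes the current row.
    return g[col].rolling("30d", closed="left").sum()

n = left_sum("one")                # number of earlier rows in the window
s = left_sum("daily_avg_amount")   # sum of x
ss = left_sum("sq")                # sum of x**2

# Sample variance (ddof=1). Clip tiny negatives caused by rounding and
# mask windows with fewer than two values, where .std() gives NaN.
var = ((ss - s ** 2 / n) / (n - 1)).clip(lower=0).where(n > 1)
out = np.sqrt(var).rename("std_30d_excl_current").reset_index()
print(out)
```

The result keeps VENDOR_NAME, OCP_AGNCY_NAME and INVOICE_DATE as columns. The sum-of-squares formula can lose precision when amounts are large compared with their spread, so it is worth spot-checking a few windows against .std() on the raw values.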

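This code works for some cases, but it does not respect the ["VENDOR_NAME","OCP_AGNCY_NAME"] boundaries: .shift() is applied to the combined result, so the last value of one Vendor-Agency pair ends up in the first row of the next pair.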

dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME","OCP_AGNCY_NAME"])["daily_avg_amount"].rolling(3,min_periods=1).std().shift().reset_index()
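A group-aware version of the same idea shifts the rolling result within each Vendor-Agency pair instead of across the whole Series. This is a sketch on hypothetical toy data, written for a recent pandas and not checked on 1.2.4; the output name std_prev_3 is made up.

```python
import pandas as pd

# Hypothetical toy data that uses the question's column names.
df = pd.DataFrame({
    "VENDOR_NAME": ["A", "A", "A", "A", "B", "B", "B"],
    "OCP_AGNCY_NAME": ["X", "X", "X", "X", "Y", "Y", "Y"],
    "INVOICE_DATE": pd.to_datetime([
        "2018-01-01", "2018-01-05", "2018-01-10", "2018-01-20",
        "2018-01-02", "2018-01-03", "2018-03-01",
    ]),
    "daily_avg_amount": [10.0, 20.0, 40.0, 70.0, 5.0, 15.0, 25.0],
})
keys = ["VENDOR_NAME", "OCP_AGNCY_NAME"]

# Rolling std over 3 rows per pair; this still includes the current row.
rolled = (df.sort_values(keys + ["INVOICE_DATE"])
            .set_index("INVOICE_DATE")
            .groupby(keys)["daily_avg_amount"]
            .rolling(3, min_periods=1)
            .std())

# rolled is indexed by (VENDOR_NAME, OCP_AGNCY_NAME, INVOICE_DATE).
# Shifting per (level 0, level 1) group moves each value down one row inside
# its own pair, so row t gets the std of rows t-3..t-1 and nothing leaks
# into the next pair.
prev_std = (rolled.groupby(level=[0, 1]).shift()
                  .rename("std_prev_3")
                  .reset_index())
print(prev_std)
```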

If I call shift() directly after the groupby, as below, the result only contains the invoice date and the values, and ["VENDOR_NAME","OCP_AGNCY_NAME"] are dropped:

dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME","OCP_AGNCY_NAME"])["daily_avg_amount"].shift().rolling(3,min_periods=1).std().reset_index()

dft_gb_comb.set_index("INVOICE_DATE").groupby(["VENDOR_NAME","OCP_AGNCY_NAME"])["daily_avg_amount"].apply(lambda x : x.shift(1).rolling(3,min_periods=1).std()).reset_index()

	INVOICE_DATE	daily_avg_amount
0	2018-04-19	NaN
1	2018-12-10	NaN
408184	2019-06-05	139.02567
408185	2019-06-19	124.92972
408186	2018-05-16	34.92400
408187	2019-04-19	135.41095
408188	2018-08-06	NaN

Instead, I would like something like this, which is the desired end result:

	VENDOR_NAME	OCP_AGNCY_NAME	INVOICE_DATE	daily_avg_amount
0	AED PROFESSIONALS N. E.	OKLA. A & M COLLEGE	2018-04-19	NaN
2	E3 DIAGNOSTIC,INC. DBA E3 GORDON STOWE/	OKLAHOMA STATE UNIVERSITY	2018-01-17	215.66757
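To get the shifted values while keeping the group keys, as in the layout above, one option is to store the per-group shift as a column (groupby().shift() returns a Series on the original row index) and then group again before rolling. This is a sketch on hypothetical toy data for a recent pandas; prev_amount and std_prev_3 are made-up names.

```python
import pandas as pd

# Hypothetical toy data that uses the question's column names.
df = pd.DataFrame({
    "VENDOR_NAME": ["A", "A", "A", "A", "B", "B", "B"],
    "OCP_AGNCY_NAME": ["X", "X", "X", "X", "Y", "Y", "Y"],
    "INVOICE_DATE": pd.to_datetime([
        "2018-01-01", "2018-01-05", "2018-01-10", "2018-01-20",
        "2018-01-02", "2018-01-03", "2018-03-01",
    ]),
    "daily_avg_amount": [10.0, 20.0, 40.0, 70.0, 5.0, 15.0, 25.0],
})
keys = ["VENDOR_NAME", "OCP_AGNCY_NAME"]

work = df.sort_values(keys + ["INVOICE_DATE"])
# The per-group shift is aligned to work's rows, so it can be stored as a
# column while VENDOR_NAME and OCP_AGNCY_NAME stay available as columns.
work = work.assign(prev_amount=work.groupby(keys)["daily_avg_amount"].shift())

# Group again so the rolling window also stays inside each pair.
result = (work.set_index("INVOICE_DATE")
              .groupby(keys)["prev_amount"]
              .rolling(3, min_periods=1)
              .std()
              .rename("std_prev_3")
              .reset_index())
print(result)
```

Rolling skips the NaN produced by the shift, so the first two rows of each pair come out as NaN (fewer than two earlier values).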

In the end I need to join the rolling values back to the main DataFrame, so VENDOR_NAME and OCP_AGNCY_NAME must not be dropped.
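For the join back with a row-based window, groupby().transform() returns values aligned to the original rows, so the result can be assigned as a new column without dropping keys or merging. This is a sketch on hypothetical toy data for a recent pandas; the column name std_prev_3 is made up.

```python
import pandas as pd

# Hypothetical toy data that uses the question's column names.
df = pd.DataFrame({
    "VENDOR_NAME": ["A", "A", "A", "A", "B", "B", "B"],
    "OCP_AGNCY_NAME": ["X", "X", "X", "X", "Y", "Y", "Y"],
    "INVOICE_DATE": pd.to_datetime([
        "2018-01-01", "2018-01-05", "2018-01-10", "2018-01-20",
        "2018-01-02", "2018-01-03", "2018-03-01",
    ]),
    "daily_avg_amount": [10.0, 20.0, 40.0, 70.0, 5.0, 15.0, 25.0],
})
keys = ["VENDOR_NAME", "OCP_AGNCY_NAME"]

# Rows must be in date order within each pair because the window counts rows.
df = df.sort_values(keys + ["INVOICE_DATE"])

# Within each pair: shift out the current row, then take the rolling std of
# the previous 3 rows. The result is aligned to df's index.
df["std_prev_3"] = (df.groupby(keys)["daily_avg_amount"]
                      .transform(lambda s: s.shift().rolling(3, min_periods=1).std()))
print(df)
```

For time-based windows such as "30d", the output of groupby(...).rolling(...).reset_index() can instead be joined with df.merge(result, on=["VENDOR_NAME", "OCP_AGNCY_NAME", "INVOICE_DATE"], how="left"); if a pair has several invoices on the same date, each of those rows matches every result row with the same keys, so the merge multiplies rows.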

Current versions:

python : 3.8.8.final.0
pandas : 1.2.4
numpy : 1.20.1

I would be thankful for any suggestions. I'm working with vendor data from here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
