Percent change using Pandera for Pandas DataFrame
I have the following DataFrame. I need to validate the balance and other numeric measures over a date range: for any group and date, I want to check whether the balance or any other measure has changed by 25% or more compared with the previous date of the same group. I can filter for this numerically using `pct_change()`. Is it possible to do this kind of validation over a date range in Pandera?
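For reference, the `pct_change()` filter can be generalized to several measures at once in plain pandas. This is a minimal sketch, not a Pandera solution: `flag_large_changes` and its `limit` parameter are hypothetical names, and using `abs()` so that large decreases are also flagged is an assumption.

```python
import pandas as pd

d = {'date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01', '2022-02-01',
              '2022-02-01', '2022-03-01', '2022-03-01', '2022-03-01'],
     'groupId': ['g1', 'g1', 'g2', 'g1', 'g1', 'g2', 'g1', 'g1', 'g2'],
     'bal': [10, 11, 12, 13, 14, 15, 16, 17, 20],
     'num_meas1': [1, 1, 2, 3, 3, 4, 5, 5, 6]}
df = pd.DataFrame(d)


def flag_large_changes(df, measures, limit=0.25):
    """Hypothetical helper: return the (groupId, date) rows where any of the
    given measures changed by `limit` or more versus the previous date."""
    # Sum each measure per groupId and date; ISO date strings sort chronologically.
    totals = (
        df.groupby(["groupId", "date"], as_index=False)[measures].sum()
        .sort_values(["groupId", "date"])
    )
    # Percent change versus the previous date within the same groupId.
    pct = totals.groupby("groupId")[measures].pct_change()
    # abs() also flags decreases (assumption); the NaN on each group's first date compares False.
    too_big = pct.abs().ge(limit).any(axis=1)
    return totals.loc[too_big, ["groupId", "date"]].join(pct.add_suffix("_pct"))


print(flag_large_changes(df, ["bal", "num_meas1"]))
```

With `measures=["bal"]` this flags the same three (groupId, date) pairs as the filter in the question.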
DataFrame:
```
         date  id  groupId  bal  num_meas1
0  2022-01-01   1       g1   10          1
1  2022-01-01   1       g1   11          1
2  2022-01-01   2       g2   12          2
3  2022-02-01   1       g1   13          3
4  2022-02-01   1       g1   14          3
5  2022-02-01   2       g2   15          4
6  2022-03-01   1       g1   16          5
7  2022-03-01   1       g1   17          5
8  2022-03-01   2       g2   20          6
```
Current code:
```python
import pandas as pd

d = {'date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01', '2022-02-01', '2022-02-01', '2022-03-01', '2022-03-01', '2022-03-01'],
     'id': [1, 1, 2, 1, 1, 2, 1, 1, 2],
     'groupId': ['g1', 'g1', 'g2', 'g1', 'g1', 'g2', 'g1', 'g1', 'g2'],
     'bal': [10, 11, 12, 13, 14, 15, 16, 17, 20],
     'num_meas1': [1, 1, 2, 3, 3, 4, 5, 5, 6]
     }
df = pd.DataFrame(d)
# Total balance per date and group
df = df.groupby(['date', 'groupId'], as_index=False)['bal'].sum()
df.columns = ['date', 'groupId', 'totalBal']
# Order dates within each group so pct_change compares consecutive dates
df.sort_values(['groupId', 'date'], inplace=True)
df['pct_change'] = df.groupby('groupId')['totalBal'].pct_change()
df[df['pct_change'] >= 0.25]
```
Output:
```
         date  groupId  totalBal  pct_change
2  2022-02-01       g1        27    0.285714
3  2022-02-01       g2        15    0.250000
5  2022-03-01       g2        20    0.333333
```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow