'Adding correlation result back to pandas dataframe
I am wondering how to add the corr()
result back to a panda dataframe as the current output is a bit nested. I just want to have one column in the original dataframe to list the value. What's the best way to achieve this?
id date water fire
0 apple 2018-01-01 100 100
1 orange 2018-01-01 110 110
2 apple 2019-01-01 90 9
3 orange 2019-01-01 50 50
4 apple 2020-01-01 40 4
5 orange 2020-01-01 60 60
6 apple 2021-01-01 70 470
7 orange 2021-01-01 80 15
8 apple 2022-01-01 90 90
9 orange 2022-01-01 100 9100
data = pd.DataFrame({
'id': ['apple', 'orange','apple','orange','apple', 'orange', 'apple', 'orange', 'apple', 'orange'],
'date': [
datetime.datetime(2018, 1, 1),
datetime.datetime(2018, 1, 1),
datetime.datetime(2019, 1, 1),
datetime.datetime(2019, 1, 1),
datetime.datetime(2020, 1, 1),
datetime.datetime(2020, 1, 1),
datetime.datetime(2021, 1, 1),
datetime.datetime(2021, 1, 1),
datetime.datetime(2022, 1, 1),
datetime.datetime(2022, 1, 1)
],
'water': [100, 110, 90, 50, 40, 60, 70, 80, 90, 100],
'fire': [100, 110, 9, 50, 4, 60, 470, 15, 90, 9100]
}
)
data.groupby('id')[['water', 'fire']].apply(lambda x : x.rolling(3).corr())
water fire
id
apple 0 water NaN NaN
fire NaN NaN
2 water NaN NaN
fire NaN NaN
4 water 1.000000 0.663924
fire 0.663924 1.000000
6 water 1.000000 0.123983
fire 0.123983 1.000000
8 water 1.000000 0.285230
fire 0.285230 1.000000
orange 1 water NaN NaN
fire NaN NaN
3 water NaN NaN
fire NaN NaN
5 water 1.000000 1.000000
fire 1.000000 1.000000
7 water 1.000000 -0.854251
fire -0.854251 1.000000
9 water 1.000000 0.863867
fire 0.863867 1.000000
Solution 1:[1]
Here is one way to do it:
df = pd.concat(
[
data,
data.groupby("id")[["water", "fire"]]
.apply(lambda x: x.rolling(3).corr())
.reset_index()
.drop_duplicates(subset=["level_1"])
.set_index("level_1")["fire"]
.rename("corr")
],
axis=1,
)
print(df)
# Output
id date water fire corr
0 apple 2018-01-01 100 100 NaN
1 orange 2018-01-01 110 110 NaN
2 apple 2019-01-01 90 9 NaN
3 orange 2019-01-01 50 50 NaN
4 apple 2020-01-01 40 4 0.663924
5 orange 2020-01-01 60 60 1.000000
6 apple 2021-01-01 70 470 0.123983
7 orange 2021-01-01 80 15 -0.854251
8 apple 2022-01-01 90 90 0.285230
9 orange 2022-01-01 100 9100 0.863867
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Laurent |