'Adding correlation result back to pandas dataframe

I am wondering how to add the corr() result back to a panda dataframe as the current output is a bit nested. I just want to have one column in the original dataframe to list the value. What's the best way to achieve this?

       id       date  water  fire
0   apple 2018-01-01    100   100
1  orange 2018-01-01    110   110
2   apple 2019-01-01     90     9
3  orange 2019-01-01     50    50
4   apple 2020-01-01     40     4
5  orange 2020-01-01     60    60
6   apple 2021-01-01     70   470
7  orange 2021-01-01     80    15
8   apple 2022-01-01     90    90
9  orange 2022-01-01    100  9100

data = pd.DataFrame({
    'id': ['apple', 'orange','apple','orange','apple', 'orange', 'apple', 'orange', 'apple', 'orange'],
    'date': [
        datetime.datetime(2018, 1, 1),
        datetime.datetime(2018, 1, 1),
        datetime.datetime(2019, 1, 1),
        datetime.datetime(2019, 1, 1),
        datetime.datetime(2020, 1, 1),
        datetime.datetime(2020, 1, 1),
        datetime.datetime(2021, 1, 1),
        datetime.datetime(2021, 1, 1),
        datetime.datetime(2022, 1, 1),
        datetime.datetime(2022, 1, 1)
    ],
    'water': [100, 110, 90, 50, 40, 60, 70, 80, 90, 100],
    'fire': [100, 110, 9, 50, 4, 60, 470, 15, 90, 9100]
    }
)

data.groupby('id')[['water', 'fire']].apply(lambda x : x.rolling(3).corr())
                   water      fire
id                                
apple  0 water       NaN       NaN
         fire        NaN       NaN
       2 water       NaN       NaN
         fire        NaN       NaN
       4 water  1.000000  0.663924
         fire   0.663924  1.000000
       6 water  1.000000  0.123983
         fire   0.123983  1.000000
       8 water  1.000000  0.285230
         fire   0.285230  1.000000
orange 1 water       NaN       NaN
         fire        NaN       NaN
       3 water       NaN       NaN
         fire        NaN       NaN
       5 water  1.000000  1.000000
         fire   1.000000  1.000000
       7 water  1.000000 -0.854251
         fire  -0.854251  1.000000
       9 water  1.000000  0.863867
         fire   0.863867  1.000000

python pandas

Solution 1:^[1]

Here is one way to do it:

df = pd.concat(
    [
        data,
        data.groupby("id")[["water", "fire"]]
        .apply(lambda x: x.rolling(3).corr())
        .reset_index()
        .drop_duplicates(subset=["level_1"])
        .set_index("level_1")["fire"]
        .rename("corr")
    ],
    axis=1,
)

print(df)
# Output
       id       date  water  fire      corr
0   apple 2018-01-01    100   100       NaN
1  orange 2018-01-01    110   110       NaN
2   apple 2019-01-01     90     9       NaN
3  orange 2019-01-01     50    50       NaN
4   apple 2020-01-01     40     4  0.663924
5  orange 2020-01-01     60    60  1.000000
6   apple 2021-01-01     70   470  0.123983
7  orange 2021-01-01     80    15 -0.854251
8   apple 2022-01-01     90    90  0.285230
9  orange 2022-01-01    100  9100  0.863867

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Laurent

'Adding correlation result back to pandas dataframe

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]