'Functional Programming: How does one create a new column in a multi-index data frame that is a function of another column?
Suppose the below simplified dataframe. (The actual df is much, much bigger.) How does one assign values to a new column f such that f is a function of another column (e.g. e)?
df = pd.DataFrame([[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b","e")))
df
a b
d e d e
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
3 13 14 15 16
Desired Output:
a b
d e f d e f
0 1 2 nan 3 4 nan
1 5 6 1.10 7 8 0.69
2 9 10 0.51 11 12 0.41
3 13 14 0.34 15 16 0.29
where column f is computed as np.log(df['e']).diff()
Solution 1:[1]
df = {c:df[c].assign(r=np.log(df[(c,'d')]).diff()) for c in df.columns.levels[0]}
df = pd.concat([df[c] for c in df.keys()], axis=1, keys = df.keys())
Solution 2:[2]
You could access the MultiIndex column using loc
, then use the functions directly on the sliced column, then join
it back to df
:
import numpy as np
df = (df.join(np.log(df.loc[:, (slice(None), 'e')])
.diff().round(2).rename(columns={'e':'f'}, level=1))
.sort_index(axis=1))
Output:
a b
d e f d e f
0 1 2 NaN 3 4 NaN
1 5 6 1.10 7 8 0.69
2 9 10 0.51 11 12 0.41
3 13 14 0.34 15 16 0.29
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | jgg |
Solution 2 |