'How to select top level columns in multi header pandas dataframe
I have a multi header dataframe and it looks like that:
SPY ARKW
Open High Open High
Bid Ask Bid Ask Bid Ask Bid Ask
Date
2018-07-03 272.86 272.87 272.97 272.98 56.00 56.35 56.24 56.63
2018-07-02 269.49 269.51 272.04 272.05 54.87 55.21 56.04 56.09
2018-06-29 272.09 272.12 273.66 273.67 55.58 55.95 55.95 56.07
I want to iterate over all top level columns so in that case SPY and ARKW so I tried:
for column in df:
print(column)
but it iterates also over lower level columns:
('SPY', 'Open', 'Bid')
('SPY', 'Open', 'Ask')
('SPY', 'High', 'Bid')
('SPY', 'High', 'Ask')
('ARKW', 'Open', 'Bid')
('ARKW', 'Open', 'Ask')
('ARKW', 'High', 'Bid')
('ARKW', 'High', 'Ask')
How can I iterate over top level columns and then for each select Open Bid column?
Solution 1:[1]
I think loop is not necessary, for selecting use slicers:
idx = pd.IndexSlice
df1 = df.loc[:, idx[:, 'Open', 'Bid']]
print (df1)
SPY ARKW
Open Open
Bid Bid
2018-07-03 272.86 56.00
2018-07-02 269.49 54.87
2018-06-29 272.09 55.58
Or use DataFrame.xs
:
df1 = df.xs(('Open', 'Bid'), level=[1, 2], axis=1)
print (df1)
SPY ARKW
2018-07-03 272.86 56.00
2018-07-02 269.49 54.87
2018-06-29 272.09 55.58
df1 = df.xs(('Open', 'Bid'), level=[1, 2], axis=1, drop_level=False)
print (df1)
SPY ARKW
Open Open
Bid Bid
2018-07-03 272.86 56.00
2018-07-02 269.49 54.87
2018-06-29 272.09 55.58
EDIT:
For better performance all values in MultiIndex
has to be sorted:
df = df.sort_index(axis=1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |