'Sort dataframe multiindex level and by column
#Updated: pandas version 0.23.0 solves this problem with
Sorting by a combination of columns and index levels
I have struggled with this and I suspect there is a better way. How do I sort the following dataframe by index level name 'idx_0', level=0 and by column, 'value_1' descending such that the column 'MyName' reads vertical 'SCOTTBOSTON'.
import pandas as pd
import numpy as np
df = pd.DataFrame({'idx_0':[2]*6+[1]*5,
'idx_1':[6,4,2,10,18,5,11,1,7,9,3],
'value_1':np.arange(11,0,-1),
'MyName':list('BOSTONSCOTT')})
df = df.set_index(['idx_0','idx_1'])
df
Output:
MyName value_1
idx_0 idx_1
2 6 B 11
4 O 10
2 S 9
10 T 8
18 O 7
5 N 6
1 11 S 5
1 C 4
7 O 3
9 T 2
3 T 1
#Excepted output using:
df.sort_values(['value_1'], ascending=False)\
.reindex(sorted(df.index.get_level_values(0).unique()), level=0)
I suspect there is an easier way without resetting indexes
MyName value_1
idx_0 idx_1
1 11 S 5
1 C 4
7 O 3
9 T 2
3 T 1
2 6 B 11
4 O 10
2 S 9
10 T 8
18 O 7
5 N 6
Failure #1:
df.sort_values('value_1', ascending=False).sort_index(level=0)
Sort by values first then sort index level=0, but level=1 get sorted also.
MyName value_1
idx_0 idx_1
1 1 C 4
3 T 1
7 O 3
9 T 2
11 S 5
2 2 S 9
4 O 10
5 N 6
6 B 11
10 T 8
18 O 7
Failure #2
df.sort_index(level=0).sort_values('value_1', ascending=False)
Sort by index level=0 then sort by values, but index=0 gets jumbled again.
MyName value_1
idx_0 idx_1
2 6 B 11
4 O 10
2 S 9
10 T 8
18 O 7
5 N 6
1 11 S 5
1 C 4
7 O 3
9 T 2
3 T 1
Solution 1:[1]
Here are some potential solutions for your needs:
Method-1:
(df.sort_values('value_1', ascending=False)
.sort_index(level=[0], ascending=[True]))
Method-2:
(df.set_index('value_1', append=True)
.sort_index(level=[0,2], ascending=[True,False])
.reset_index('value_1'))
Tested on pandas 0.22.0, Python 3.6.4
Solution 2:[2]
Here is my ugly option:
In [139]: (df.assign(x=df.index.get_level_values(0) * \
10**np.ceil(np.log10(df.value_1.max()))-df.value_1)
.sort_values('x')
.drop('x',1))
Out[139]:
MyName value_1
idx_0 idx_1
1 11 S 5
1 C 4
7 O 3
9 T 2
3 T 1
2 6 B 11
4 O 10
2 S 9
10 T 8
18 O 7
5 N 6
some explanations:
In [140]: np.ceil(np.log10(df.value_1.max()))
Out[140]: 2.0
In [141]: df.assign(x=df.index.get_level_values(0)*10**np.ceil(np.log10(df.value_1.max()))-df.value_1)
Out[141]:
MyName value_1 x
idx_0 idx_1
2 6 B 11 189.0
4 O 10 190.0
2 S 9 191.0
10 T 8 192.0
18 O 7 193.0
5 N 6 194.0
1 11 S 5 95.0
1 C 4 96.0
7 O 3 97.0
9 T 2 98.0
3 T 1 99.0
another option is to add idx_0
sort by it and by value_1
and drop that additional column:
In [142]: (df.assign(x=df.index.get_level_values(0)).sort_values(['x', 'value_1'], ascending=[1,0])
.drop('x',1))
Out[142]:
MyName value_1
idx_0 idx_1
1 11 S 5
1 C 4
7 O 3
9 T 2
3 T 1
2 6 B 11
4 O 10
2 S 9
10 T 8
18 O 7
5 N 6
Solution 3:[3]
Update using pandas version 0.23.0
Sorting by a combination of columns and index levels
df.sort_values(by=['idx_0','value_1'], ascending=[True,False])
output:
value_1 MyName
idx_0 idx_1
1 11 5 S
1 4 C
7 3 O
9 2 T
3 1 T
2 6 11 B
4 10 O
2 9 S
10 8 T
18 7 O
5 6 N
Interestingly enough, @jxc pointed out a solution that I thought should work and was almost exactly as my first failure.
df.sort_values('value_1', ascending=False)\
.sort_index(level=0, ascending=[True])
It is the passing ascending
as a list which makes the above statement work as excepted. I think in pandas passing a scalar value and a list of one should work the same. However, in this case, it appears not to work the same.
I'll submit a bug report.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | MaxU - stop genocide of UA |
Solution 3 |