'Pandas combining slices and list to select columns
Let us assume that a DataFrame df
has the following columns: ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7']
We can use a slice or a list to select some columns:
- With a slice:
df.loc[:, 'c1':'c4']
- With a list:
df.loc[:, ['c1','c4','c6']]
If we want select ['c1', 'c4', 'c5', 'c6', 'c7']
It is not possible to do something like: ['c1', 'c4':'c7']
. But you see the idea, is it possible to combine a list and a slice ?
Solution 1:[1]
Using directly the columnnames one could do it in the following way:
df.loc[:, ['c1'] + ['c{}'.format(i) for i in range(3, 8)]]
Solution 2:[2]
You can using np.r_ with the column header index like this:
df = pd.DataFrame(np.arange(49).reshape(7,-1), columns=[f'c{i}' for i in range(1,8)])
df.loc[:, df.columns[np.r_[0,3:7]]]
Output:
c1 c4 c5 c6 c7
0 0 3 4 5 6
1 7 10 11 12 13
2 14 17 18 19 20
3 21 24 25 26 27
4 28 31 32 33 34
5 35 38 39 40 41
6 42 45 46 47 48
Solution 3:[3]
You could build a custom index generating factory for the columns. This basically just stores a reference to the data frame's columns, then performs a lookup for any slice. You use it just by placing it in front of the desired list of columns.
I.e: use it by simply replacing df.loc[: ['c1', 'c4':'c7']]
with df.loc[:, ci['c1', 'c4':'c7']]
:
import pandas as pd
df = pd.DataFrame([list(range(7))], columns=[f'c{i}' for i in range(1, 8)])
df
# return:
c1 c2 c3 c4 c5 c6 c7
0 0 1 2 3 4 5 6
class ColumnIndexer:
def __init__(self, df):
self._df_cols_ref = df.columns
def __getitem__(self, ix):
if not isinstance(ix, tuple):
ix = (ix,)
indices = []
for i in ix:
if isinstance(i, slice):
if i.start is None:
left = 0
else:
left = self._df_cols_ref.get_slice_bound(
i.start, 'left', 'loc')
if i.stop is None:
right = len(self._df_cols_ref) + 1
else:
right = df.columns.get_slice_bound(
i.stop, 'right', 'loc')
indices.extend(self._df_cols_ref[left: right])
else:
indices.append(i)
return indices
Using the object indexer:
ci = ColumnIndexer(df)
df.loc[:, ci['c1', 'c4':'c7']]
# returns:
c1 c4 c5 c6 c7
0 0 3 4 5 6
Solution 4:[4]
Here's a simple method if you re-think how your input. Store the individual labels as is, but then make the slices tuples. A simple helper function parses that into all of the labels between the tuples.
Here there's no hard-coding of the ranges to slice, everything is based on the labels you originally supply.
Sample Data
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.normal(0, 1, (1, 10)),
columns = ['c1', 'c3', 'c4', 'foo', 'bar', 'c5', 'c6',
'c7', 'c8', 'c9'])
my_slice = ['c1', ('c4', 'c7')]
def create_labels(columns, my_slice):
"""
columns: pd.Index
my_slice: list of labels, or tuple if slice
"""
labels = []
for x in my_slice:
if isinstance(x, tuple):
labels.extend(columns[columns.get_loc(x[0]):
columns.get_loc(x[1])+1])
else:
labels.append(x)
return labels
create_labels(df.columns, my_slice)
#['c1', 'c4', 'foo', 'bar', 'c5', 'c6', 'c7']
Solution 5:[5]
One option for flexible column selection is with select_columns from pyjanitor:
sample data:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.normal(0, 1, (1, 10)),
columns = ['c1', 'c3', 'c4', 'foo', 'bar', 'c5', 'c6',
'c7', 'c8', 'c9'])
df
c1 c3 c4 foo bar c5 c6 c7 c8 c9
0 0.724323 -0.187807 1.246641 0.018456 -1.19307 -1.305052 0.516512 0.285618 -0.569626 0.842834
# pip install pyjanitor
import janitor
df.select_columns('c1', slice('c4', 'c7'))
c1 c4 foo bar c5 c6 c7
0 0.724323 1.246641 0.018456 -1.19307 -1.305052 0.516512 0.285618
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ruthger Righart |
Solution 2 | Scott Boston |
Solution 3 | James |
Solution 4 | |
Solution 5 | sammywemmy |