'Pandas combining slices and list to select columns

Let us assume that a DataFrame df has the following columns: ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7']

We can use a slice or a list to select some columns:

  • With a slice: df.loc[:, 'c1':'c4']
  • With a list: df.loc[:, ['c1','c4','c6']]

If we want select ['c1', 'c4', 'c5', 'c6', 'c7']

It is not possible to do something like: ['c1', 'c4':'c7']. But you see the idea, is it possible to combine a list and a slice ?



Solution 1:[1]

Using directly the columnnames one could do it in the following way:

df.loc[:, ['c1'] + ['c{}'.format(i) for i in range(3, 8)]]

Solution 2:[2]

You can using np.r_ with the column header index like this:

df = pd.DataFrame(np.arange(49).reshape(7,-1), columns=[f'c{i}' for i in range(1,8)])
df.loc[:, df.columns[np.r_[0,3:7]]]

Output:

   c1  c4  c5  c6  c7
0   0   3   4   5   6
1   7  10  11  12  13
2  14  17  18  19  20
3  21  24  25  26  27
4  28  31  32  33  34
5  35  38  39  40  41
6  42  45  46  47  48

Solution 3:[3]

You could build a custom index generating factory for the columns. This basically just stores a reference to the data frame's columns, then performs a lookup for any slice. You use it just by placing it in front of the desired list of columns.

I.e: use it by simply replacing df.loc[: ['c1', 'c4':'c7']] with df.loc[:, ci['c1', 'c4':'c7']]:

import pandas as pd

df = pd.DataFrame([list(range(7))], columns=[f'c{i}' for i in range(1, 8)])

df
# return:
   c1  c2  c3  c4  c5  c6  c7
0   0   1   2   3   4   5   6


class ColumnIndexer:
    def __init__(self, df):
        self._df_cols_ref = df.columns

    def __getitem__(self, ix):
        if not isinstance(ix, tuple):
            ix = (ix,)
        indices = []
        for i in ix:
            if isinstance(i, slice):
                if i.start is None:
                    left = 0
                else:
                    left = self._df_cols_ref.get_slice_bound(
                        i.start, 'left', 'loc')
                if i.stop is None:
                    right = len(self._df_cols_ref) + 1
                else:
                    right = df.columns.get_slice_bound(
                        i.stop, 'right', 'loc')
                indices.extend(self._df_cols_ref[left: right])
            else:
                indices.append(i)
        return indices

Using the object indexer:

ci = ColumnIndexer(df)

df.loc[:, ci['c1', 'c4':'c7']]
# returns:
   c1  c4  c5  c6  c7
0   0   3   4   5   6

Solution 4:[4]

Here's a simple method if you re-think how your input. Store the individual labels as is, but then make the slices tuples. A simple helper function parses that into all of the labels between the tuples.

Here there's no hard-coding of the ranges to slice, everything is based on the labels you originally supply.

Sample Data

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(0, 1, (1, 10)),
                  columns = ['c1', 'c3', 'c4', 'foo', 'bar', 'c5', 'c6',
                             'c7', 'c8', 'c9'])

my_slice = ['c1', ('c4', 'c7')]

def create_labels(columns, my_slice):
    """
    columns: pd.Index
    my_slice: list of labels, or tuple if slice
    """
    labels = []
    for x in my_slice:
        if isinstance(x, tuple):
            labels.extend(columns[columns.get_loc(x[0]): 
                                  columns.get_loc(x[1])+1])
        else:
            labels.append(x)   
    return labels
        

create_labels(df.columns, my_slice)
#['c1', 'c4', 'foo', 'bar', 'c5', 'c6', 'c7']

Solution 5:[5]

One option for flexible column selection is with select_columns from pyjanitor:

sample data:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(0, 1, (1, 10)),
                  columns = ['c1', 'c3', 'c4', 'foo', 'bar', 'c5', 'c6',
                             'c7', 'c8', 'c9'])

df

         c1        c3        c4       foo      bar        c5        c6        c7        c8        c9
0  0.724323 -0.187807  1.246641  0.018456 -1.19307 -1.305052  0.516512  0.285618 -0.569626  0.842834
# pip install pyjanitor
import janitor

df.select_columns('c1', slice('c4', 'c7'))

         c1        c4       foo      bar        c5        c6        c7
0  0.724323  1.246641  0.018456 -1.19307 -1.305052  0.516512  0.285618

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ruthger Righart
Solution 2 Scott Boston
Solution 3 James
Solution 4
Solution 5 sammywemmy