'How to select multiple columns by name that are not adjacent and combining several slicing methods?

Is there a way to select multiple columns that are not adjacent and combining several methods?

test dataframe:

test = pd.DataFrame(np.random.rand(3, 9),
                    columns=['ID', 'rfm_snittbeløp_gaver', 'rfm_maksbeløp_gaver', 'rfm_antall_kampanjer',
                             'a','b','c','d','e'])

Let's say I want the columns: ID, all columns that start with rfm, a:c, and e. In this order.

I thought something along these lines would do it, but I was not able to make it work

frames = [test.loc[:, 'ID'],
          test.loc[:, test.columns.str.startswith('rfm')],
          test.loc[:, 'a':'c'],
          test.iloc[:, -1]]

test_sub = pd.concat(frames)

I read it resets the index and that I won't be able to control the order of the columns.

Preferably there would be something like np.r_ for .loc to combine slices as found in this post

  1. Selecting non-adjacent columns by column number pandas

But I do not like using index position when I refer to columns

Any help is much appreciated



Solution 1:[1]

here is a way using np.r_ and get_loc() and get_indexer():

ID= test.columns.get_loc('ID')
rfm=test.columns.get_indexer(test.columns[test.columns.str.startswith('rfm')])
a=test.columns.get_loc('a')
c=test.columns.get_loc('c')
e=test.columns.get_loc('e')

test.iloc[:,np.r_[ID,rfm,a:c+1,e]]

         ID  rfm_snittbeløp_gaver  rfm_maksbeløp_gaver  rfm_antall_kampanjer  \
0  0.822275              0.155649             0.189058              0.050138   
1  0.188038              0.286731             0.509774              0.171374   
2  0.626211              0.477937             0.585987              0.358124   

          a         b         c         e  
0  0.652142  0.492184  0.464453  0.361395  
1  0.242480  0.963673  0.898177  0.813195  
2  0.863088  0.781858  0.924203  0.690219  

Solution 2:[2]

You are pretty close, with axis=1:

frames = [test.loc[:, 'ID'],
          test.loc[:, test.columns.str.startswith('rfm')],
          test.loc[:, 'a':'c'],
          test.iloc[:, -1]]

test_sub = pd.concat(frames, axis=1)

Solution 3:[3]

A more general approach will be to create a class like the following:

class s_:
    """Create slices from given columns resembling numpy s_"""

    def __init__(self, cols):
        self.indices = pd.Series(cols, index=cols)

    def __getitem__(self, item):
        if isinstance(item, tuple):
            result = []
            for indexing in item:
                try:
                    if isinstance(indexing, str): # treat strings as labels
                        result.append(self.indices[indexing])
                    else:
                        result.extend(self.indices[indexing])
                except TypeError:
                    result.append(self.indices[indexing])
            return result
        else:
            return self.indices[item]

You could do then:

columns = ['ID', 'rfm_snittbeløp_gaver', 'rfm_maksbeløp_gaver', 'rfm_antall_kampanjer', 'a', 'b', 'c', 'd', 'e']
test = pd.DataFrame(np.random.rand(3, 9), columns=columns)

print(test.loc[:, s_(test.columns)['ID', test.columns.str.startswith('rfm'), 'a':'c', -1]])

Output

         ID  rfm_snittbeløp_gaver  ...         c         e
0  0.026803              0.603409  ...  0.819486  0.396006
1  0.791049              0.450502  ...  0.097529  0.708746
2  0.623558              0.513678  ...  0.140740  0.958713

[3 rows x 8 columns]

Note that this also works for single indexing:

print(test.loc[:, s_(test.columns)['ID']])

Output

0    0.129801
1    0.786684
2    0.839015
Name: ID, dtype: float64

Solution 4:[4]

One option for flexible column selection is with select_columns from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
test.select_columns('ID', 'rfm*', slice('a', 'c'), 'e')

         ID  rfm_snittbeløp_gaver  rfm_maksbeløp_gaver  rfm_antall_kampanjer         a         b         c         e
0  0.260244              0.890904             0.884521              0.600843  0.955812  0.213352  0.628574  0.428940
1  0.859678              0.330782             0.200329              0.774217  0.421452  0.494960  0.932772  0.454032
2  0.483668              0.894447             0.088583              0.806648  0.797400  0.240357  0.762199  0.069686

The * inrfm* ida is borrowed from unix.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 anky
Solution 2 Quang Hoang
Solution 3 Dani Mesejo
Solution 4 sammywemmy