'How to select multiple columns by name that are not adjacent and combining several slicing methods?
Is there a way to select multiple columns that are not adjacent and combining several methods?
test dataframe:
test = pd.DataFrame(np.random.rand(3, 9),
                    columns=['ID', 'rfm_snittbeløp_gaver', 'rfm_maksbeløp_gaver', 'rfm_antall_kampanjer',
                             'a','b','c','d','e'])
Let's say I want the columns: ID, all columns that start with rfm, a:c, and e. In this order.
I thought something along these lines would do it, but I was not able to make it work
frames = [test.loc[:, 'ID'],
          test.loc[:, test.columns.str.startswith('rfm')],
          test.loc[:, 'a':'c'],
          test.iloc[:, -1]]
test_sub = pd.concat(frames)
I read it resets the index and that I won't be able to control the order of the columns.
Preferably there would be something like np.r_ for .loc to combine slices as found in this post
But I do not like using index position when I refer to columns
Any help is much appreciated
Solution 1:[1]
here is a way using np.r_ and get_loc() and get_indexer():
ID= test.columns.get_loc('ID')
rfm=test.columns.get_indexer(test.columns[test.columns.str.startswith('rfm')])
a=test.columns.get_loc('a')
c=test.columns.get_loc('c')
e=test.columns.get_loc('e')
test.iloc[:,np.r_[ID,rfm,a:c+1,e]]
         ID  rfm_snittbeløp_gaver  rfm_maksbeløp_gaver  rfm_antall_kampanjer  \
0  0.822275              0.155649             0.189058              0.050138   
1  0.188038              0.286731             0.509774              0.171374   
2  0.626211              0.477937             0.585987              0.358124   
          a         b         c         e  
0  0.652142  0.492184  0.464453  0.361395  
1  0.242480  0.963673  0.898177  0.813195  
2  0.863088  0.781858  0.924203  0.690219  
    					Solution 2:[2]
You are pretty close, with axis=1:
frames = [test.loc[:, 'ID'],
          test.loc[:, test.columns.str.startswith('rfm')],
          test.loc[:, 'a':'c'],
          test.iloc[:, -1]]
test_sub = pd.concat(frames, axis=1)
    					Solution 3:[3]
A more general approach will be to create a class like the following:
class s_:
    """Create slices from given columns resembling numpy s_"""
    def __init__(self, cols):
        self.indices = pd.Series(cols, index=cols)
    def __getitem__(self, item):
        if isinstance(item, tuple):
            result = []
            for indexing in item:
                try:
                    if isinstance(indexing, str): # treat strings as labels
                        result.append(self.indices[indexing])
                    else:
                        result.extend(self.indices[indexing])
                except TypeError:
                    result.append(self.indices[indexing])
            return result
        else:
            return self.indices[item]
You could do then:
columns = ['ID', 'rfm_snittbeløp_gaver', 'rfm_maksbeløp_gaver', 'rfm_antall_kampanjer', 'a', 'b', 'c', 'd', 'e']
test = pd.DataFrame(np.random.rand(3, 9), columns=columns)
print(test.loc[:, s_(test.columns)['ID', test.columns.str.startswith('rfm'), 'a':'c', -1]])
Output
         ID  rfm_snittbeløp_gaver  ...         c         e
0  0.026803              0.603409  ...  0.819486  0.396006
1  0.791049              0.450502  ...  0.097529  0.708746
2  0.623558              0.513678  ...  0.140740  0.958713
[3 rows x 8 columns]
Note that this also works for single indexing:
print(test.loc[:, s_(test.columns)['ID']])
Output
0    0.129801
1    0.786684
2    0.839015
Name: ID, dtype: float64
    					Solution 4:[4]
One option for flexible column selection is with select_columns from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
test.select_columns('ID', 'rfm*', slice('a', 'c'), 'e')
         ID  rfm_snittbeløp_gaver  rfm_maksbeløp_gaver  rfm_antall_kampanjer         a         b         c         e
0  0.260244              0.890904             0.884521              0.600843  0.955812  0.213352  0.628574  0.428940
1  0.859678              0.330782             0.200329              0.774217  0.421452  0.494960  0.932772  0.454032
2  0.483668              0.894447             0.088583              0.806648  0.797400  0.240357  0.762199  0.069686
The * inrfm* ida is borrowed from unix.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | anky | 
| Solution 2 | Quang Hoang | 
| Solution 3 | Dani Mesejo | 
| Solution 4 | sammywemmy | 
