'Select two sets of columns by column names in Pandas

Take the DataFrame in the answer of Loc vs. iloc vs. ix vs. at vs. iat? for example.

df = pd.DataFrame(
{'age':[30, 2, 12, 4, 32, 33, 69],
 'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
 'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
 'height':[165, 70, 120, 80, 180, 172, 150],
 'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
 'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
 index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia']
)

Now I want all columns except 'food' and 'height'.

I thought something like df.loc[:,['age':'color', 'score':'state']] would work, but Python returns SyntaxError: invalid syntax.

I am aware of that there is one way to work around: df.drop(columns = ['food', 'height']). However, in my real life situation, I have hundreds of columns to be dropped. Typing out all column names is so inefficient.

I am expecting something similar with dplyr::select(df, -(food:height)) or dplyr::select(df, age:color, score:state) in R language.

Also have read Selecting/Excluding sets of columns in Pandas.



Solution 1:[1]

First, find all columns lying between food and height (inclusive).

c = df.iloc[-1:0].loc[:, 'food':'height'].columns

Next, filter with difference/isin/setdiff1d -

df[df.columns.difference(c)]

Or,

df.loc[:, ~df.columns.isin(c)]

Or,

df[np.setdiff1d(df.columns, c)]

           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX

Solution 2:[2]

First get positions of columns names by Index.get_loc and then use numpy.r_ for join all slicers together:

a = np.r_[df.columns.get_loc('age'):df.columns.get_loc('color')+1, 
          df.columns.get_loc('score'):df.columns.get_loc('state')+1]

df = df.iloc[:, a]
print (df)
           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX

Solution 3:[3]

One option for flexible column selection is with select_columns from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

df.select_columns(slice('age', 'color'), slice('score', 'state'))

           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX

df.select_columns(slice('food', 'height'), invert = True)

           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 sammywemmy