'Pandas DataFrame: replace all values in a column, based on condition
I have a simple DataFrame like the following:
I want to select all values from the 'First Season' column and replace those that are over 1990 by 1. In this example, only Baltimore Ravens would have the 1996 replaced by 1 (keeping the rest of the data intact).
I have used the following:
df.loc[(df['First Season'] > 1990)] = 1
But, it replaces all the values in that row by 1, and not just the values in the 'First Season' column.
How can I replace just the values from that column?
Solution 1:[1]
You need to select that column:
In [41]:
df.loc[df['First Season'] > 1990, 'First Season'] = 1
df
Out[41]:
Team First Season Total Games
0 Dallas Cowboys 1960 894
1 Chicago Bears 1920 1357
2 Green Bay Packers 1921 1339
3 Miami Dolphins 1966 792
4 Baltimore Ravens 1 326
5 San Franciso 49ers 1950 1003
So the syntax here is:
df.loc[<mask>(here mask is generating the labels to index) , <optional column(s)> ]
You can check the docs and also the 10 minutes to pandas which shows the semantics
EDIT
If you want to generate a boolean indicator then you can just use the boolean condition to generate a boolean Series and cast the dtype to int
this will convert True
and False
to 1
and 0
respectively:
In [43]:
df['First Season'] = (df['First Season'] > 1990).astype(int)
df
Out[43]:
Team First Season Total Games
0 Dallas Cowboys 0 894
1 Chicago Bears 0 1357
2 Green Bay Packers 0 1339
3 Miami Dolphins 0 792
4 Baltimore Ravens 1 326
5 San Franciso 49ers 0 1003
Solution 2:[2]
A bit late to the party but still - I prefer using numpy where:
import numpy as np
df['First Season'] = np.where(df['First Season'] > 1990, 1, df['First Season'])
Solution 3:[3]
df['First Season'].loc[(df['First Season'] > 1990)] = 1
strange that nobody has this answer, the only missing part of your code is the ['First Season'] right after df and just remove your curly brackets inside.
Solution 4:[4]
df.loc[df['First season'] > 1990, 'First Season'] = 1
Explanation:
df.loc
takes two arguments, 'row index' and 'column index'. We are checking if the value is greater than 1990 of each row value, under "First season" column and then we replacing it with 1.
Solution 5:[5]
for single condition, ie. ( 'employrate'] > 70 )
country employrate alcconsumption
0 Afghanistan 55.7000007629394 .03
1 Albania 51.4000015258789 7.29
2 Algeria 50.5 .69
3 Andorra 10.17
4 Angola 75.6999969482422 5.57
use this:
df.loc[df['employrate'] > 70, 'employrate'] = 7
country employrate alcconsumption
0 Afghanistan 55.700001 .03
1 Albania 51.400002 7.29
2 Algeria 50.500000 .69
3 Andorra nan 10.17
4 Angola 7.000000 5.57
therefore syntax here is:
df.loc[<mask>(here mask is generating the labels to index) , <optional column(s)> ]
For multiple conditions ie. (df['employrate'] <=55) & (df['employrate'] > 50)
use this:
df['employrate'] = np.where(
(df['employrate'] <=55) & (df['employrate'] > 50) , 11, df['employrate']
)
out[108]:
country employrate alcconsumption
0 Afghanistan 55.700001 .03
1 Albania 11.000000 7.29
2 Algeria 11.000000 .69
3 Andorra nan 10.17
4 Angola 75.699997 5.57
therefore syntax here is:
df['<column_name>'] = np.where((<filter 1> ) & (<filter 2>) , <new value>, df['column_name'])
Solution 6:[6]
Another option is to use a list comprehension:
df['First Season'] = [1 if year > 1990 else year for year in df['First Season']]
Solution 7:[7]
You can also use mask
which replaces the values where the condition is met:
df['First Season'].mask(lambda col: col > 1990, 1)
Solution 8:[8]
We can update the First Season column in df with the following syntax:
df['First Season'] = expression_for_new_values
To map the values in First Season we can use pandas‘ .map() method with the below syntax:
data_frame(['column']).map({'initial_value_1':'updated_value_1','initial_value_2':'updated_value_2'})
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Amir F |
Solution 3 | |
Solution 4 | MoD |
Solution 5 | |
Solution 6 | Henry |
Solution 7 | rachwa |
Solution 8 | Pobaranchuk |