How to set one pandas dataframe column value based on a 2nd column's value [duplicate]
Situation
I have a pandas DataFrame df with a column sentiment_rating.
index | sentiment_rating |
---|---|
2022-03-20 | .3 |
2022-03-21 | -.4 |
2022-03-24 | -.7 |
2022-03-28 | .6 |
2022-03-31 | .2 |
Goal
I'm trying to create a new column status whose value will be either positive if the sentiment score is .5 or greater, negative if it is -.5 or less, or neutral if it is between -.5 and .5.
What I've tried
I've installed pandas and am using this apply method:
df['status'] = df['sentiment_rating'].apply(lambda x: 'Positive' if x <= .8 else 'Neutral' if x > -.5 or < .5 else 'Negative' if x < -.5)
Results
I'm getting an error message of invalid syntax, which doesn't tell me much.
I don't have a clear understanding of the lambda function, and am not even sure if apply is the right way to accomplish my goal.
I've also tried testing with this on 2 dimensions:
df['status'] = ['Positive' if x > '.5' else 'other' for x in df['sentiment_rating']]
and that returns the error message:
TypeError: '>' not supported between instances of 'float' and 'str'
Any input on my approach and what I'm doing wrong greatly appreciated. Thx
Solution 1:[1]
You can extract the lambda into a separate function to make it more readable. For example:
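def return_status(x):
    # Classify a single rating; both thresholds are inclusive.
    if x >= .5:
        return 'Positive'
    elif x <= -.5:
        return 'Negative'
    return 'Neutral'

# This answer names the column 'rating'; with the question's frame use 'sentiment_rating'.
df['status'] = df['rating'].apply(return_status)
print(df.head())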
You will get the following output:
        index  rating    status
0  2022-03-20     0.3   Neutral
1  2022-03-21    -0.4   Neutral
2  2022-03-24    -0.7  Negative
3  2022-03-28     0.6  Positive
4  2022-03-31     0.2   Neutral
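If you would rather keep a one-line lambda, note why the version in the question does not parse: `x > -.5 or < .5` is not a valid expression (the right-hand side of `or` has no left operand; a chained comparison such as `-.5 < x < .5` would be valid), and every conditional expression `A if cond else B` must end with an `else` branch. The following is a minimal sketch of a corrected lambda, assuming the sample data from the question; the DataFrame construction is only there to make the example self-contained.

```python
import pandas as pd

# Sample data copied from the question (the construction itself is an assumption).
df = pd.DataFrame(
    {"sentiment_rating": [0.3, -0.4, -0.7, 0.6, 0.2]},
    index=["2022-03-20", "2022-03-21", "2022-03-24", "2022-03-28", "2022-03-31"],
)

# Each `A if cond else B` needs its `else`; 'Neutral' is the final fallback.
df["status"] = df["sentiment_rating"].apply(
    lambda x: "Positive" if x >= 0.5 else "Negative" if x <= -0.5 else "Neutral"
)
print(df)
```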
Solution 2:[2]
You can use numpy.select:
>>> import numpy as np
>>> df['status'] = np.select(
...     condlist=[df.sentiment_rating >= 0.5, df.sentiment_rating <= -0.5],
...     choicelist=['positive', 'negative'],
...     default='neutral'
... )
>>> df
        index  sentiment_rating    status
0  2022-03-20               0.3   neutral
1  2022-03-21              -0.4   neutral
2  2022-03-24              -0.7  negative
3  2022-03-28               0.6  positive
4  2022-03-31               0.2   neutral
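The sample data never lands exactly on a threshold, so it cannot show how the boundaries behave. The sketch below is a self-contained check, using made-up ratings chosen to sit on and just inside the cut-offs. It uses `>=` and `<=` so that exactly .5 counts as positive and exactly -.5 as negative, as the question asks.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings picked to probe the boundaries; not from the question.
ratings = pd.Series([0.5, -0.5, 0.49, -0.49, 0.0], name="sentiment_rating")

# Conditions are checked in order; rows matching none get the default.
status = np.select(
    condlist=[ratings >= 0.5, ratings <= -0.5],
    choicelist=["positive", "negative"],
    default="neutral",
)
print(status.tolist())
```

With strict `>` and `<` instead, the first two values would fall through to the default.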
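With a lambda:
>>> df['status'] = df.sentiment_rating.apply(
...     lambda x: 'positive' if x >= 0.5
...     else 'negative' if x <= -0.5
...     else 'neutral'
... )
But this is unnecessary and slow: apply calls the lambda once per row in Python, whereas np.select evaluates each condition on the whole column at once.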
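To check the speed difference on your own machine, here is a rough sketch that times both approaches on random data and confirms they produce the same labels. The series size, seed, and function names are arbitrary choices for illustration, and the timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary synthetic ratings in [-1, 1); size and seed are illustrative only.
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(-1, 1, size=20_000))

def with_apply():
    # Row-by-row Python calls.
    return s.apply(
        lambda x: "positive" if x >= 0.5 else "negative" if x <= -0.5 else "neutral"
    )

def with_select():
    # Vectorized: each condition is evaluated on the whole Series.
    return np.select([s >= 0.5, s <= -0.5], ["positive", "negative"], default="neutral")

# Both approaches should label every row identically.
same = with_apply().tolist() == with_select().tolist()

print("identical results:", same)
print("apply :", timeit.timeit(with_apply, number=3))
print("select:", timeit.timeit(with_select, number=3))
```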
Solution 3:[3]
This might be a bit longer than what you were trying, but it does the job:
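def status(value):
    # apply passes one scalar rating at a time, not a whole Series.
    if value >= 0.5:
        return "Positive"
    elif -0.5 < value < 0.5:
        return "Neutral"
    return "Negative"

# This answer assumes the ratings live in a column named "Sentiments".
df["Status"] = df["Sentiments"].apply(status)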
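As for the list comprehension in the question, it raises the TypeError only because '.5' is quoted, so each float is compared against a string. Here is a minimal sketch with numeric thresholds, extended to all three labels; the DataFrame is rebuilt from the question's sample values so the example runs on its own.

```python
import pandas as pd

# Sample ratings copied from the question.
df = pd.DataFrame({"sentiment_rating": [0.3, -0.4, -0.7, 0.6, 0.2]})

# Comparing a float with the string '.5' is what triggers the TypeError.
try:
    0.6 > ".5"
except TypeError as exc:
    print("TypeError:", exc)

# With numeric thresholds the comprehension works as intended.
df["status"] = [
    "Positive" if x >= 0.5 else "Negative" if x <= -0.5 else "Neutral"
    for x in df["sentiment_rating"]
]
print(df)
```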
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Manish Visave |
Solution 2 | Sayandip Dutta |
Solution 3 | Zero |