How to set one pandas dataframe column's value based on a second column's value

Situation

I have a pandas dataframe df with a column sentiment_rating.

index       sentiment_rating
2022-03-20   0.3
2022-03-21  -0.4
2022-03-24  -0.7
2022-03-28   0.6
2022-03-31   0.2

Goal

I'm trying to create a new column status whose value is positive if sentiment_rating is .5 or greater, negative if it is -.5 or less, and neutral if it is between -.5 and .5.

What I've tried

I've imported pandas and am using this apply call:

df['status'] = df['sentiment_rating'].apply(lambda x: 'Positive' if x <= .8 else 'Neutral' if x > -.5 or < .5 else 'Negative' if x < -.5)

Results

I'm getting SyntaxError: invalid syntax, which doesn't tell me much.

I don't have a clear understanding of lambda functions, and I'm not even sure whether apply is the right way to accomplish my goal.

I've also tried a simpler two-way test with a list comprehension, df['status'] = ['Positive' if x > '.5' else 'other' for x in df['sentiment_rating']], and that raises TypeError: '>' not supported between instances of 'float' and 'str'.

Any input on my approach and on what I'm doing wrong is greatly appreciated. Thanks.
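Both errors can be reproduced without pandas. A minimal sketch, assuming the ratings are plain Python floats: the TypeError comes from comparing a float with the string '.5' instead of the number 0.5, and the SyntaxError comes from "or < .5" (a comparison with no left operand) and from a chain of conditional expressions that never ends with a final else. The variable names below are illustrative only.

```python
ratings = [0.3, -0.4, -0.7, 0.6, 0.2]

# Comparing a float with the *string* '.5' is not allowed in Python 3.
try:
    0.3 > ".5"
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # '>' not supported between instances of 'float' and 'str'

# A valid chained conditional expression: every comparison has both operands,
# compares against the number 0.5, and the chain ends with a final else.
statuses = [
    "Positive" if x >= 0.5 else "Negative" if x <= -0.5 else "Neutral"
    for x in ratings
]
print(statuses)  # ['Neutral', 'Neutral', 'Negative', 'Positive', 'Neutral']
```

The same expression can be used as the body of a lambda passed to apply.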



Solution 1:[1]

You can extract the lambda into a separate, named function to make it more readable:

def return_status(x):
    # Inclusive thresholds: exactly .5 is Positive, exactly -.5 is Negative
    if x >= .5:
        return 'Positive'
    elif x <= -.5:
        return 'Negative'
    return 'Neutral'


# apply calls return_status once for each value in the column
df['status'] = df['sentiment_rating'].apply(return_status)
print(df.head())

This produces the following output:

        index  sentiment_rating    status
0  2022-03-20               0.3   Neutral
1  2022-03-21              -0.4   Neutral
2  2022-03-24              -0.7  Negative
3  2022-03-28               0.6  Positive
4  2022-03-31               0.2   Neutral
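A self-contained sketch of the same approach, assuming the dates are the dataframe's index (a DatetimeIndex, as the question's table suggests) rather than a column. It also shows the boundary values and a missing value: NaN fails both comparisons, so this function labels it 'Neutral'.

```python
import pandas as pd


def return_status(x):
    # Inclusive thresholds: exactly 0.5 -> Positive, exactly -0.5 -> Negative
    if x >= 0.5:
        return "Positive"
    elif x <= -0.5:
        return "Negative"
    return "Neutral"


# The question's data, with the dates as the index (assumed layout)
df = pd.DataFrame(
    {"sentiment_rating": [0.3, -0.4, -0.7, 0.6, 0.2]},
    index=pd.to_datetime(
        ["2022-03-20", "2022-03-21", "2022-03-24", "2022-03-28", "2022-03-31"]
    ),
)

# The Series returned by apply keeps df's index, so it aligns row by row
df["status"] = df["sentiment_rating"].apply(return_status)
print(df)

print(return_status(0.5), return_status(-0.5))  # Positive Negative
print(return_status(float("nan")))  # Neutral: NaN fails both comparisons
```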

Solution 2:[2]

You can use numpy.select:

>>> import numpy as np
>>> df['status'] = np.select(
...     condlist=[df.sentiment_rating >= 0.5, df.sentiment_rating <= -0.5],
...     choicelist=['positive', 'negative'],
...     default='neutral'
... )
>>> df
        index  sentiment_rating    status
0  2022-03-20               0.3   neutral
1  2022-03-21              -0.4   neutral
2  2022-03-24              -0.7  negative
3  2022-03-28               0.6  positive
4  2022-03-31               0.2   neutral
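np.select evaluates the conditions in order and, for each row, takes the choice of the first condition that is True, falling back to default when none is. Whether the .5 and -.5 boundaries count as positive/negative therefore depends only on using >= / <= versus > / <. A small sketch on hypothetical boundary values:

```python
import numpy as np
import pandas as pd

s = pd.Series([-0.5, 0.0, 0.5])  # hypothetical values sitting on the thresholds

# Inclusive thresholds: .5 or greater is positive, -.5 or less is negative
inclusive = np.select([s >= 0.5, s <= -0.5], ["positive", "negative"], default="neutral")

# Strict thresholds: the boundary values fall through to the default
exclusive = np.select([s > 0.5, s < -0.5], ["positive", "negative"], default="neutral")

print(inclusive.tolist())  # ['negative', 'neutral', 'positive']
print(exclusive.tolist())  # ['neutral', 'neutral', 'neutral']
```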

The same logic with apply and a lambda:

>>> df['status'] = df.sentiment_rating.apply(
...     lambda x: 'positive' if x >= 0.5
...     else 'negative' if x <= -0.5
...     else 'neutral'
... )

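But this is unnecessary here, and slower: apply calls the Python lambda once per element, whereas np.select evaluates each condition on the whole column at once.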
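A rough way to check this on your own machine, as a sketch: build a hypothetical random column, confirm that both approaches produce identical labels, then time them. No timings are claimed here; they depend on hardware and on the pandas/NumPy versions installed.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 ratings uniformly spread over [-1, 1)
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(-1, 1, size=100_000))


def with_apply():
    # One Python-level lambda call per element
    return s.apply(
        lambda x: "positive" if x >= 0.5 else "negative" if x <= -0.5 else "neutral"
    )


def with_select():
    # Each condition is evaluated on the whole column at once
    return np.select([s >= 0.5, s <= -0.5], ["positive", "negative"], default="neutral")


# Both approaches should agree label for label
assert list(with_apply()) == with_select().tolist()

print("apply :", timeit.timeit(with_apply, number=3))
print("select:", timeit.timeit(with_select, number=3))
```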

Solution 3:[3]

This might be a bit longer than what you were trying, but it does the job:

def status(value):
    # apply passes one scalar rating at a time, not a whole Series
    if value >= 0.5:
        return "Positive"
    elif value > -0.5 and value < 0.5:
        return "Neutral"
    return "Negative"

df["status"] = df["sentiment_rating"].apply(status)
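The middle test can also be written with Python's chained comparison, -0.5 < value < 0.5, which means the same as value > -0.5 and value < 0.5. Note one difference from Solution 1: here the final return catches anything that fails both tests, so a missing value (NaN) is labelled "Negative" rather than "Neutral". A minimal sketch, with the function name kept from this answer:

```python
import math


def status(value):
    if value >= 0.5:
        return "Positive"
    elif -0.5 < value < 0.5:  # chained comparison: value > -0.5 and value < 0.5
        return "Neutral"
    return "Negative"


print(status(0.5), status(0.0), status(-0.5))  # Positive Neutral Negative
print(status(math.nan))  # Negative: NaN fails every comparison
```

If missing values are possible, checking pd.isna(value) before the comparisons is one way to keep them out of either bucket.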

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Manish Visave
Solution 2 Sayandip Dutta
Solution 3 Zero