Removing Outliers

I tried removing outliers using the following function I created, but I am getting weird values after using it. Is my way of removing outliers correct?

import numpy as np

def remove_outliers(df, numeric_features):
    '''
    remove_outliers removes outliers by dropping any point that lies
    2 or more standard deviations above or below the column mean.
    df is the dataframe which the outliers are to be removed from
    numeric_features are the numeric columns which might contain outliers
    Returns a new dataframe.
    '''

    # Iterate over all the columns in numeric_features
    for col in numeric_features:

        mean = df[col].mean()  # Find mean of column
        std = np.std(df[col], axis=0)  # Find standard deviation of column

        # Thresholds used to find outliers
        above_outliers = mean + 2*std
        below_outliers = mean - 2*std

        outlier_indexes = df[col].loc[lambda x: (x >= above_outliers) | (x <= below_outliers)]

        # Drop outlier rows from the dataframe
        df = df.drop(outlier_indexes.index)
    return df


Solution 1:[1]

Try a boolean mask like the one below, which keeps only the rows that fall inside the bounds:

  df1 = df[(df['col'] >= below_outliers) & (df['col'] <= above_outliers)]
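
The line above assumes `below_outliers` and `above_outliers` have already been computed for the column. As a rough sketch of how the whole function might look with this approach, the version below builds one combined mask, so each column's bounds come from the original data rather than from whatever rows survived earlier drops (which the loop in the question does, and which may contribute to the unexpected values). The name `remove_outliers_mask`, the `n_std` parameter, and the sample data are illustrative, not from the question.

```python
import pandas as pd

def remove_outliers_mask(df, numeric_features, n_std=2):
    """Keep rows that lie within n_std standard deviations of the mean in every listed column."""
    keep = pd.Series(True, index=df.index)
    for col in numeric_features:
        mean = df[col].mean()
        std = df[col].std(ddof=0)  # population std, matching np.std in the question
        lower = mean - n_std * std
        upper = mean + n_std * std
        # between() is inclusive on both ends, like >= and <= in the mask above
        keep &= df[col].between(lower, upper)
    return df[keep]

# Hypothetical sample data: column "a" has one obvious outlier (100)
df = pd.DataFrame({
    "a": [1, 2, 3, 2, 1, 2, 3, 2, 1, 100],
    "b": [5, 6, 5, 6, 5, 6, 5, 6, 5, 6],
})
cleaned = remove_outliers_mask(df, ["a", "b"])
```

Because the mask is applied once at the end, the order of the columns in `numeric_features` does not change the result.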

Solution 2:[2]

I suggest using the neulab Python library (https://pypi.org/project/neulab).

There you can use the Simple algorithm to find and delete outliers:

import pandas as pd
from neulab.OutlierDetection import SimpleOutDetect

d = {'col1': [1, 0, 342, 1, 1, 0, 1, 0, 1, 255, 1, 1, 1, 0, ]}
df = pd.DataFrame(data=d)

sd = SimpleOutDetect(dataframe=df, info=False, autorm=True)

Output: Detected outliers: {'col1': [342, 255]}

index   col1
0      1
1      0
3      1
4      1
5      0
6      1
7      0
8      1
10     1
11     1
12     1
13     0

Or use the Chauvenet algorithm:

import pandas as pd
from neulab.OutlierDetection import Chauvenet

d = {'col1': [8.02, 8.16, 3.97, 8.64, 0.84, 4.46, 0.81, 7.74, 8.78, 9.26, 20.46, 29.87, 10.38, 25.71], 'col2': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
df = pd.DataFrame(data=d)

chvn = Chauvenet(dataframe=df, info=True, autorm=True)

Output: Detected outliers: {'col1': [29.87, 25.71, 20.46, 0.84, 0.81, 3.97, 4.46, 10.38, 7.74, 9.26]}

    col1    col2
0   8.02    1
1   8.16    1
3   8.64    1
8   8.78    1
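
If adding a dependency is not an option, Chauvenet's criterion can be sketched in plain pandas. The criterion flags a value when the expected number of samples at least that far from the mean, N * P(|Z| >= z) under a normal distribution, falls below 0.5. The code below is a single-pass sketch under that definition and is not neulab's implementation, so its results may differ from the output above; the function name `chauvenet_mask` is hypothetical.

```python
import math
import pandas as pd

def chauvenet_mask(series):
    """Return a boolean Series: True where the value passes Chauvenet's criterion."""
    n = len(series)
    mean = series.mean()
    std = series.std()  # sample standard deviation (ddof=1); assumed non-zero
    z = (series - mean).abs() / std
    # Two-sided normal tail probability: P(|Z| >= z) = erfc(z / sqrt(2))
    tail_prob = z.apply(lambda v: math.erfc(v / math.sqrt(2)))
    # Keep a value when the expected count of such extreme samples is at least 0.5
    return n * tail_prob >= 0.5

s = pd.Series([1, 0, 342, 1, 1, 0, 1, 0, 1, 255, 1, 1, 1, 0])
kept = s[chauvenet_mask(s)]
```

On this data a single pass flags 342 but not 255, because the two large values inflate the standard deviation. Re-running the function on `kept` until nothing more is flagged is one way to catch such masked outliers, at the risk of removing too much.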

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Zaynul Abadin Tuhin
Solution 2    kndahl