Using StandardScaler for multiple columns

I want to apply StandardScaler to only certain columns, but my code raises an error. Here is my code:

from sklearn.preprocessing import StandardScaler
num_cols = ['fare_amount','trip_distance','jfk_drop_distance','lga_drop_distance','ewr_drop_distance','met_drop_distance','wtc_drop_distance']
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df[num_cols])
scaled_data

Output:

KeyError: "['trip_distance', 'jfk_drop_distance', 'lga_drop_distance', 'ewr_drop_distance', 'met_drop_distance', 'wtc_drop_distance'] not in index"


Solution 1:[1]

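It seems that your DataFrame doesn't contain those columns. The KeyError lists every name it could not find ('fare_amount' is not among them, so that one was found), so check that each listed name matches a column in df exactly.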
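
One way to narrow this down is to compare the requested names against df.columns. The sketch below is only an illustration: the small df is a hypothetical stand-in for the real data, and the stray space in the header is an assumption about one common cause of this error, not something confirmed by the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame; the leading space in
# " trip_distance" is an assumed cause, used only for illustration.
df = pd.DataFrame({
    "fare_amount": [7.5, 12.0, 30.0],
    " trip_distance": [1.2, 3.4, 9.8],
})

num_cols = ["fare_amount", "trip_distance"]

# List the requested columns that are not present in the frame.
missing = [col for col in num_cols if col not in df.columns]
print(missing)

# If whitespace is the culprit, stripping the column labels resolves it.
df.columns = df.columns.str.strip()
still_missing = [col for col in num_cols if col not in df.columns]
print(still_missing)
```

Printing df.columns.tolist() on the real data shows the exact labels, including any hidden whitespace or differences in case.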

If the column names were correct, this code would run, but fit_transform returns a NumPy array, so you would lose the DataFrame. You would have to convert the array back to a DataFrame, or scale the columns one at a time with a for loop.

# Scale the selected columns and convert the result back to a DataFrame
import pandas as pd
from sklearn.preprocessing import StandardScaler
num_cols = [
    'fare_amount',
    'trip_distance',
    'jfk_drop_distance',
    'lga_drop_distance',
    'ewr_drop_distance',
    'met_drop_distance',
    'wtc_drop_distance'
]
scaler = StandardScaler()
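# The scaled array has one column per entry in num_cols, so label it with
# num_cols (not df.columns) and keep the original row index
scaled_data = pd.DataFrame(scaler.fit_transform(df[num_cols]), columns=num_cols, index=df.index)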
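
If the goal is to keep the other columns alongside the scaled ones, another option is to assign the scaled array back to the same column selection. This is a minimal sketch on a hypothetical three-row frame; 'passenger_count' is an invented column that only stands in for "a column that should not be scaled".

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data; passenger_count is an assumed extra column that
# should be left unscaled.
df = pd.DataFrame({
    "fare_amount": [5.0, 10.0, 15.0],
    "trip_distance": [1.0, 2.0, 3.0],
    "passenger_count": [1, 2, 1],
})
num_cols = ["fare_amount", "trip_distance"]

scaler = StandardScaler()

# Overwrite only the selected columns; everything else stays untouched.
df[num_cols] = scaler.fit_transform(df[num_cols])
print(df)
```

This keeps df as a DataFrame and leaves the scaler fitted on all of num_cols at once, which matters if the same scaling later has to be applied to new data with scaler.transform.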

Or, try it with a for loop, which I find easier:

from sklearn.preprocessing import StandardScaler
num_cols = [
    'fare_amount',
    'trip_distance',
    'jfk_drop_distance',
    'lga_drop_distance',
    'ewr_drop_distance',
    'met_drop_distance',
    'wtc_drop_distance'
]
scaler = StandardScaler()

# Scale each column on its own; df[[col]] keeps the input 2D
for col in num_cols:
    df[col] = scaler.fit_transform(df[[col]])

Be sure to use the double brackets (df[[col]]) when passing data to the scaler, since StandardScaler requires its input to be a 2D array. Using df[col] instead will raise:

ValueError: Expected 2D array, got 1D array instead 
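
The difference between the two selections can be seen from their shapes. The following is a small sketch on made-up values, not the asker's data: single brackets give a 1D Series, double brackets give a one-column DataFrame.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up values for illustration only.
df = pd.DataFrame({"fare_amount": [5.0, 10.0, 15.0]})

one_d = df["fare_amount"]    # Series, shape (3,)
two_d = df[["fare_amount"]]  # DataFrame, shape (3, 1)
print(one_d.shape, two_d.shape)

scaler = StandardScaler()

# The 1D Series is rejected by the scaler's input validation.
try:
    scaler.fit_transform(one_d)
    raised = False
except ValueError:
    raised = True
print(raised)

# The 2D selection is accepted and returns an array of shape (3, 1).
scaled = scaler.fit_transform(two_d)
print(scaled.shape)
```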

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 BrokenBenchmark