How to scale a dataframe with a datetime field in it (as an index)?

I want to scale a dataframe, but doing so raises the error shown below.

My data:

df.head()

timestamp   open    high    low close   volume
0   2020-06-25  303.4700    305.26  301.2800    304.16  46340400
1   2020-06-24  309.8400    310.51  302.1000    304.09  123867696
2   2020-06-23  313.4801    314.50  311.6101    312.05  68066900
3   2020-06-22  307.9900    311.05  306.7500    310.62  74007212
4   2020-06-19  314.1700    314.38  306.5300    308.64  135211345

My code:

# Converting the index as date
from datetime import datetime

df.index = pd.to_datetime(df.index)

# Split data
split = len(df) - int(len(df) * 0.8)
df_train = df.iloc[split:]
df_test = df.iloc[:split]

# Normalize
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df_train = df_train.values.reshape(-1,1) #df_train = scaler.fit_transform(df_train)
df_test = df_test.values.reshape(-1,1) #df_test = scaler.fit_transform(df_train)

# Train the Scaler with training data and smooth data
timestep = 21
for i in range(0,len(df),timestep):
    df_train = scaler.fit_transform(df_train[i:i+timestep,:])
    #train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])

# You normalize the last bit of remaining data
df_test = scaler.fit_transform(df_test[i+timestep:,:])
#train_data[di+timestep:,:] = scaler.transform(train_data[di+timestep:,:])

The error:

      2 timestep = 21
      3 for i in range(0,len(df),timestep):
----> 4     df_train = scaler.fit_transform(df_train[i:i+timestep,:])
      5     #train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])

ValueError: could not convert string to float: '2020-05-28'

Help would be appreciated.



Solution 1:[1]

Simply iterate through the columns and scale each individually like this:

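from sklearn.preprocessing import StandardScaler
# X is a DataFrame of numeric columns; each column gets its own scaler
for col in X.columns:
    X[col] = StandardScaler().fit_transform(X[col].to_numpy().reshape(-1, 1)).ravel()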
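
For the data in the question, a loop like this would still raise the same ValueError while the date strings sit in a regular column. The question's `pd.to_datetime(df.index)` appears to convert the integer row labels, not the `timestamp` column, so the strings stay inside `df.values`. Below is a minimal sketch of one way around that, assuming the dates are in a column named `timestamp` as in `df.head()`. The five-row frame is only a sample copied from the question, and the 80/20 chronological split is an assumption about what was intended.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Small sample shaped like the question's data (values copied from df.head()).
df = pd.DataFrame({
    "timestamp": ["2020-06-25", "2020-06-24", "2020-06-23", "2020-06-22", "2020-06-19"],
    "open": [303.47, 309.84, 313.4801, 307.99, 314.17],
    "high": [305.26, 310.51, 314.50, 311.05, 314.38],
    "low": [301.28, 302.10, 311.6101, 306.75, 306.53],
    "close": [304.16, 304.09, 312.05, 310.62, 308.64],
    "volume": [46340400, 123867696, 68066900, 74007212, 135211345],
})

# Move the dates out of the value matrix and into the index.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp").sort_index()

# Chronological split: oldest 80% of rows for training, the rest for testing.
split = int(len(df) * 0.8)
df_train, df_test = df.iloc[:split], df.iloc[split:]

# Fit on the training rows only, then reuse the fitted scaler on the test rows.
scaler = MinMaxScaler()
train_scaled = pd.DataFrame(
    scaler.fit_transform(df_train), index=df_train.index, columns=df_train.columns
)
test_scaled = pd.DataFrame(
    scaler.transform(df_test), index=df_test.index, columns=df_test.columns
)
```

Because the dates are now the index, only numeric columns reach the scaler, and wrapping the result back into a DataFrame keeps the datetime index attached to the scaled values.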

You can also create your own scaler if you want to do this within a scikit-learn pipeline, like this:

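class Scaler(StandardScaler):

    def __init__(self):
        super().__init__()

    def fit_transform(self, X, y=None, **fit_params):
        # Work on a copy so the caller's DataFrame is left untouched
        X = X.copy()
        # Fit a fresh StandardScaler on each column and overwrite it with the scaled values
        for col in X.columns:
            X[col] = StandardScaler().fit_transform(X[col].to_numpy().reshape(-1, 1)).ravel()

        return X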
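
A class that only overrides `fit_transform` refits on whatever data it is given, so it cannot apply the training statistics to test data. A possible variant, sketched here under the assumption that the input is a DataFrame of numeric columns, stores one fitted scaler per column in `fit` and reuses them in `transform`. The class name `PerColumnScaler` and the sample values are hypothetical and not part of the original answer.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class PerColumnScaler(BaseEstimator, TransformerMixin):
    """Hypothetical helper: keeps one fitted StandardScaler per DataFrame column."""

    def fit(self, X, y=None):
        # Learn the mean and standard deviation of each column separately.
        self.scalers_ = {col: StandardScaler().fit(X[[col]]) for col in X.columns}
        return self

    def transform(self, X):
        # Apply the statistics learned in fit; the index (e.g. dates) is preserved.
        out = X.copy()
        for col, scaler in self.scalers_.items():
            out[col] = scaler.transform(X[[col]]).ravel()
        return out


train = pd.DataFrame({"close": [304.16, 304.09, 312.05],
                      "volume": [46340400.0, 123867696.0, 68066900.0]})
test = pd.DataFrame({"close": [310.62], "volume": [74007212.0]})

scaler = PerColumnScaler()
train_scaled = scaler.fit_transform(train)  # TransformerMixin supplies fit_transform
test_scaled = scaler.transform(test)        # reuses the training statistics
```

Note that `StandardScaler` already standardizes each column independently when given a whole numeric DataFrame, so the main benefit of a wrapper like this is that it returns a DataFrame with the original index and column names.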

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1