How to scale a dataframe with a datetime field in it (as an index)?
I want to scale a dataframe, but doing so raises the error shown below.
My data:
df.head()
timestamp open high low close volume
0 2020-06-25 303.4700 305.26 301.2800 304.16 46340400
1 2020-06-24 309.8400 310.51 302.1000 304.09 123867696
2 2020-06-23 313.4801 314.50 311.6101 312.05 68066900
3 2020-06-22 307.9900 311.05 306.7500 310.62 74007212
4 2020-06-19 314.1700 314.38 306.5300 308.64 135211345
My code:
# Converting the index as date
import pandas as pd
from datetime import datetime
df.index = pd.to_datetime(df.index)
# Split data
split = len(df) - int(len(df) * 0.8)
df_train = df.iloc[split:]
df_test = df.iloc[:split]
# Normalize
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_train = df_train.values.reshape(-1,1) #df_train = scaler.fit_transform(df_train)
df_test = df_test.values.reshape(-1,1) #df_test = scaler.fit_transform(df_train)
# Train the Scaler with training data and smooth data
timestep = 21
for i in range(0,len(df),timestep):
    df_train = scaler.fit_transform(df_train[i:i+timestep,:])
    #train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])
# You normalize the last bit of remaining data
df_test = scaler.fit_transform(df_test[i+timestep:,:])
#train_data[di+timestep:,:] = scaler.transform(train_data[di+timestep:,:])
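The loop above appears intended to fit the scaler separately on each 21-row window of the training data. As written, though, it reassigns df_train to the scaled first window on the first pass, so later slices index into that 21-row result; in addition, reshape(-1,1) stacks every column into one. Below is a minimal sketch of window-by-window min-max scaling, assuming a purely numeric 2-D NumPy array whose rows are in chronological (oldest-first) order (the sample above is newest-first, so it would need sorting first). scale_in_windows is a hypothetical helper name, and reusing the last window's scaler for the test data is one possible choice, not the only one:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_in_windows(train, test, window=21):
    """Hypothetical helper: min-max scale `train` one window at a time,
    then scale `test` with the scaler fitted on the last training window."""
    train = np.asarray(train, dtype=float).copy()
    test = np.asarray(test, dtype=float).copy()
    scaler = MinMaxScaler()
    for start in range(0, len(train), window):
        # Fit on this window only and overwrite it with its scaled values;
        # each column keeps its own min/max, columns are never mixed.
        train[start:start + window] = scaler.fit_transform(train[start:start + window])
    # Assumption: reuse the most recent window's scaler instead of refitting on test data
    test = scaler.transform(test)
    return train, test


train_scaled, test_scaled = scale_in_windows(
    np.arange(50, dtype=float).reshape(-1, 1),
    np.arange(50, 60, dtype=float).reshape(-1, 1),
)
```

Because the test data is only transformed, never fitted, its values can fall outside [0, 1]; here the test rows are scaled relative to the training window covering 42 to 49.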
The error:
      2 timestep = 21
      3 for i in range(0,len(df),timestep):
----> 4     df_train = scaler.fit_transform(df_train[i:i+timestep,:])
      5     #train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])
ValueError: could not convert string to float: '2020-05-28'
Help would be appreciated.
Solution 1:[1]
Simply iterate through the columns and scale each individually like this:
from sklearn.preprocessing import StandardScaler
for col in X.columns:
    X[col] = StandardScaler().fit_transform(X[col].to_numpy().reshape(-1, 1)).ravel()
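Whichever way the columns are scaled, the scaler only accepts numbers, which is where the ValueError in the question comes from: the timestamp column still holds strings such as '2020-05-28'. Judging from the df.head() output, timestamp is an ordinary column and the index is 0, 1, 2 and so on, so df.index = pd.to_datetime(df.index) converts that integer index rather than the dates. A minimal sketch, using the sample rows from the question, that parses the column, moves it into the index and then applies the per-column loop:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Sample rows from the question's df.head(); "timestamp" is a column of strings
df = pd.DataFrame({
    "timestamp": ["2020-06-25", "2020-06-24", "2020-06-23", "2020-06-22", "2020-06-19"],
    "open": [303.47, 309.84, 313.4801, 307.99, 314.17],
    "high": [305.26, 310.51, 314.50, 311.05, 314.38],
    "low": [301.28, 302.10, 311.6101, 306.75, 306.53],
    "close": [304.16, 304.09, 312.05, 310.62, 308.64],
    "volume": [46340400, 123867696, 68066900, 74007212, 135211345],
})

# Parse the date strings and move them into the index (sorted oldest-first),
# so that only numeric columns are left for the scaler
df["timestamp"] = pd.to_datetime(df["timestamp"])
X = df.set_index("timestamp").sort_index().astype(float)

# Scale each remaining column individually, as in the answer
for col in X.columns:
    X[col] = StandardScaler().fit_transform(X[col].to_numpy().reshape(-1, 1)).ravel()

print(X.index.dtype)  # datetime64[ns]
```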
You can create your own scaler if you want to use this inside a scikit-learn pipeline:
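class Scaler(StandardScaler):
    def fit_transform(self, X, y=None, **fit_params):
        # Scale each column of the DataFrame independently, working on a copy.
        # Only fit_transform is overridden, so transform() on new data is not covered.
        X = X.copy()
        for col in X.columns:
            X[col] = StandardScaler().fit_transform(X[col].to_numpy().reshape(-1, 1)).ravel()
        return X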
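For comparison: StandardScaler (like MinMaxScaler) already computes its mean and standard deviation separately for every column, so fitting one scaler on all numeric columns should give the same result as the per-column loop, and the fitted scaler can then transform new rows with the training statistics. A short check on two numeric columns from the question's sample:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Two numeric columns taken from the question's sample rows
X = pd.DataFrame({
    "close": [304.16, 304.09, 312.05, 310.62, 308.64],
    "volume": [46340400, 123867696, 68066900, 74007212, 135211345],
})

# Per-column loop: one scaler per column
per_column = X.astype(float)
for col in per_column.columns:
    per_column[col] = StandardScaler().fit_transform(per_column[col].to_numpy().reshape(-1, 1)).ravel()

# A single scaler fitted on all columns at once keeps per-column statistics
scaler = StandardScaler().fit(X)
all_at_once = scaler.transform(X)

print(np.allclose(per_column.to_numpy(), all_at_once))  # True
```

A plain StandardScaler placed in a Pipeline therefore scales each column independently and, unlike the fit_transform-only subclass, also supports transform() on unseen data.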
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |