ValueError: A given column is not a column of the dataframe in Pipeline and ColumnTransformer
I am working on the toy dataset with ColumnTransformer and Pipeline, but I ran into an error for which I couldn't find a solution on the internet.
import pandas as pd

toy = pd.read_csv('toy_dataset.csv')
toy_drop = toy.drop(['Number', 'Illness'], axis=1)  # feature frame; 'Illness' is removed here
toy_target = toy.Illness
toy_target = toy_target.to_frame()
Data is imported:
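from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (KBinsDiscretizer, LabelEncoder, MinMaxScaler,
                                   OneHotEncoder, OrdinalEncoder, RobustScaler)

rb = RobustScaler()
normalization = MinMaxScaler()
ohe = OneHotEncoder(sparse=False)  # this argument is named sparse_output from scikit-learn 1.2 on
le = LabelEncoder()
oe = OrdinalEncoder()
bins = KBinsDiscretizer(n_bins=5, encode='onehot-dense', strategy='uniform')
ct_features = ColumnTransformer([
    ('normalization', normalization, ['Income']),
    ('ohe', ohe, ['City', 'Gender', 'Illness']),
    ('bins', bins, ['Age']),
], remainder='drop')
pip = Pipeline([
    ("ct", ct_features),
    # ("collabel", ct_label),
    ('lr', LinearRegression()),
])
x_train, x_test, y_train, y_test = train_test_split(
    toy_drop, toy_target, test_size=0.2, random_state=2021)
pip.fit(x_train, y_train)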
Everything looks correct to me, but this error occurred:
ValueError: A given column is not a column of the dataframe
Solution 1:[1]
Instead of remainder='drop' in the ColumnTransformer, write remainder='passthrough'.
As you can see in the sklearn documentation, by default only the columns specified in transformers are transformed and combined in the output, and the non-specified columns are dropped (the default is remainder='drop'). By specifying remainder='passthrough', all remaining columns that were not specified in transformers are passed through automatically.
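It may help to see what remainder does and does not affect. The sketch below uses a small made-up DataFrame, not the real toy_dataset.csv, and a hypothetical helper make_ct. It suggests that remainder only decides what happens to columns that are not listed, while every column that is listed must still exist in the frame passed to fit. In the question's code, 'Illness' is dropped from toy_drop but is still listed under 'ohe', so the same ValueError would be expected with either remainder setting; removing 'Illness' from that list is the likely fix.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Made-up stand-in for the feature frame after 'Number' and 'Illness' are dropped.
X = pd.DataFrame({
    "City": ["Dallas", "Boston", "Dallas", "Austin"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [41, 54, 42, 40],
    "Income": [40367.0, 45084.0, 52483.0, 40941.0],
})


def make_ct(ohe_columns, remainder):
    """Hypothetical helper: build a ColumnTransformer like the one in the question."""
    return ColumnTransformer(
        [
            ("normalization", MinMaxScaler(), ["Income"]),
            ("ohe", OneHotEncoder(), ohe_columns),
        ],
        remainder=remainder,
    )


# remainder only controls the unlisted column ('Age' here).
dropped = make_ct(["City", "Gender"], "drop").fit_transform(X)
passed = make_ct(["City", "Gender"], "passthrough").fit_transform(X)
print(dropped.shape, passed.shape)  # 'Age' adds one extra column with passthrough

# A listed column that is absent from X fails regardless of remainder.
requested = ["City", "Gender", "Illness"]
missing = [col for col in requested if col not in X.columns]
print("Listed but not in the frame:", missing)

error_message = None
try:
    make_ct(requested, "passthrough").fit(X)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Comparing the listed columns against X.columns before fitting, as in the missing check above, is a quick way to find which name triggers the error.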
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ElhamMotamedi |