ValueError: A given column is not a column of the dataframe in Pipeline and ColumnTransformer

I am working on a toy dataset with ColumnTransformer and Pipeline, but I ran into an error for which I couldn't find a solution online.

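import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (KBinsDiscretizer, LabelEncoder, MinMaxScaler,
                                   OneHotEncoder, OrdinalEncoder, RobustScaler)

# Features: every column except the row id and the target
toy = pd.read_csv('toy_dataset.csv')
toy_drop=toy.drop(['Number','Illness'],axis=1)
# Target: the 'Illness' column, kept as a one-column DataFrame
toy_target= toy.Illness
toy_target=toy_target.to_frame()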

With the data imported, I define the transformers and the pipeline:

rb=RobustScaler()
normalization=MinMaxScaler()
ohe=OneHotEncoder(sparse=False)
le=LabelEncoder()
oe=OrdinalEncoder()
bins = KBinsDiscretizer(n_bins=5, encode='onehot-dense', strategy='uniform')

ct_features=ColumnTransformer([('normalization',normalization,['Income']),
                      ('ohe',ohe,['City','Gender','Illness']),
                      ('bins',bins,['Age']),
                      ],remainder='drop')


pip = Pipeline([
    ("ct",ct_features),
    #("collabel",ct_label),
    ('lr',LinearRegression())])

x_train,x_test,y_train,y_test=train_test_split(toy_drop,toy_target, test_size=0.2,random_state=2021)

pip.fit(x_train,y_train)

Everything looks correct to me, but fitting the pipeline raises this error:

ValueError: A given column is not a column of the dataframe



Solution 1:[1]

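Instead of remainder='drop' in the ColumnTransformer, write remainder='passthrough'.

As the scikit-learn documentation explains, by default (remainder='drop') only the columns specified in transformers are transformed and combined in the output, and the non-specified columns are dropped. With remainder='passthrough', all remaining columns that were not specified in transformers are passed through unchanged.

Note that remainder only governs columns that no transformer names. Every column that a transformer does name must still exist in the DataFrame passed to fit. In the question, 'Illness' is listed for the ohe step but was dropped when toy_drop was created, so it also has to be removed from that list.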
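
To make the relationship between the transformer column lists and the fitted DataFrame concrete, here is a minimal sketch. It assumes a small invented stand-in for toy_dataset.csv (the rows below are made up, only the column names come from the question), encodes the Yes/No target as 1/0 so that LinearRegression can consume it, and uses fewer bins than the question because the sample is tiny. It first reproduces the error with 'Illness' in the ohe list, then fits the same pipeline without it.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, MinMaxScaler, OneHotEncoder

# Hypothetical stand-in for toy_dataset.csv; the values are invented.
toy = pd.DataFrame({
    "Number": [1, 2, 3, 4, 5, 6],
    "City": ["Dallas", "Austin", "Dallas", "Boston", "Austin", "Boston"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Age": [25, 41, 33, 58, 47, 29],
    "Income": [40000.0, 52000.0, 61000.0, 75000.0, 58000.0, 45000.0],
    "Illness": ["No", "Yes", "No", "No", "Yes", "No"],
})

X = toy.drop(["Number", "Illness"], axis=1)
# Assumption: map the Yes/No target to 1/0 for the regressor.
y = (toy["Illness"] == "Yes").astype(int)

# Reproduce the problem: 'Illness' is requested but is not a column of X.
bad_ct = ColumnTransformer(
    [("ohe", OneHotEncoder(), ["City", "Gender", "Illness"])],
    remainder="passthrough",  # changing remainder does not help here
)
raised_for_missing_column = False
try:
    bad_ct.fit(X)
except ValueError:
    raised_for_missing_column = True

# Corrected column lists: only columns that are present in X.
transformers = [
    ("normalization", MinMaxScaler(), ["Income"]),
    ("ohe", OneHotEncoder(handle_unknown="ignore"), ["City", "Gender"]),
    ("bins",
     KBinsDiscretizer(n_bins=3, encode="onehot-dense", strategy="uniform"),
     ["Age"]),
]

# Quick sanity check before fitting: which requested columns are absent?
requested = {col for _, _, cols in transformers for col in cols}
missing = sorted(requested - set(X.columns))

pip = Pipeline([
    ("ct", ColumnTransformer(transformers, remainder="drop")),
    ("lr", LinearRegression()),
])
pip.fit(X, y)
predictions = pip.predict(X)
```

The set difference stored in missing is a cheap check to run whenever this error appears: any name it reports is referenced by a transformer but absent from the DataFrame being fitted.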

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ElhamMotamedi