CatBoost: how do I pass cat_features to a saved model in Python?

I have a pre-trained, saved model for which I specified the categorical features at training time, and it works fine if I predict right after training. Now I want to use it again in another context, but I don't know how to properly specify the categorical features. I tried this:

from catboost import CatBoostClassifier

model = CatBoostClassifier(cat_features=var_categ)
model.load_model('catmod.cat')

but when I try to predict:

model.predict(base)

I get this error:

CatBoostError: features data: pandas.DataFrame column 'cod_var1' has dtype 'category' but is not in  cat_features list

Yes, I double checked the column is in var_categ.



Solution 1:[1]

First of all, you don't need to pass cat_features to CatBoostClassifier, because load_model restores that information from the saved model.

Judging from your error, I would guess that in the new data set your features are shifted by one position relative to the training data, which produces the error.
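One way to test the "shifted by one position" guess is to compare the column names the model expects with the columns of the new data, position by position. The sketch below is plain Python and does not call CatBoost; the helper name find_shifted_columns and the column names other than cod_var1 are hypothetical, chosen only for illustration.

```python
def find_shifted_columns(expected, actual):
    """Return (position, expected_name, actual_name) wherever the two lists disagree."""
    mismatches = []
    for pos in range(max(len(expected), len(actual))):
        exp = expected[pos] if pos < len(expected) else None
        act = actual[pos] if pos < len(actual) else None
        if exp != act:
            mismatches.append((pos, exp, act))
    return mismatches


# Hypothetical example: an extra 'id' column at the front shifts every feature by one.
trained_on = ["cod_var1", "age", "income"]
predicting_on = ["id", "cod_var1", "age", "income"]
shifted = find_shifted_columns(trained_on, predicting_on)
for pos, exp, act in shifted:
    print(f"position {pos}: model expects {exp!r}, data has {act!r}")
```

With a loaded model, expected can be taken from model.feature_names_ and actual from list(base.columns). If the names match but the order differs, base = base[model.feature_names_] selects and reorders the columns, assuming the model was trained on a DataFrame so that the stored names are meaningful.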

Solution 2:[2]

Without seeing the code used to process data both for training and predicting, there's not quite enough to go on. The error means that when the model was trained, 'cod_var1' was not in the categorical features list. It may be in var_categ, but the model is indicating that it was not in the categorical features list used to train the model.

In your dataset base, cod_var1 has the "category" dtype. Since this is a pandas dtype that is not assigned automatically when a dataframe is created, it appears you have some code between data loading and predicting that sets it. I'd hypothesize that something changed in those data processing steps between training and now, so that the data passed to predict no longer matches the training data exactly (same columns in the same order with the same types).
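The check described above can be approximated by hand: list every column whose dtype is "category" and see whether its position is among the positions the model treats as categorical. The sketch below is plain Python that mirrors this idea, not CatBoost's actual implementation; the helper name category_columns_not_in_model and the sample columns other than cod_var1 are hypothetical.

```python
def category_columns_not_in_model(column_dtypes, model_cat_indices):
    """Return names of 'category' columns whose position is not categorical for the model.

    column_dtypes: ordered list of (column_name, dtype_as_string)
    model_cat_indices: positions the model was trained to treat as categorical
    """
    cat_positions = set(model_cat_indices)
    return [
        name
        for pos, (name, dtype) in enumerate(column_dtypes)
        if dtype == "category" and pos not in cat_positions
    ]


# Hypothetical data: the model was trained with only position 1 as categorical,
# but position 0 ('cod_var1') arrives with dtype 'category'.
dtypes = [("cod_var1", "category"), ("region", "category"), ("age", "int64")]
offending = category_columns_not_in_model(dtypes, model_cat_indices=[1])
print(offending)
```

With real objects, column_dtypes can be built as [(c, str(t)) for c, t in base.dtypes.items()] and model_cat_indices can come from model.get_cat_feature_indices(). Any name returned here is a column that is likely to trigger the error from the question.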

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andreyn
Solution 2 K. Thorspear