RandomForestClassifier instance not fitted yet. Call 'fit' with appropriate arguments before using this method
I am trying to train a decision tree model, save it, and then reload it when I need it later. However, I keep getting the following error:
This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
Here is my code:
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.20, random_state=4)
names = ["Decision Tree", "Random Forest", "Neural Net"]
classifiers = [
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    MLPClassifier()
]
score = 0
for name, clf in zip(names, classifiers):
    if name == "Decision Tree":
        clf = DecisionTreeClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid=param_grid_DT)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score = grid_search.best_score_
            best_clf = clf
    elif name == "Random Forest":
        clf = RandomForestClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid_RF)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score = grid_search.best_score_
            best_clf = clf
    elif name == "Neural Net":
        clf = MLPClassifier()
        clf.fit(X_train, y_train_TF)
        y_pred = clf.predict(X_test)
        current_score = accuracy_score(y_test_TF, y_pred)
        if current_score > score:
            score = current_score
            best_clf = clf
pkl_filename = "pickle_model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(best_clf, file)
from sklearn.externals import joblib
# Save to file in the current working directory
joblib_file = "joblib_model.pkl"
joblib.dump(best_clf, joblib_file)
print("best classifier: ", best_clf, " Accuracy= ", score)
Here is how I load the model and test it:
# First method
with open(pkl_filename, 'rb') as h:
    loaded_model = pickle.load(h)
# Second method
joblib_model = joblib.load(joblib_file)
As you can see, I have tried two ways of saving it, but neither has worked.
Here is how I tested:
print(loaded_model.predict(test))
print(joblib_model.predict(test))
You can clearly see that the models are actually fitted, and if I try any other model, such as SVM or logistic regression, the method works just fine.
Solution 1:[1]
The problem is in this line:
best_clf = clf
You have passed clf to grid_search, which clones the estimator and fits those clones on the data. So your actual clf remains untouched and unfitted.
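A minimal sketch of this cloning behaviour, assuming a recent scikit-learn. The iris dataset, the small max_depth grid (a stand-in for param_grid_DT from the question), and the is_fitted helper are illustrative assumptions, not part of the original code:

```python
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted

# Toy data, used here only so the example is self-contained.
X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=0)
# Hypothetical grid standing in for param_grid_DT from the question.
grid_search = GridSearchCV(clf, param_grid={"max_depth": [2, 3, 4]})
grid_search.fit(X, y)


def is_fitted(estimator):
    """Return True if scikit-learn considers the estimator fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


# The estimator handed to GridSearchCV is never fitted itself;
# only the internal clones (and best_estimator_) are.
clf_fitted = is_fitted(clf)
best_fitted = is_fitted(grid_search.best_estimator_)
print("clf fitted:", clf_fitted)
print("best_estimator_ fitted:", best_fitted)
```

Pickling clf at this point would therefore store an unfitted tree, which is consistent with the error seen when calling predict on the reloaded object.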
What you need is
best_clf = grid_search
to save the fitted grid_search model.
If you don't want to save the entire contents of grid_search, you can use the best_estimator_ attribute of grid_search (available when refit=True, which is the default) to get the actual cloned fitted model.
best_clf = grid_search.best_estimator_
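As a rough end-to-end sketch of this fix, the following saves best_estimator_ with both pickle and joblib and reloads it. The iris data, the small n_estimators grid, and the temporary directory are assumptions made for illustration. It also imports the standalone joblib package, since sklearn.externals.joblib is no longer available in newer scikit-learn releases:

```python
import os
import pickle
import tempfile

import joblib  # standalone package, replaces sklearn.externals.joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy data, used here only so the example is self-contained.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=4
)

# Hypothetical grid standing in for param_grid_RF from the question.
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20]},
)
grid_search.fit(X_train, y_train)

# Keep the fitted clone, not the estimator that was passed in.
best_clf = grid_search.best_estimator_
expected = best_clf.predict(X_test)

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_filename = os.path.join(tmp_dir, "pickle_model.pkl")
    joblib_file = os.path.join(tmp_dir, "joblib_model.pkl")

    # Save with both methods used in the question.
    with open(pkl_filename, "wb") as f:
        pickle.dump(best_clf, f)
    joblib.dump(best_clf, joblib_file)

    # Reload with both methods.
    with open(pkl_filename, "rb") as f:
        loaded_model = pickle.load(f)
    joblib_model = joblib.load(joblib_file)

# Both reloaded models should predict without raising NotFittedError.
pickle_matches = bool((loaded_model.predict(X_test) == expected).all())
joblib_matches = bool((joblib_model.predict(X_test) == expected).all())
print("pickle round trip matches:", pickle_matches)
print("joblib round trip matches:", joblib_matches)
```

Either persistence method should work once the object being saved is the fitted one; the choice between pickle and joblib does not affect the "not fitted" error.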
Solution 2:[2]
Just wanted to add a little to the answer above. In my experience, even if you manually copy and paste the pickle file to a different directory where you want to load the model, you end up with that error. If you want to move the file, use cut and paste instead.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Vivek Kumar |
Solution 2 | Rupesh Suryawanshi |