Metrics F1 warning zero division
I want to calculate the F1 score of my models, but I receive a warning and get a 0.0 F1-score, and I don't know what to do.
Here is the source code:
import numpy as np
from sklearn import metrics
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier

# X_train, y_train, X_test, y_test are defined earlier in the script
def model_evaluation(models):
    for key, value in models.items():
        classifier = Pipeline([('tfidf', TfidfVectorizer()),
                               ('clf', value)])
        classifier.fit(X_train, y_train)
        predictions = classifier.predict(X_test)
        print("Accuracy Score of", key, ":", metrics.accuracy_score(y_test, predictions))
        print(metrics.classification_report(y_test, predictions))
        print(metrics.f1_score(y_test, predictions, average="weighted",
                               labels=np.unique(predictions), zero_division=0))
        print("---------------", "\n")

dlist = {"KNeighborsClassifier": KNeighborsClassifier(3),
         "LinearSVC": LinearSVC(),
         "MultinomialNB": MultinomialNB(),
         "RandomForest": RandomForestClassifier(max_depth=5, n_estimators=100)}
model_evaluation(dlist)
And here is the result:
Accuracy Score of KNeighborsClassifier :  0.75
              precision    recall  f1-score   support

not positive       0.71      0.77      0.74        13
    positive       0.79      0.73      0.76        15

    accuracy                           0.75        28
   macro avg       0.75      0.75      0.75        28
weighted avg       0.75      0.75      0.75        28

0.7503192848020434
---------------

Accuracy Score of LinearSVC :  0.8928571428571429
              precision    recall  f1-score   support

not positive       1.00      0.77      0.87        13
    positive       0.83      1.00      0.91        15

    accuracy                           0.89        28
   macro avg       0.92      0.88      0.89        28
weighted avg       0.91      0.89      0.89        28

0.8907396950875212
---------------

Accuracy Score of MultinomialNB :  0.5357142857142857
              precision    recall  f1-score   support

not positive       0.00      0.00      0.00        13
    positive       0.54      1.00      0.70        15

    accuracy                           0.54        28
   macro avg       0.27      0.50      0.35        28
weighted avg       0.29      0.54      0.37        28

0.6976744186046512
---------------

C:\Users\Cey\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1272: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

Accuracy Score of RandomForest :  0.5714285714285714
              precision    recall  f1-score   support

not positive       1.00      0.08      0.14        13
    positive       0.56      1.00      0.71        15

    accuracy                           0.57        28
   macro avg       0.78      0.54      0.43        28
weighted avg       0.76      0.57      0.45        28

0.44897959183673475
---------------
Can someone tell me what to do? I only receive this message when using the "MultinomialNB()" classifier.
Second:
When extending the dictionary with the Gaussian classifier (GaussianNB()), I receive this error message:
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
What should I do here?
Solution 1:[1]
Can someone tell me what to do? I only receive this message when using the "MultinomialNB()" classifier.
The first error seems to indicate that a specific label is never predicted when using the MultinomialNB, which results in an undefined, or ill-defined, F-score, since the missing values are set to 0. This is explained here.
When extending the dictionary with the Gaussian classifier (GaussianNB()), I receive this error message: TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
As per this question, the error is quite explicit: TfidfVectorizer returns a sparse matrix, which cannot be used as input for GaussianNB. So the way I see it, you either avoid using GaussianNB, or you add an intermediate transformer to turn the sparse array into a dense one, which I wouldn't advise for the output of a tf-idf vectorization (see the sketch below).
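If you do want to try the dense route anyway, here is a minimal sketch using sklearn's FunctionTransformer (the 'to_dense' step name and the lambda are my own illustration, not part of the original answer):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB

# FunctionTransformer wraps a plain function as a pipeline step; here it
# densifies the sparse tf-idf matrix so GaussianNB can consume it. Beware:
# a dense tf-idf matrix can get very large for any real vocabulary.
to_dense = FunctionTransformer(lambda X: X.toarray(), accept_sparse=True)

classifier = Pipeline([('tfidf', TfidfVectorizer()),
                       ('to_dense', to_dense),
                       ('clf', GaussianNB())])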
Solution 2:[2]
Together with UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples (main credit goes there) and @yatu's answer, I could at least find a workaround for the warning:
UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Quote from sklearn.metrics.f1_score in the Notes at the bottom:
When true positive + false positive == 0, precision is undefined. When true positive + false negative == 0, recall is undefined. In such cases, by default the metric will be set to 0, as will f-score, and UndefinedMetricWarning will be raised. This behavior can be modified with zero_division.
Thus, you cannot avoid the underlying problem if some label in y_test is never predicted (true positives + false positives == 0 for that label). What you can do is suppress the warning by adding zero_division=0 to the functions mentioned in the quote. And whether you set zero_division to 0 or 1, the F-score for such a label still comes out as 0: zero_division only replaces the undefined precision, while recall is a well-defined 0 (the label has true samples but no predicted ones), so the harmonic mean is 0 either way.
from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_test, y_pred, zero_division=0)
print('Precision score: {0:0.2f}'.format(precision))

recall = recall_score(y_test, y_pred, zero_division=0)
print('Recall score: {0:0.2f}'.format(recall))

f1 = f1_score(y_test, y_pred, zero_division=0)
print('f1 score: {0:0.2f}'.format(f1))
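A quick check of that claim, again with toy labels of my own rather than the question's data: the per-label F1 for the never-predicted label is 0 under either zero_division setting, and no warning is raised once the parameter is set.

import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 0, 1, 1])  # label 0 exists in the ground truth...
y_pred = np.array([1, 1, 1, 1])  # ...but is never predicted

# zero_division replaces only the undefined precision for label 0; its
# recall is a well-defined 0, so the per-label F1 stays 0 either way.
print(f1_score(y_true, y_pred, average=None, zero_division=0))  # [0.    0.667]
print(f1_score(y_true, y_pred, average=None, zero_division=1))  # [0.    0.667]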
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | yatu
Solution 2 |