Metrics F1 warning zero division

I want to calculate the F1 score of my models, but I receive a warning and get an F1 score of 0.0, and I don't know what to do.

Here is the source code:

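import numpy as np
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# X_train, X_test, y_train and y_test come from an earlier train/test split (not shown)
def model_evaluation(classifiers):
    for key, value in classifiers.items():
        # vectorize the raw text, then fit the classifier under test
        classifier = Pipeline([('tfidf', TfidfVectorizer()),
                               ('clf', value)])
        classifier.fit(X_train, y_train)
        predictions = classifier.predict(X_test)
        print("Accuracy Score of", key, ": ", metrics.accuracy_score(y_test, predictions))
        print(metrics.f1_score(y_test, predictions, average="weighted", labels=np.unique(predictions), zero_division=0))

dlist = {"KNeighborsClassifier": KNeighborsClassifier(3), "LinearSVC": LinearSVC(),
         "MultinomialNB": MultinomialNB(), "RandomForest": RandomForestClassifier(max_depth=5, n_estimators=100)}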


And here is the result:

Accuracy Score of KNeighborsClassifier :  0.75
              precision    recall  f1-score   support

not positive       0.71      0.77      0.74        13
    positive       0.79      0.73      0.76        15

    accuracy                           0.75        28
   macro avg       0.75      0.75      0.75        28
weighted avg       0.75      0.75      0.75        28


Accuracy Score of LinearSVC :  0.8928571428571429
              precision    recall  f1-score   support

not positive       1.00      0.77      0.87        13
    positive       0.83      1.00      0.91        15

    accuracy                           0.89        28
   macro avg       0.92      0.88      0.89        28
weighted avg       0.91      0.89      0.89        28


Accuracy Score of MultinomialNB :  0.5357142857142857
              precision    recall  f1-score   support

not positive       0.00      0.00      0.00        13
    positive       0.54      1.00      0.70        15

    accuracy                           0.54        28
   macro avg       0.27      0.50      0.35        28
weighted avg       0.29      0.54      0.37        28


C:\Users\Cey\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1272: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Accuracy Score of RandomForest :  0.5714285714285714
              precision    recall  f1-score   support

not positive       1.00      0.08      0.14        13
    positive       0.56      1.00      0.71        15

    accuracy                           0.57        28
   macro avg       0.78      0.54      0.43        28
weighted avg       0.76      0.57      0.45        28


Can someone tell me what to do? I only receive this message when using the "MultinomialNB()" classifier


When extending the dictionary with the Gaussian classifier (GaussianNB()), I receive this error message:

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

What should I do here?

Solution 1:[1]

Can someone tell me what to do? I only receive this message when using the "MultinomialNB()" classifier

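The first message is a warning, not an error. It indicates that one label is never predicted when using MultinomialNB: in your report, "not positive" has no predicted samples (precision and recall are both 0.00, while "positive" has a recall of 1.00). Precision for that label is therefore undefined, and so is its F-score, and scikit-learn sets both to 0. This is explained here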
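To make the cause concrete, here is a minimal sketch that reproduces the warning. The label arrays are made up for illustration and are not the asker's data; the only assumption is a model that predicts "positive" for every sample, as MultinomialNB does in the report above.

```python
import warnings

import numpy as np
from sklearn.metrics import f1_score, precision_score

# Toy labels (made up for illustration): "not positive" is never predicted.
y_true = np.array(["not positive", "positive", "positive", "not positive"])
y_pred = np.array(["positive", "positive", "positive", "positive"])

# Per-label precision: "not positive" has TP + FP == 0, so it is undefined.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    per_label_precision = precision_score(
        y_true, y_pred, average=None, labels=["not positive", "positive"]
    )
warning_names = [w.category.__name__ for w in caught]
print(per_label_precision)  # precision for "not positive", then "positive"
print(warning_names)

# Passing zero_division explicitly keeps the substituted value but
# suppresses the UndefinedMetricWarning.
with warnings.catch_warnings(record=True) as caught_quiet:
    warnings.simplefilter("always")
    quiet_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)
quiet_names = [w.category.__name__ for w in caught_quiet]
print(quiet_f1)
```

The low score itself is not a bug in the metric: it reflects that the classifier never predicts one of the two classes on this test set.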

When extending the dictionary with the Gaussian classifier (GaussianNB()), I receive this error message: TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

As per this question, the error is quite explicit: TfidfVectorizer returns a sparse matrix, which GaussianNB cannot take as input. So the way I see it, you either avoid GaussianNB, or you add an intermediate transformer that turns the sparse matrix into a dense array. I wouldn't advise the latter for the output of a tf-idf vectorization, since the dense array can become very large.
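If you still want to try GaussianNB, the intermediate transformer could look roughly like the sketch below. The corpus, the step name "to_dense" and the helper to_dense are made up for illustration; FunctionTransformer is one way to wrap the conversion, not the only one. The memory caveat above still applies to any realistically sized vocabulary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Tiny made-up corpus, only for illustration.
texts = ["good movie", "great film", "bad movie", "terrible film"]
labels = ["positive", "positive", "not positive", "not positive"]


def to_dense(X):
    """Convert a scipy sparse matrix to a dense numpy array (hypothetical helper)."""
    return X.toarray()


dense_nb = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # Densify the tf-idf output so GaussianNB accepts it.
    ("to_dense", FunctionTransformer(to_dense, accept_sparse=True)),
    ("clf", GaussianNB()),
])

dense_nb.fit(texts, labels)
train_predictions = dense_nb.predict(texts)
print(train_predictions)
```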

Solution 2:[2]

Together with UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples (main credits go there) and @yatu's answer, I could at least find a workaround for the warning:

UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use zero_division parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result))

Quote from sklearn.metrics.f1_score in the Notes at the bottom:

When true positive + false positive == 0, precision is undefined. When true positive + false negative == 0, recall is undefined. In such cases, by default the metric will be set to 0, as will f-score, and UndefinedMetricWarning will be raised. This behavior can be modified with zero_division.

Thus, you cannot avoid this warning when a label is never predicted (true positive + false positive == 0), because precision for that label is then 0/0. What you can do is suppress the warning by passing zero_division=0 to the functions mentioned in the quote. For a label that occurs in y_test but is never predicted, the F1 score is 0 whether you set zero_division to 0 or 1, because recall is 0; only the reported precision changes (0 or 1).

from sklearn.metrics import f1_score, precision_score, recall_score

precision = precision_score(y_test, y_pred, zero_division=0)
print('Precision score: {0:0.2f}'.format(precision))

recall = recall_score(y_test, y_pred, zero_division=0)
print('Recall score: {0:0.2f}'.format(recall))

f1 = f1_score(y_test, y_pred, zero_division=0)
print('f1 score: {0:0.2f}'.format(f1))
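As a rough check of what zero_division actually changes, the sketch below compares both settings on made-up binary labels where the positive class (1) occurs in the truth but is never predicted. The arrays are illustrative assumptions, not data from the question.

```python
from sklearn.metrics import f1_score, precision_score

# Made-up labels: class 1 is present in y_true but never predicted.
y_true = [1, 1, 0, 0]
y_pred = [0, 0, 0, 0]

# Precision for class 1 is 0/0, so zero_division decides the reported value.
precision_as_0 = precision_score(y_true, y_pred, zero_division=0)
precision_as_1 = precision_score(y_true, y_pred, zero_division=1)

# Recall for class 1 is 0/2 = 0, so F1 is 0 under either setting.
f1_as_0 = f1_score(y_true, y_pred, zero_division=0)
f1_as_1 = f1_score(y_true, y_pred, zero_division=1)

print(precision_as_0, precision_as_1)
print(f1_as_0, f1_as_1)
```

So zero_division controls the placeholder for the undefined metric and silences the warning, but it does not turn a model that never predicts a class into one with a useful F1 for that class.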


