tensorflow automatic accuracy calculation for multilabel classifier

I am fitting a multilabel classifier to (train_x, train_y) while monitoring the loss and accuracy on a validation set (val_x, val_y):

classification_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'])
classification_model.fit(train_x, train_y,
                         validation_data=(val_x, val_y),
                         epochs=10,
                         batch_size=10)

This gives the following output:

Epoch 1/10
50/50 [==============================] - ETA: 0s - loss: 0.1186 - accuracy: 0.7094
Epoch 1: val_loss improved from 0.15329 to 0.11998, saving model to best_classification_model.tf
50/50 [==============================] - 12s 186ms/step - loss: 0.1186 - accuracy: 0.7094 - val_loss: 0.1200 - val_accuracy: 0.6280
Epoch 2/10
50/50 [==============================] - ETA: 0s - loss: 0.0848 - accuracy: 0.7776
Epoch 2: val_loss improved from 0.11998 to 0.10281, saving model to best_classification_model.tf
50/50 [==============================] - 8s 167ms/step - loss: 0.0848 - accuracy: 0.7776 - val_loss: 0.1028 - val_accuracy: 0.7200
Epoch 3/10
50/50 [==============================] - ETA: 0s - loss: 0.0652 - accuracy: 0.8176
Epoch 3: val_loss improved from 0.10281 to 0.09259, saving model to best_classification_model.tf
50/50 [==============================] - 10s 202ms/step - loss: 0.0652 - accuracy: 0.8176 - val_loss: 0.0926 - val_accuracy: 0.7560
Epoch 4/10
50/50 [==============================] - ETA: 0s - loss: 0.0522 - accuracy: 0.8236
Epoch 4: val_loss improved from 0.09259 to 0.08710, saving model to best_classification_model.tf
50/50 [==============================] - 10s 206ms/step - loss: 0.0522 - accuracy: 0.8236 - val_loss: 0.0871 - val_accuracy: 0.7480
Epoch 5/10
50/50 [==============================] - ETA: 0s - loss: 0.0418 - accuracy: 0.8337
Epoch 5: val_loss improved from 0.08710 to 0.08441, saving model to best_classification_model.tf
50/50 [==============================] - 10s 209ms/step - loss: 0.0418 - accuracy: 0.8337 - val_loss: 0.0844 - val_accuracy: 0.7640

I am wondering how this accuracy is actually calculated. Does it count the total number of correct labels, or the total number of rows for which all labels are correct? And what counts as a 'correct label'? Is the maximum taken per output row internally?

To clarify what I mean with each option (a quick way to check both is sketched below):

The total number of correct labels: for each image, 20 labels are output, of which some are 0 and some are 1. Report the total number of correct labels (= number of correct 0s + number of correct 1s) and divide it by the total number of labels (= 20 * num_images). I don't think this is what happens, as it would probably lead to a much higher accuracy: just predicting 0s everywhere would already give an accuracy of over 90%, and that is not what I observe, even after training for a long time.
The total number of rows for which all labels are correct: count the number of images for which all labels are correct (0s and 1s) and divide by the number of images.
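For reference, here is a sketch of both computations so they can be compared against the reported val_accuracy (reusing classification_model, val_x, and val_y from above; the 0.5 probability cut-off is my assumption):

import tensorflow as tf

logits = classification_model.predict(val_x)             # shape (250, 20), raw logits
preds = tf.cast(tf.sigmoid(logits) > 0.5, tf.float64)    # assumed 50% cut-off

# Option 1: fraction of individual label slots that are correct
per_label_acc = tf.reduce_mean(tf.cast(preds == val_y, tf.float64))

# Option 2: fraction of images for which all 20 labels are correct
per_row_acc = tf.reduce_mean(
    tf.cast(tf.reduce_all(preds == val_y, axis=-1), tf.float64))

print(per_label_acc.numpy(), per_row_acc.numpy())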

The model output and the validation labels look as follows:

>>> classification_model.predict(val_x)    # shape: (250, 20)
array([[ -9.385,  -5.443,  -8.274, ...,   1.936, -11.607,  -1.867],
       [-10.523,   3.074,  -7.765, ...,  -2.925, -10.35 ,  -2.602],
       [ -7.872,  -7.525,  -4.877, ...,  -6.434,  -9.063,  -8.485],
       ...,
       [ -6.04 ,  -4.826,   3.537, ...,  -5.68 ,  -7.276,  -6.05 ],
       [ -5.734,  -6.399,  -5.288, ...,  -5.495,  -6.673,   0.06 ],
       [ -9.458,  -7.892,   1.391, ...,  -6.422,  -9.75 ,  -7.702]],
      dtype=float32)
>>> val_y    # also shape: (250,20)
array([[0., 0., 0., ..., 0., 0., 1.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 1., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 1., ..., 0., 0., 0.]])


Solution 1:

When you use 'accuracy', you are trusting Keras to automatically select a metric for you from among BinaryAccuracy, CategoricalAccuracy, and SparseCategoricalAccuracy. You'll get burned by enough corner cases that the auto-selection doesn't pick up that you'll find it easier to just be explicit. So I'd go with

metrics = [tf.keras.metrics.BinaryAccuracy(threshold=???)]

BinaryAccuracy is computed as if every label were part of one big bucket. So if you have two images with 10 possible labels each in a multilabel setting, you have 20 individual predictions, and binary accuracy is just (TP + TN) / 20. If it helps, think of it as one reduce over all axes at once, rather than a per-row reduce followed by a check that the whole row is correct.
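A minimal sketch of that pooling behavior (toy numbers, not from the question):

import tensorflow as tf

# Two images, 10 candidate labels each: 20 prediction "slots" in one bucket.
y_true = tf.constant([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
                      [0., 1., 1., 0., 0., 0., 0., 0., 0., 0.]])
y_pred = tf.constant([[1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],   # 9/10 slots right
                      [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]])  # 9/10 slots right

m = tf.keras.metrics.BinaryAccuracy()  # these are already 0/1, so 0.5 is fine here
m.update_state(y_true, y_pred)
print(m.result().numpy())  # 0.9 = 18 correct slots / 20 slots

# The per-row "every label right" reading would instead give 0.0 here:
rows_all_right = tf.reduce_all(tf.equal(y_true, y_pred), axis=-1)
print(tf.reduce_mean(tf.cast(rows_all_right, tf.float32)).numpy())  # 0.0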

But Keras often doesn't document these corner cases. You can read the source yourself (which gets harder if you use the string "accuracy" rather than instantiating an actual metric object), or run experiments.

Further, since you are outputting logits rather than probabilities (i.e. there is no logistic/sigmoid layer at the end of your model), your model will output -inf to indicate 0% confidence, zero to indicate 50% confidence, and +inf to indicate 100% confidence. It's your decision where to place the threshold. The typical answer is 50% confidence, but if you want to tune recall/precision/specificity, you can move that up or down. To get a 50% confidence threshold when your model outputs logits, you should set the threshold to 0.0, because a logit of 0.0 corresponds to a probability of 50%:

>>> tf.sigmoid(0.0)
<tf.Tensor: shape=(), dtype=float32, numpy=0.5>

You set it like this:

m = tf.keras.metrics.BinaryAccuracy(threshold=0.0)
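Dropped into the compile call from the question, that looks like this (same optimizer and loss as above):

classification_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    # threshold is in logit space because the model has no sigmoid layer
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)])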

Here's an example of where not controlling the threshold properly for logits will burn you.

In [12]: import tensorflow as tf

In [13]: y_true = tf.convert_to_tensor([[0,0,0],[0,1,1]])

In [14]: y_pred = tf.convert_to_tensor([[-.1, -.1, -.1], [.1, .1, .1]])

In [15]: m = tf.keras.metrics.BinaryAccuracy(threshold=0.0)

In [16]: m(y_true, y_pred)
Out[16]: <tf.Tensor: shape=(), dtype=float32, numpy=0.8333334>

In [17]: m = tf.keras.metrics.BinaryAccuracy() # default threshold is 0.5

In [18]: m(y_true, y_pred)
Out[18]: <tf.Tensor: shape=(), dtype=float32, numpy=0.6666667>
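As a sanity check (my addition, same toy tensors as in the session above): squashing the logits through a sigmoid first and keeping the default 0.5 threshold recovers the threshold=0.0 result:

import tensorflow as tf

y_true = tf.convert_to_tensor([[0, 0, 0], [0, 1, 1]])
y_pred = tf.convert_to_tensor([[-.1, -.1, -.1], [.1, .1, .1]])  # logits

m = tf.keras.metrics.BinaryAccuracy()       # default threshold of 0.5
m.update_state(y_true, tf.sigmoid(y_pred))  # probabilities now, so 0.5 is correct
print(m.result().numpy())                   # 0.8333334, matches threshold=0.0 on logits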

Sorry something this simple is a pain. Welcome to TensorFlow.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Yaoshiang