Why is there a difference in the Intersection over Union (IoU) score when evaluating the same data with the same model?

I evaluated the IoU score for the test dataset using the saved model.

model.evaluate(test_gen, steps)

I have also calculated the IoU score for each image in the test dataset individually. When I average these per-image IoU values, the result differs from the first one. After a lot of trial and error, I couldn't figure out why this happens. The code for both approaches is below:

1st approach:

results = model.evaluate(test_gen, steps)
print("Test Loss: ",results[0])
print("Test IOU: ",results[1])
print("Test Dice Coefficient: ",results[2])
print("Test Accuracy: ",results[3])

Output:

Found 26 validated image filenames.
Found 26 validated image filenames.
2/2 [==============================] - 0s 115ms/step - loss: 0.3716 - iou: 0.5029 - dice_coef: 0.6643 - binary_accuracy: 0.9949
Test Loss:  0.37161585688591003
Test IOU:  0.5028986930847168
Test Dice Coefficient:  0.6643418669700623
Test Accuracy:  0.994917631149292

Here the final Test IoU score is 0.5028986930847168.

2nd approach:

import cv2
import numpy as np

iou_scores = []
for i in range(26):
    # Load and preprocess the image the same way as the generator
    img = cv2.imread(test['image_path'].iloc[i])
    img = cv2.resize(img, IMAGE_SIZE)
    img = img / 255
    img = img[np.newaxis, :, :, :]
    pred = model.predict(img)

    # Load the ground-truth mask as a single-channel array
    mask = cv2.imread(test['mask_path'].iloc[i])
    gray = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
    true = cv2.resize(gray, IMAGE_SIZE)
    true = true / 255
    true = true[np.newaxis, :, :, np.newaxis]

    # Smoothed IoU for this single image
    intersection = np.sum(true * pred)
    union = np.sum(true) + np.sum(pred) - intersection
    iou = (intersection + 0.001) / (union + 0.001)
    iou_scores.append(iou)
    print('%.5f' % iou)
print('%.5f' % (sum(iou_scores) / 26))

Output:

0.43826
0.73197
0.67398
0.53396
0.60502
0.70793
0.70374
0.63936
0.71788
0.58394
0.29436
0.76282
0.68236
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.31060

Here the final Test IoU score is 0.31060, which differs from the 0.5028986930847168 obtained with the 1st approach. How can I fix this issue?



Solution 1:[1]

I am facing the same issue, raised here: Dice Score for semantic segmentation when some of labels has all zeros

I assume you used a batch size greater than one for the generator; when you use a batch size of 1, the results come out worse than with a larger batch size. As far as I can tell, some of your ground-truth masks contain no positive pixels, is that right? For those images there can be no true positives, so computing the metric per image (effectively batch size 1) penalizes the model for any false-positive pixel, driving the IoU for that image to zero. The solutions are: use a large batch size, or evaluate the model only on instances whose ground truth has at least one positive pixel, i.e. skip the cases where the ground truth is all zeros.
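The effect can be shown with a toy NumPy sketch (synthetic masks, not the asker's data): when the sums are pooled over a batch before taking the ratio, a false positive on an empty mask is diluted by the other images' overlap, whereas the per-image mean lets that empty-mask image score near zero and drag the average down.

```python
import numpy as np

def iou(y_true, y_pred, smooth=0.001):
    # Smoothed IoU, same formula as in the question
    inter = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - inter
    return (inter + smooth) / (union + smooth)

# Image A: 4-pixel foreground, predicted perfectly
true_a = np.zeros((4, 4)); true_a[:2, :2] = 1.0
pred_a = true_a.copy()

# Image B: empty ground truth, one false-positive pixel
true_b = np.zeros((4, 4))
pred_b = np.zeros((4, 4)); pred_b[0, 0] = 1.0

# Per-image average: image B scores ~0 and pulls the mean toward 0.5
per_image = (iou(true_a, pred_a) + iou(true_b, pred_b)) / 2

# Batch-level: sums are pooled first, so the lone false positive
# only shrinks the ratio slightly (~0.8)
batch = iou(np.stack([true_a, true_b]), np.stack([pred_a, pred_b]))

print(per_image, batch)
```

This mirrors why `model.evaluate` (which computes the metric over whole batches and averages across batches) reports a higher IoU than the per-image loop in the 2nd approach.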

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Abbas Khan