Keras, TF: Do I have to label all images when adding an attribute to a multilabel image classification model?

I have a dataset of images and have built a strong image recognition model. Now I want to add another label to the model, and I am wondering whether I have to label every single image in my dataset that has the new attribute.

A simple example: let's say I have 500k images in total and I want to label all images that have a palm in them. Let's assume that around 100k images contain a palm.

Would my model be able to recognise the palm label with 80%, 90% or better accuracy if I only labeled around 20k, 30k or 50k of the images that contain a palm? Or do I have to label all 100k palm images to get acceptable performance?

From my point of view, this could be interpreted in two ways:

  1. A multilabel image classification model ignores all attributes labeled 0, so they won't affect model accuracy -> 20k labeled palm images would be enough for strong performance, because the model only learns from attributes labeled 1 (even if 100k labeled images would give better performance).
  2. A multilabel image classification model is also affected by attributes labeled 0. If only 20k of the 100k palm images are labeled, the model gets confused, because 80k images have a palm in them but aren't labeled as palm. The result would be weak performance on this label. If that's the case, all 100k images have to be labeled for strong performance.
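To make the two interpretations concrete: a typical Keras multilabel setup (assumed here: one sigmoid output per label, trained with binary cross-entropy) computes a loss term for every label of an image, -[y*log(p) + (1-y)*log(1-p)], and then averages them. The pure-Python sketch below evaluates that per-label term for a hypothetical image with three labels, where the image really shows a palm but the palm label was left at 0. The label names and predicted probabilities are made up for illustration.

```python
import math


def per_label_bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy for each label of one image: -[y*log(p) + (1-y)*log(1-p)]."""
    losses = []
    for y, p in zip(y_true, y_pred):
        # Clip p away from 0 and 1 to avoid log(0), similar to the epsilon clipping Keras applies.
        p = min(max(p, eps), 1.0 - eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return losses


# Hypothetical labels [beach, car, palm]; the image contains a palm, but its palm label is 0.
labels = [1, 0, 0]
predictions = [0.9, 0.1, 0.9]  # the model already "sees" the palm with p = 0.9
losses = per_label_bce(labels, predictions)
print([round(v, 3) for v in losses])
```

The palm entry contributes about 2.30 (that is, -log(0.1)), versus about 0.11 for each of the other two labels, so in this setup a 0 is not ignored: it is a target the model is penalized for missing.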

Am I right with one of these two suggestions, or does multilabel image classification work differently? I have a very big dataset and I have to label all my images by hand, which takes a lot of time. If my first suggestion holds, I could save myself weeks of work.

I would really appreciate it if you shared your expertise, experience and reasoning!



Solution 1:[1]

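The training process uses the negative cases just as much as the positive cases to learn what a palm is: with the usual multilabel setup (a sigmoid output per label trained with binary cross-entropy), every label contributes to the loss, whether it is 0 or 1. So if some of the supplied negative cases actually contain a palm tree, the model is explicitly trained to answer "no palm" for them, and it will have a much harder time learning the label. You could start by labeling only the 20k images to see whether the result is good enough, but for the best result you should label all 100k.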
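A toy sketch of this effect, under heavy simplifying assumptions: a single hypothetical binary feature `has_palm` stands in for whatever a CNN would extract from the pixels, plain logistic regression stands in for the sigmoid "palm" output of a Keras model, and the 500k / 100k / 20k counts are scaled down to 50 / 10 / 2 with the same ratios.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, steps=2000, lr=1.0):
    """Full-batch gradient descent on mean binary cross-entropy for p = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # dLoss/dlogit; a 0 target always pushes p down
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical scaled-down dataset: 10 of 50 images contain a palm (ratio 100k / 500k).
has_palm = [1] * 10 + [0] * 40
full_labels = list(has_palm)                    # every palm image labeled 1
partial_labels = [1] * 2 + [0] * 8 + [0] * 40   # only 2 of 10 palm images labeled (20k / 100k)

w_full, b_full = train_logistic(has_palm, full_labels)
w_part, b_part = train_logistic(has_palm, partial_labels)

p_full = sigmoid(w_full + b_full)  # predicted P(palm) for an image that has a palm
p_part = sigmoid(w_part + b_part)
print(f"all palms labeled:    P(palm) = {p_full:.3f}")
print(f"20% of palms labeled: P(palm) = {p_part:.3f}")
```

With every palm labeled, the predicted probability for a palm image approaches 1. With only 2 of the 10 palm images labeled, it settles near 0.2, the fraction of palm images that carry the label, because each unlabeled palm image acts as an explicit "not a palm" target; at a 0.5 threshold, this model would miss every palm. A real CNN has far more capacity than this one-feature model, so the exact numbers will differ, but the pull of the wrong 0 labels works the same way.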

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: brad