How to get sample weights and class weights for a multi-label classification problem?
I'm trying to build a neural network for a multi-label classification problem.
Situation
An input image can contain multiple output classes (they are not mutually exclusive). There are 6 classes in total.
Example
Image 1 has class 1, class 2 and class 5 in it, so the target vector looks like this: [1, 1, 0, 0, 1, 0].
Data imbalance problem
I have 32 unique image types in total, based on the combination of classes that occur in an image. One type can have all the classes in it (represented by [1, 1, 1, 1, 1, 1]), whereas another type may have none of the classes (represented by [0, 0, 0, 0, 0, 0]).
Some image types are very rare (like an image containing classes 1, 3, 4 and 6 together) compared to others (like an image where no class is present). This should be clear from the data given below.
Image Type : No. of samples of that image type
[1, 0, 1, 1, 0, 1] : 1
[1, 0, 1, 0, 1, 1] : 2
[1, 1, 1, 0, 1, 1] : 2
[1, 1, 1, 1, 1, 1] : 2
[1, 0, 1, 1, 1, 1] : 3
[1, 1, 1, 1, 0, 1] : 3
[1, 0, 1, 0, 0, 1] : 3
[1, 1, 1, 0, 0, 1] : 4
[1, 1, 0, 1, 1, 1] : 4
[1, 1, 0, 1, 0, 1] : 7
[1, 1, 0, 0, 1, 1] : 7
[1, 0, 0, 1, 1, 1] : 8
[1, 0, 0, 1, 0, 1] : 16
[1, 1, 0, 0, 0, 1] : 21
[1, 0, 0, 0, 1, 1] : 28
[0, 1, 1, 0, 1, 1] : 53
[0, 1, 1, 1, 1, 1] : 63
[0, 0, 1, 1, 1, 1] : 70
[0, 0, 1, 0, 1, 1] : 78
[1, 0, 0, 0, 0, 1] : 122
[0, 1, 1, 1, 0, 1] : 141
[0, 1, 0, 1, 1, 1] : 159
[0, 1, 0, 0, 1, 1] : 239
[0, 0, 1, 1, 0, 1] : 265
[0, 1, 0, 1, 0, 1] : 283
[0, 0, 0, 1, 1, 1] : 366
[0, 1, 1, 0, 0, 1] : 491
[0, 0, 1, 0, 0, 1] : 712
[0, 1, 0, 0, 0, 1] : 1128
[0, 0, 0, 1, 0, 1] : 1183
[0, 0, 0, 0, 1, 1] : 2319
[0, 0, 0, 0, 0, 0] : 46431
Total no. of samples = 54,214 sample images
Another problem is the imbalanced representation of the classes themselves. Since there are 54,214 image samples and 6 classes per sample, there are 54,214 * 6 = 325,284 individual labels in total.
The data given below clearly shows that class 1 is the least represented class (only 233 positives). We can also see that there are far more negatives (0) than positives (1).
| | Absent (0) | Present (1) | Total (0 + 1) |
|---|---|---|---|
| Class 1 | 53981 | 233 | 54214 |
| Class 2 | 52321 | 1893 | 54214 |
| Class 3 | 51640 | 2574 | 54214 |
| Class 4 | 51607 | 2607 | 54214 |
| Class 5 | 50811 | 3403 | 54214 |
| Class 6 | 46431 | 7783 | 54214 |
| Total | 306791 | 18493 | 325284 |
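These counts come straight from the label matrix; a small sketch, assuming Y is the (54214, 6) NumPy array of 0/1 labels:

```python
import numpy as np

# Y: (54214, 6) binary label matrix, one row per image (assumed to exist).
combos, counts = np.unique(Y, axis=0, return_counts=True)  # the 32 image types and their frequencies
present = Y.sum(axis=0)                                    # positives per class: [233, 1893, ..., 7783]
absent = len(Y) - present                                  # negatives per class
```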
I am using Keras, and I know we can pass sample_weight and class_weight while training the model. I am using a sigmoid activation in the final layer and binary_crossentropy loss since this is a multi-label classification problem.
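For context, a minimal sketch of that setup (the architecture and input size are placeholders; only the sigmoid output and the binary_crossentropy loss are fixed by my problem):

```python
import tensorflow as tf

# Placeholder model: 6 sigmoid outputs, one per (non-exclusive) class.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),           # assumed image size, not from the question
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(6, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["binary_accuracy"])

# Per-sample weights can be passed at training time:
# model.fit(X, Y, sample_weight=sample_weights, epochs=..., batch_size=...)
```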
Questions
1. How should I calculate sample_weight so that rare samples (like samples of type [1, 0, 1, 1, 0, 1]) are represented more strongly?
2. How should I calculate class_weight in this situation so that the problem of having more negatives (0) than positives (1) is tackled?
3. [Optional/Less important] What should I do if I want to penalize class 6 (the most important class) more heavily than the other five classes?
I know it is possible to calculate these using something like scikit-learn's compute_sample_weight and compute_class_weight.
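For example, this is roughly what I have in mind; a sketch only, assuming Y is my (54214, 6) label matrix and with the class 6 factor chosen arbitrarily. Whether this is the right approach is exactly what I am unsure about:

```python
import numpy as np
import tensorflow as tf
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

# Y: (54214, 6) binary label matrix (assumed to exist).

# 1) Sample weights: treat each of the 32 label combinations as its own "class",
#    so rare combinations such as [1, 0, 1, 1, 0, 1] get large per-sample weights.
combo_keys = np.array(["".join(map(str, row)) for row in Y])   # e.g. "101101"
sample_weights = compute_sample_weight(class_weight="balanced", y=combo_keys)

# 2) Class weights: balanced 0/1 weights per class, computed column by column.
#    (Keras' class_weight argument does not handle multi-label targets directly,
#    so I would fold these into a custom loss instead.)
per_class = [compute_class_weight(class_weight="balanced",
                                  classes=np.array([0, 1]), y=Y[:, k])
             for k in range(6)]
pos_weights = np.array([w[1] / w[0] for w in per_class])       # cost of a positive vs. a negative

# 3) Penalize class 6 more heavily (the factor 2.0 is arbitrary).
pos_weights[5] *= 2.0

def weighted_bce(pw):
    """Binary cross-entropy where each positive label is up-weighted per class."""
    pw = tf.constant(pw, dtype=tf.float32)                          # shape (6,)
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)
        bce = tf.keras.backend.binary_crossentropy(y_true, y_pred)  # (batch, 6)
        weights = y_true * pw + (1.0 - y_true)                      # negatives keep weight 1
        return tf.reduce_mean(weights * bce, axis=-1)
    return loss

# model.compile(optimizer="adam", loss=weighted_bce(pos_weights))
# model.fit(X, Y, sample_weight=sample_weights, ...)
```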
It would be really helpful if someone could provide a solution and explain it mathematically. Also, please correct me if I've understood something incorrectly.
Solution 1:[1]
I believe there are many ways to tackle this problem, but my idea would be the following:
- Have a separate model that predicts whether an image belongs to any class at all. This should be straightforward.
- If the image does belong to some class according to step 1, predict which specific classes it is an element of.
The reason why dividing the problem could be beneficial is that you can train the first model on the entire dataset and then do selective sampling in the second step in order to address the data imbalance. You get the best of both worlds: no information is lost in the first step, the network is helped by simplifying the problem, and the data imbalance is tackled in the second step.
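A rough sketch of how the two stages could be combined at prediction time (the model names, shapes and the 0.5 threshold are placeholders, not a complete implementation):

```python
import numpy as np

# presence_model: binary model trained on the full dataset ("does the image contain any class?")
# label_model:    multi-label model from step 2, trained with selective sampling
def predict_two_stage(images, presence_model, label_model, threshold=0.5):
    labels = np.zeros((len(images), 6), dtype=int)             # default: no class at all
    has_class = presence_model.predict(images).ravel() > threshold
    if has_class.any():
        probs = label_model.predict(images[has_class])          # (m, 6) per-class probabilities
        labels[has_class] = (probs > threshold).astype(int)
    return labels
```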
In the second step, you could opt for:
- Six separate binary classification models, one for each class, each trained with selective sampling.
- One multi-label classification model with selective sampling.
In the first suggestion you would select the samples for each model such that the ratio between labels 0 and 1 in each model is 50/50. For example, for class 1 you would take the 233 images that are elements of that class and 233 other arbitrarily chosen images that are not. This way you have no data imbalance. If your data imbalance is actually due to sampling bias, this option makes sense.
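For example, the 50/50 selection for one of the six binary models could look like this (X, Y and the function name are just for illustration):

```python
import numpy as np

def balanced_indices(Y, k, seed=0):
    """All positives of class k plus an equal number of randomly chosen negatives (50/50)."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(Y[:, k] == 1)                        # e.g. 233 samples for class 1
    neg = rng.choice(np.flatnonzero(Y[:, k] == 0), size=len(pos), replace=False)
    idx = np.concatenate([pos, neg])
    rng.shuffle(idx)
    return idx

# idx = balanced_indices(Y, k=0)
# model_class1.fit(X[idx], Y[idx, 0], ...)
```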
In the second suggestion you would train only on the data that are elements of at least one class. This way you still have some data imbalance, but much less than originally. If you want, you can apply more complex selective sampling by using data augmentation for specific classes, so that images of those classes are seen more often during training. In that case, the data imbalance would shrink even further.
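For the second suggestion, the selection boils down to dropping the all-zero rows (again assuming a NumPy label matrix Y and image array X), optionally combined with oversampling or augmenting the rare combinations:

```python
# Keep only the images that are elements of at least one class
# (drops the 46431 all-zero samples).
has_any = Y.sum(axis=1) > 0
X_sub, Y_sub = X[has_any], Y[has_any]
# Rare combinations can then be shown more often via augmentation,
# oversampling, or per-sample weights.
```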
In the real world though, some data imbalance is actually representative. That is why I would personally go with the second suggestion.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | devidduma |