Is it possible to automatically infer the class_weight from flow_from_directory in Keras?
I have an imbalanced multi-class dataset and I want to use the class_weight argument of fit_generator to weight the classes according to the number of images in each class. I'm using ImageDataGenerator.flow_from_directory to load the dataset from a directory.
Is it possible to infer the class_weight argument directly from the ImageDataGenerator object?
Solution 1:[1]
Just figured out a way of achieving this.
from collections import Counter
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(...)
# Count the images per class index, then weight each class relative to the largest one.
counter = Counter(train_generator.classes)
max_val = float(max(counter.values()))
class_weights = {class_id: max_val / num_images for class_id, num_images in counter.items()}
model.fit_generator(..., class_weight=class_weights)
train_generator.classes holds the class index of each image, and Counter(train_generator.classes) counts the number of images in each class.
Note that these weights may not be good for convergence, but you can use them as a base for other kinds of occurrence-based weighting.
This answer was inspired by: https://github.com/fchollet/keras/issues/1875#issuecomment-273752868
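To make the weighting concrete, here is a minimal sketch of the same calculation on a toy label list. The helper name max_normalized_weights and the toy_labels data are hypothetical stand-ins for train_generator.classes, not part of the original answer.

```python
from collections import Counter


def max_normalized_weights(labels):
    """Weight each class by (largest class count) / (its own count)."""
    counts = Counter(labels)
    largest = float(max(counts.values()))
    return {class_id: largest / n for class_id, n in counts.items()}


# Hypothetical labels: 6 images of class 0, 3 of class 1, 2 of class 2.
toy_labels = [0] * 6 + [1] * 3 + [2] * 2
weights = max_normalized_weights(toy_labels)
print(weights)  # {0: 1.0, 1: 2.0, 2: 3.0}
```

The largest class always gets weight 1.0, and every smaller class gets a weight above 1.0 in proportion to how under-represented it is.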
Solution 2:[2]
Alternatively, you can simply do:
from sklearn.utils import class_weight
import numpy as np
class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)
You can then set (as per comment above):
model.fit_generator(..., class_weight=class_weights)
Solution 3:[3]
I tried both solutions, and the sklearn.utils.class_weight one gives better accuracy, though I am not sure why. The two do not yield the same class weights.
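One possible explanation, offered as a hypothesis rather than a verified cause: both schemes are proportional to 1 / class_count, so they differ only by a constant factor. That factor changes the overall magnitude of the weighted loss, which can affect training. The sketch below uses hypothetical helper names and toy data, and reimplements the n_samples / (n_classes * count) formula that scikit-learn documents for 'balanced' instead of calling the library.

```python
from collections import Counter


def max_normalized_weights(labels):
    """Largest class count divided by each class count."""
    counts = Counter(labels)
    largest = float(max(counts.values()))
    return {c: largest / n for c, n in counts.items()}


def balanced_weights(labels):
    """n_samples / (n_classes * count), the documented 'balanced' formula."""
    counts = Counter(labels)
    total = float(sum(counts.values()))
    return {c: total / (len(counts) * n) for c, n in counts.items()}


# Hypothetical labels standing in for train_generator.classes.
toy_labels = [0] * 6 + [1] * 3 + [2] * 2
w_max = max_normalized_weights(toy_labels)
w_bal = balanced_weights(toy_labels)

# The per-class ratio between the two schemes is the same for every class.
ratios = {c: w_max[c] / w_bal[c] for c in w_max}
print(ratios)
```

Because the ratio is constant, the relative emphasis between classes is identical in both schemes; only the global scale of the loss differs.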
Solution 4:[4]
As suggested in the article here, a good way to assign class weights is to use:
(1 / class_count) * (total_count/2)
Thus, slightly modifying the method suggested above by Fábio Perez:
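from collections import Counter

counter = Counter(train_generator.classes)
total = float(sum(counter.values()))
# (1 / class_count) * (total_count / 2), written with floats throughout
class_weight = {class_id: (1.0 / num_images) * (total / 2.0) for class_id, num_images in counter.items()}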
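As a worked example of this formula, the sketch below applies it to a toy two-class label list. The helper name half_total_weights and the data are hypothetical. With exactly two classes, the weights summed over all samples equal the number of samples, so the overall loss magnitude stays roughly unchanged; with more than two classes the fixed divisor of 2 no longer has that property, so treat the multi-class use as an assumption to check.

```python
from collections import Counter


def half_total_weights(labels):
    """(1 / class_count) * (total_count / 2), as quoted in the answer."""
    counts = Counter(labels)
    total = float(sum(counts.values()))
    return {c: (1.0 / n) * (total / 2.0) for c, n in counts.items()}


# Hypothetical binary labels: 8 images of class 0, 2 of class 1.
toy_labels = [0] * 8 + [1] * 2
weights = half_total_weights(toy_labels)
print(weights)  # {0: 0.625, 1: 2.5}

# Sum of the per-sample weights over the whole dataset.
weighted_total = sum(weights[label] for label in toy_labels)
print(weighted_total)  # 10.0
```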
Solution 5:[5]
The code suggested by Pasha Dembo works pretty well. However, you should convert the result to a dictionary before passing it to model.fit_generator:
from sklearn.utils import class_weight
import numpy as np

class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)
# Map class index -> weight, since Keras expects a dictionary here.
train_class_weights = dict(enumerate(class_weights))
model.fit_generator(..., class_weight=train_class_weights)
Solution 6:[6]
from sklearn.utils import class_weight
import numpy as np
classes = np.unique(traingen.classes)
weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=classes,
    y=traingen.classes)
class_weights = dict(zip(classes, weights))
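The difference between building the dictionary with enumerate and with zip only shows up when the class indices present in the data are not exactly 0..n-1, for example if one class has no images. The sketch below uses hypothetical label and weight values in place of the real compute_class_weight output to show how the two mappings diverge.

```python
# Hypothetical values: the unique class indices found in the data (sorted,
# non-contiguous on purpose) and one weight per index in the same order.
unique_labels = [0, 2, 5]
weight_values = [0.5, 1.0, 2.0]

# Keys are positions 0, 1, 2, which only match the labels if they are contiguous.
by_position = dict(enumerate(weight_values))

# Keys are the actual class indices, whatever they happen to be.
by_label = dict(zip(unique_labels, weight_values))

print(by_position)  # {0: 0.5, 1: 1.0, 2: 2.0}
print(by_label)     # {0: 0.5, 2: 1.0, 5: 2.0}
```

When every class index from 0 to n-1 is present, both approaches produce the same dictionary.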
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Pasha Dembo |
Solution 3 | David Brown |
Solution 4 | Aman Agrawal |
Solution 5 | DCCoder |
Solution 6 | Soheil |