TensorFlow Object Detection API - How to train on the COCO dataset and achieve the same mAP as the reported one?
I'm trying to reproduce the officially reported mAP of EfficientDet D3 in the Object Detection API by training on COCO using a pretrained EfficientNet backbone. The official COCO mAP is 45.4% and yet all I can manage to achieve is around 14%. I don't need to reach the same value, but I wish to at least come close to it.
I am loading the EfficientNet B3 checkpoint pretrained on ImageNet (found here) and using the config file (found here). The only parameters I changed are the batch size (to fit into an RTX 3090's memory), the learning rate (0.08 was yielding loss = NaN, so I reduced it to 0.01), and the number of steps, which I increased to 600k. This is my pipeline.config file:
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 90
    add_background_class: false
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 3
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 896
        max_dimension: 896
        pad_to_max_dimension: true
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 160
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          force_use_bias: true
          activation: SWISH
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true
            decay: 0.99
            epsilon: 0.001
          }
        }
        num_layers_before_predictor: 4
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet-b3_bifpn_keras'
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 6
        num_filters: 160
      }
      conv_hyperparams {
        force_use_bias: true
        activation: SWISH
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true
          decay: 0.99
          epsilon: 0.001
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 1.5
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  fine_tune_checkpoint: "/API/Tensorflow/models/research/object_detection/test_data/efficientnet_b3/efficientnet_b3/ckpt-0"
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint_type: "classification"
  batch_size: 2
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  use_bfloat16: false
  num_steps: 600000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_scale_crop_and_pad_to_square {
      output_size: 896
      scale_min: 0.1
      scale_max: 2.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 1e-2
          total_steps: 600000
          warmup_learning_rate: .001
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}
train_input_reader: {
  label_map_path: "/DATASETS/COCO/classes.pbtxt"
  tf_record_input_reader {
    input_path: "/DATASETS/COCO/coco_train.record-00000-of-00100"
  }
}
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}
eval_input_reader: {
  label_map_path: "/DATASETS/COCO/classes.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/DATASETS/COCO/coco_val.record-00000-of-00050"
  }
}
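For context, a config like this is consumed by the API's standard model_main_tf2.py entry point; a minimal programmatic equivalent (a sketch, assuming the current TF2 layout of the Object Detection API; the model_dir path is illustrative) looks like this:

```python
# Sketch: programmatic equivalent of running model_main_tf2.py
# (assumes the TF OD API is installed; model_dir is illustrative).
from object_detection import model_lib_v2

model_lib_v2.train_loop(
    pipeline_config_path="pipeline.config",
    model_dir="/path/to/model_dir",  # checkpoints and summaries land here
)
```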
These are the results (training and evaluation plots omitted; the total loss plateaued around 1):
Solution 1:[1]
Your loss is too high. A loss hovering around 1 indicates that the model is not actually training; it is not learning useful weights. There are a couple of things you can check:
- The dataset. Are all images actually used during training? (A shard-count check is sketched after this list.) Also have a look at the annotations: are the classes and bounding boxes correct, or is there anything weird? For example, COCO's annotations give bounding boxes as absolute pixel values; if your pipeline expects relative (normalized) values, they need to be rescaled.
- Is the image resized? If so, the bounding boxes also need to be resized accordingly.
- Check the bounding boxes. Plot a few images with their bounding boxes drawn on top (a plotting sketch also follows this list); if the boxes are in the wrong format or their values are scaled incorrectly, you will see it immediately.
- To narrow down the source of the bug, try loading weights that have already been trained on COCO (i.e., a full EfficientDet detection checkpoint, restored with fine_tune_checkpoint_type: "detection") and see what happens if you fine-tune them further with a very low learning rate. If that does not work either, it is a very strong indication that there are problems with the annotations.
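For the first point, one thing worth verifying is how many TFRecord shards the configured input_path pattern actually matches. A minimal sketch (my addition, assuming the Object Detection API is installed and the config above is saved as pipeline.config):

```python
# Sanity check (sketch): parse the pipeline config and count how many
# TFRecord shards the train input_path actually matches. A pattern that
# matches a single shard of a 100-shard dataset feeds only ~1% of COCO.
import tensorflow as tf
from object_detection.utils import config_util

configs = config_util.get_configs_from_pipeline_file("pipeline.config")
for pattern in configs["train_input_config"].tf_record_input_reader.input_path:
    matched = tf.io.gfile.glob(pattern)
    print(f"{pattern} -> {len(matched)} shard(s)")
```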
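And for the plotting check, a minimal sketch (my addition; it assumes the records use the standard Object Detection API TFRecord feature keys, and reuses the shard path from the config above):

```python
# Sketch: decode a few examples from one TFRecord shard and draw their boxes.
# Box coordinates are stored normalized to [0, 1] in this layout.
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.patches as patches

RECORD = "/DATASETS/COCO/coco_train.record-00000-of-00100"
FEATURES = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),
    "image/object/bbox/xmin": tf.io.VarLenFeature(tf.float32),
    "image/object/bbox/xmax": tf.io.VarLenFeature(tf.float32),
    "image/object/bbox/ymin": tf.io.VarLenFeature(tf.float32),
    "image/object/bbox/ymax": tf.io.VarLenFeature(tf.float32),
}

for raw in tf.data.TFRecordDataset(RECORD).take(3):
    ex = tf.io.parse_single_example(raw, FEATURES)
    img = tf.io.decode_jpeg(ex["image/encoded"], channels=3).numpy()
    h, w = img.shape[:2]
    dense = lambda k: tf.sparse.to_dense(ex[k]).numpy()
    _, ax = plt.subplots()
    ax.imshow(img)
    for x0, x1, y0, y1 in zip(dense("image/object/bbox/xmin"),
                              dense("image/object/bbox/xmax"),
                              dense("image/object/bbox/ymin"),
                              dense("image/object/bbox/ymax")):
        # Scale normalized coordinates back to pixels before drawing.
        ax.add_patch(patches.Rectangle((x0 * w, y0 * h),
                                       (x1 - x0) * w, (y1 - y0) * h,
                                       fill=False, edgecolor="red"))
    plt.show()
```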
Solution 2:[2]
Two suggestions:

Batch size is an essential hyperparameter in deep learning. Different batch sizes can lead to noticeably different training and testing accuracies, so choosing a suitable batch size is crucial when training a neural network. [Source]

Using a batch size of 1 (or 2) for a model with this many parameters may be the reason for the lower accuracy, and a higher number of epochs does not compensate for a small batch size; one common way to adjust the learning rate when the batch has to shrink is sketched below.

Another point I noticed is that the paper makes use of scale jittering for augmentation.
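A common heuristic for the batch-size/learning-rate interaction (my addition, not part of the original answer) is the linear scaling rule: scale the learning rate in proportion to the batch size. Assuming the official 0.08 base rate was tuned for a reference batch of 128, a batch of 2 calls for a much smaller rate than the 0.01 used here:

```python
# Linear scaling rule (sketch): keep lr / batch_size roughly constant.
# The reference batch size of 128 is an assumption about the official config.
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Return base_lr rescaled for a different batch size."""
    return base_lr * new_batch / base_batch

print(scaled_lr(base_lr=0.08, base_batch=128, new_batch=2))  # 0.00125
```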
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0. Source: Stack Overflow.

| Solution | Source |
|---|---|
| Solution 1 | former_Epsilon |
| Solution 2 | |