How to decode raw_outputs/box_encodings from a TensorFlow Object Detection SSD MobileNet model without NMS
To deploy my own SSD MobileNet model on Android with NNAPI acceleration, I retrained the model without NMS post-processing, following the TensorFlow Object Detection API.
Without NMS, the output raw_outputs/box_encodings contains encoded box locations. I decode them as follows, but it does not work:
for(int j = 0; j < 5; j++)
{
    float sk = (float)(0.2 + (0.949 - 0.200) * j * 1.0 / 5 * 1.0);
    float width_a = (float)(sk * Math.sqrt(aspectra[j]));
    float height_a = (float)(sk * 1.0 / Math.sqrt(aspectra[j]));
    for(int k = 0; k < featuresize[j]; k++)
    {
        float center_x_a = (float)((k + 0.5) * 1.0 / featuresize[j]);
        float center_y_a = (float)((k + 0.5) * 1.0 / featuresize[j]);
        float ty = (float)(outputBox[0][i][0] / 10.);
        float tx = (float)(outputBox[0][i][1] / 10.);
        float th = (float)(outputBox[0][i][2] / 5.);
        float tw = (float)(outputBox[0][i][3] / 5.);
        float w = (float)(Math.exp(tw) * width_a);
        float h = (float)(Math.exp(th) * height_a);
        float y_center = ty * height_a + center_y_a;
        float x_center = tx * width_a + center_x_a;
        float ymin = (float)((y_center - h) / 2.);
        float xmin = (float)((x_center - w) / 2.);
        float ymax = (float)((y_center + h) / 2.);
        float xmax = (float)((x_center + w) / 2.);
    }
}
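The loop above is effectively trying to rebuild the SSD anchors from sk, aspectra and featuresize. For comparison, here is a minimal sketch of how the TF Object Detection API's multiple_grid_anchor_generator lays anchors out, assuming the default SSD MobileNet v1 settings: min_scale 0.2, max_scale 0.95, six feature maps of size 19, 10, 5, 3, 2, 1 for a 300x300 input, aspect ratios 1, 2, 0.5, 3, 0.3333, reduce_boxes_in_lowest_layer enabled, and one extra interpolated square box per cell. These values are assumptions; copy yours from pipeline.config. The sketch makes two things explicit: every grid cell has its own (row, column) center, so x and y need separate indices, and the anchor index advances once per (layer, row, column, box) in that order. The class name SsdAnchors is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of SSD anchor generation in the style of the TF Object Detection API's
 * multiple_grid_anchor_generator. All config values are caller-supplied assumptions.
 */
final class SsdAnchors {
    private SsdAnchors() {}

    /** Returns anchors as {ycenter, xcenter, height, width}, normalized to [0, 1]. */
    static float[][] generate(int[] featureMapSizes, float minScale, float maxScale,
                              float[] aspectRatios, boolean reduceBoxesInLowestLayer) {
        int numLayers = featureMapSizes.length;
        List<float[]> anchors = new ArrayList<>();
        for (int layer = 0; layer < numLayers; layer++) {
            // Linearly interpolated scale for this layer; the scale after the last layer is 1.0.
            float scale = minScale + (maxScale - minScale) * layer / (numLayers - 1);
            float nextScale = (layer == numLayers - 1) ? 1.0f
                    : minScale + (maxScale - minScale) * (layer + 1) / (numLayers - 1);

            // (scale, aspect ratio) pairs for the boxes placed at every cell of this layer.
            List<float[]> boxSpecs = new ArrayList<>();
            if (layer == 0 && reduceBoxesInLowestLayer) {
                boxSpecs.add(new float[] {0.1f, 1.0f});
                boxSpecs.add(new float[] {scale, 2.0f});
                boxSpecs.add(new float[] {scale, 0.5f});
            } else {
                for (float ar : aspectRatios) {
                    boxSpecs.add(new float[] {scale, ar});
                }
                // Extra square box using the geometric mean of this scale and the next one.
                boxSpecs.add(new float[] {(float) Math.sqrt(scale * nextScale), 1.0f});
            }

            int size = featureMapSizes[layer];
            // Row-major over the grid (y outer, x inner); all boxes of one cell are contiguous.
            for (int row = 0; row < size; row++) {
                for (int col = 0; col < size; col++) {
                    float yCenter = (row + 0.5f) / size;
                    float xCenter = (col + 0.5f) / size;
                    for (float[] spec : boxSpecs) {
                        float sqrtAr = (float) Math.sqrt(spec[1]);
                        anchors.add(new float[] {yCenter, xCenter, spec[0] / sqrtAr, spec[0] * sqrtAr});
                    }
                }
            }
        }
        return anchors.toArray(new float[0][]);
    }
}
```

With the assumed config, generate(new int[] {19, 10, 5, 3, 2, 1}, 0.2f, 0.95f, new float[] {1f, 2f, 0.5f, 3f, 0.3333f}, true) yields 1917 anchors (19*19*3 + (100 + 25 + 9 + 4 + 1)*6), the usual box count for SSD MobileNet v1 at 300x300. If your exported graph already carries the anchors as a constant, those are the ground truth and should be preferred over a reconstruction.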
Solution 1:[1]
To decode raw_outputs/box_encodings you also need the anchors, because the box encodings are expressed relative to the anchors.
Following is my implementation of decoding raw_outputs/box_encodings:
// anchor[i] holds {ycenter, xcenter, height, width}; y_scale, x_scale, h_scale, w_scale are fields.
private float[][][] decodeBoxEncodings(final float[][][] boxEncoding, final float[][] anchor, final int numBoxes) {
    final float[][][] decodedBoxes = new float[1][numBoxes][4];
    for (int i = 0; i < numBoxes; ++i) {
        final double ycenter = boxEncoding[0][i][0] / y_scale * anchor[i][2] + anchor[i][0];
        final double xcenter = boxEncoding[0][i][1] / x_scale * anchor[i][3] + anchor[i][1];
        final double half_h = 0.5 * Math.exp((boxEncoding[0][i][2] / h_scale)) * anchor[i][2];
        final double half_w = 0.5 * Math.exp((boxEncoding[0][i][3] / w_scale)) * anchor[i][3];
        decodedBoxes[0][i][0] = (float)(ycenter - half_h); // ymin
        decodedBoxes[0][i][1] = (float)(xcenter - half_w); // xmin
        decodedBoxes[0][i][2] = (float)(ycenter + half_h); // ymax
        decodedBoxes[0][i][3] = (float)(xcenter + half_w); // xmax
    }
    return decodedBoxes;
}
This decoding technique is taken from the TFLite detection_postprocess operation.
Edit: the scale values are:
float y_scale = 10.0f;
float x_scale = 10.0f;
float h_scale = 5.0f;
float w_scale = 5.0f;
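These scale values undo the multipliers the encoder applied. Assuming the model's pipeline.config uses faster_rcnn_box_coder with y_scale 10, x_scale 10, height_scale 5 and width_scale 5 (the values in the sample SSD MobileNet configs), the minimal round-trip sketch below encodes one box against an anchor and decodes it back, showing that the two formulas cancel. If your config uses different scales, change the constants. CenterSizeBoxCoder and its methods are hypothetical names; decode mirrors one iteration of decodeBoxEncodings above.

```java
/**
 * Hypothetical sketch of the center-size box coder assumed to produce raw_outputs/box_encodings.
 * Anchors are {ycenter, xcenter, height, width}; boxes are {ymin, xmin, ymax, xmax}.
 */
final class CenterSizeBoxCoder {
    // Assumed faster_rcnn_box_coder scale factors; must match your pipeline.config.
    static final float Y_SCALE = 10.0f, X_SCALE = 10.0f, H_SCALE = 5.0f, W_SCALE = 5.0f;

    private CenterSizeBoxCoder() {}

    /** Encodes a corner box relative to an anchor (TF also adds a tiny epsilon here; omitted). */
    static float[] encode(float[] box, float[] anchor) {
        float h = box[2] - box[0];
        float w = box[3] - box[1];
        float yc = box[0] + 0.5f * h;
        float xc = box[1] + 0.5f * w;
        return new float[] {
            Y_SCALE * (yc - anchor[0]) / anchor[2],
            X_SCALE * (xc - anchor[1]) / anchor[3],
            H_SCALE * (float) Math.log(h / anchor[2]),
            W_SCALE * (float) Math.log(w / anchor[3]),
        };
    }

    /** Decodes one encoding back to {ymin, xmin, ymax, xmax}. */
    static float[] decode(float[] enc, float[] anchor) {
        double yc = enc[0] / Y_SCALE * anchor[2] + anchor[0];
        double xc = enc[1] / X_SCALE * anchor[3] + anchor[1];
        // Half extents: the box spans center +/- half the decoded size.
        double halfH = 0.5 * Math.exp(enc[2] / H_SCALE) * anchor[2];
        double halfW = 0.5 * Math.exp(enc[3] / W_SCALE) * anchor[3];
        return new float[] {
            (float) (yc - halfH), (float) (xc - halfW), (float) (yc + halfH), (float) (xc + halfW)
        };
    }
}
```

A zero encoding decodes to the anchor itself: with anchor {0.5f, 0.5f, 0.2f, 0.4f} it gives approximately {0.4, 0.3, 0.6, 0.7}, which makes a quick sanity check on device.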
Solution 2:[2]
https://actcast.hatenablog.com/entry/2021/08/06/085134
This worked for me.
My .pb model is SSD MobileNet V1 0.75 Depth Quantized (tflite_graph.pb), TF version 1.15, with outputs raw_outputs/box_encodings and raw_outputs/class_predictions.
- Start from step 3 (Create Anchor) of the blog post above (my model was already trained, so I skipped the training part).
- Step 4 is not required if your model is ready: the blog loads nnoir_model, but you can load your own model instead.
Corresponding Google Colab: https://colab.research.google.com/github/Idein/tensorflow-object-detection-api-to-nnoir/blob/master/notebook/ssd_mobilenet_v1_coco_2018_01_28_to_nnoir.ipynb#scrollTo=51d0jACEslWu
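Because the exported model skips NMS, the decoded boxes still need non-max suppression on the device. Below is a minimal greedy NMS sketch for a single class. GreedyNms and its parameters are hypothetical, the thresholds are placeholders, and it assumes the per-anchor scores for that class are already probabilities; check whether your graph applies the score conversion before raw_outputs/class_predictions, and whether the class predictions include a background column. This is plain greedy NMS, not necessarily the exact algorithm of the TFLite operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical greedy non-max suppression over decoded {ymin, xmin, ymax, xmax} boxes. */
final class GreedyNms {
    private GreedyNms() {}

    /** Intersection over union of two corner boxes; 0 for empty or degenerate unions. */
    static float iou(float[] a, float[] b) {
        float ih = Math.max(0f, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
        float iw = Math.max(0f, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
        float inter = ih * iw;
        float union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter;
        return union <= 0f ? 0f : inter / union;
    }

    /** Returns indices of kept boxes, highest score first. */
    static List<Integer> run(float[][] boxes, float[] scores, float scoreThreshold,
                             float iouThreshold, int maxDetections) {
        Integer[] order = new Integer[boxes.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Visit candidates in descending score order.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        List<Integer> kept = new ArrayList<>();
        for (int idx : order) {
            // Sorted descending, so once a score is below threshold all later ones are too.
            if (scores[idx] < scoreThreshold || kept.size() >= maxDetections) break;
            boolean suppressed = false;
            for (int k : kept) {
                if (iou(boxes[idx], boxes[k]) > iouThreshold) {
                    suppressed = true;
                    break;
                }
            }
            if (!suppressed) kept.add(idx);
        }
        return kept;
    }
}
```

For several classes, one option is to run it once per class column of the class predictions, then merge the results and cap the total number of detections.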
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Elizebeth Kurian |