How can I convert [xmin ymin xmax ymax] to normalized [x y width height] for an image?

I am building a custom vision application with Microsoft's CustomVision.ai.

I am using this tutorial.

When you tag images in object detection projects, you need to specify the region of each tagged object using normalized coordinates.

I have an XML file containing the annotations for an image, e.g. one named sample_1.jpg:

<annotation>
    <filename>sample_1.jpg</filename>
    <size>
        <width>410</width>
        <height>400</height>
        <depth>3</depth>
    </size>
    <object>
        <bndbox>
            <xmin>159</xmin>
            <ymin>15</ymin>
            <xmax>396</xmax>
            <ymax>302</ymax>
        </bndbox>
    </object>
</annotation>
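This is the Pascal VOC annotation layout, so the values can be read with Python's standard `xml.etree.ElementTree`. A minimal sketch, assuming one `<bndbox>` per `<object>`; the function name `read_voc_boxes` is hypothetical. For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return (img_w, img_h, boxes) from a Pascal VOC-style annotation string.

    Each box is an (xmin, ymin, xmax, ymax) tuple in pixels.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = int(size.find("width").text)
    img_h = int(size.find("height").text)
    boxes = []
    # An image may contain several <object> elements, each with one <bndbox>.
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        # int(float(...)) also accepts values written as "159.0";
        # any fractional part is truncated.
        boxes.append(tuple(int(float(bb.find(tag).text))
                           for tag in ("xmin", "ymin", "xmax", "ymax")))
    return img_w, img_h, boxes


sample = """<annotation>
    <filename>sample_1.jpg</filename>
    <size><width>410</width><height>400</height><depth>3</depth></size>
    <object><bndbox>
        <xmin>159</xmin><ymin>15</ymin><xmax>396</xmax><ymax>302</ymax>
    </bndbox></object>
</annotation>"""

print(read_voc_boxes(sample))  # (410, 400, [(159, 15, 396, 302)])
```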

I have to convert the bounding box coordinates from xmin, ymin, xmax, ymax to normalized x, y, w, h coordinates, as the tutorial requires.

Can anyone provide a conversion function?



Solution 1:[1]

Assuming (xmin, ymin) and (xmax, ymax) are the top-left and bottom-right corners of your bounding box, respectively:

x = xmin
y = ymin
w = xmax - xmin
h = ymax - ymin
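Wrapped as a small function and applied to the box from the question, this gives the size in pixels before any normalization (the name `corners_to_xywh` is just illustrative):

```python
def corners_to_xywh(xmin, ymin, xmax, ymax):
    """Corner form (xmin, ymin, xmax, ymax) -> (x, y, w, h) in pixels.

    (x, y) is the top-left corner, so it is simply (xmin, ymin).
    """
    return xmin, ymin, xmax - xmin, ymax - ymin


print(corners_to_xywh(159, 15, 396, 302))  # (159, 15, 237, 287)
```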

You then need to normalize these, which means expressing them as a proportion of the whole image. To do that, simply divide each value by the matching image dimension (width or height) from the XML above:

x = xmin / width
y = ymin / height
w = (xmax - xmin) / width
h = (ymax - ymin) / height
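Putting both steps together for the sample (a 410 x 400 image) gives the normalized values. A minimal sketch, assuming pixel coordinates with a top-left origin and Python 3 true division; the name `to_normalized_xywh` and the bounds check are illustrative additions, not part of the answer above:

```python
def to_normalized_xywh(xmin, ymin, xmax, ymax, img_w, img_h):
    """Return (x, y, w, h) as fractions of the image size, top-left origin."""
    # Reject boxes that are empty/inverted or extend past the image edges.
    if not (0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h):
        raise ValueError("box is empty or lies outside the image")
    x = xmin / img_w
    y = ymin / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x, y, w, h


box = to_normalized_xywh(159, 15, 396, 302, 410, 400)
print(*(round(v, 4) for v in box))  # 0.3878 0.0375 0.578 0.7175
```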

This assumes a top-left origin; if your coordinates use a different origin, you will have to shift them accordingly.
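For example, suppose a tool measured y upward from the bottom-left corner (an assumption for illustration; the VOC-style XML in the question uses a top-left origin). Then the vertical axis has to be flipped, not just shifted: the box's top edge, measured from the top, is `img_h - ymax`. A sketch with the hypothetical name `bottom_left_to_normalized`, fed the question's box re-expressed in bottom-left coordinates (ymin = 400 - 302 = 98, ymax = 400 - 15 = 385):

```python
def bottom_left_to_normalized(xmin, ymin, xmax, ymax, img_w, img_h):
    """Normalize a box whose y axis points UP from the bottom-left corner.

    Returns (x, y, w, h) relative to a top-left origin, as fractions of the image.
    """
    top = img_h - ymax  # upper edge, now measured down from the top of the image
    x = xmin / img_w
    y = top / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h  # the height is unchanged by flipping the axis
    return x, y, w, h


box = bottom_left_to_normalized(159, 98, 396, 385, 410, 400)
print(*(round(v, 4) for v in box))  # 0.3878 0.0375 0.578 0.7175
```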

Solution 2:[2]

Here's a function that converts the values and normalizes them by the image size. Note that it returns the center of the box as x, y (the YOLO convention), not the top-left corner:

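def convert(xmin, ymin, xmax, ymax, img_w, img_h):
    # Scale factors that turn pixel values into fractions of the image.
    dw = 1. / img_w
    dh = 1. / img_h
    # Box center; the "- 1" shifts 1-based pixel indices (as used by
    # Pascal VOC) to 0-based. Drop it if your coordinates are 0-based.
    x = (xmin + xmax) / 2.0 - 1
    y = (ymin + ymax) / 2.0 - 1
    w = xmax - xmin
    h = ymax - ymin
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)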

And for your example above:

my_xmin = 159
my_ymin = 15
my_xmax = 396
my_ymax = 302
my_img_w = 410
my_img_h = 400
convert(my_xmin, my_ymin, my_xmax, my_ymax, my_img_w, my_img_h)  # about (0.67439, 0.39375, 0.57805, 0.7175)
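Because `convert` yields the box center, while Solution 1 describes the region by its top-left corner, a center-based result can be turned back into corner form by subtracting half the width and height. A minimal sketch (the name `center_to_top_left` is hypothetical); the center values here are computed directly from the corners, without any index offset:

```python
def center_to_top_left(xc, yc, w, h):
    """Convert a center-based box (xc, yc, w, h) to (left, top, w, h).

    Works the same for pixel values and for normalized fractions.
    """
    return xc - w / 2, yc - h / 2, w, h


# Center-form values for the sample box, normalized by the 410 x 400 image.
xc = (159 + 396) / 2 / 410
yc = (15 + 302) / 2 / 400
w = (396 - 159) / 410
h = (302 - 15) / 400

left, top, w, h = center_to_top_left(xc, yc, w, h)
print(round(left, 4), round(top, 4))  # 0.3878 0.0375
```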

Solution 3:[3]

There is a more straightforward way to do this with pybboxes. Install it with:

pip install pybboxes

In your case:

import pybboxes as pbx

voc_bbox = (159, 15, 396, 302)
W, H = 410, 400  # WxH of the image
pbx.convert_bbox(voc_bbox, from_type="voc", to_type="coco")
# returns (159, 15, 237, 287)

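Note that the COCO result is still in pixels (x_min, y_min, width, height); divide x and width by W, and y and height by H, to get normalized values. Converting to YOLO format, by contrast, requires the image width and height for scaling.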

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 N. Smith
Solution 2 mark_1985
Solution 3 null