How can I convert from [xmin ymin xmax ymax] to normalized [x y width height] for an image?
I am building a custom vision application with Microsoft's CustomVision.ai.
I am using this tutorial.
When you tag images in object detection projects, you need to specify the region of each tagged object using normalized coordinates.
I have an XML file containing the annotations for an image, e.g. for sample_1.jpg:
<annotation>
<filename>sample_1.jpg</filename>
<size>
<width>410</width>
<height>400</height>
<depth>3</depth>
</size>
<object>
<bndbox>
<xmin>159</xmin>
<ymin>15</ymin>
<xmax>396</xmax>
<ymax>302</ymax>
</bndbox>
</object>
</annotation>
I need to convert the bounding box coordinates from xmin, ymin, xmax, ymax to the normalized x, y, w, h coordinates that the tutorial requires.
Can anyone provide me with a conversion function?
Solution 1:[1]
Assuming (xmin, ymin) and (xmax, ymax) are the corners of your bounding box, top left and bottom right respectively, then:
x = xmin
y = ymin
w = xmax - xmin
h = ymax - ymin
You then need to normalize these, which means expressing them as a proportion of the whole image, so simply divide each value by the corresponding image dimension from the annotation above (x and w by width, y and h by height):
x = xmin / width
y = ymin / height
w = (xmax - xmin) / width
h = (ymax - ymin) / height
This assumes a top-left origin; you will have to apply a shift if this is not the case.
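The formulas above can be sketched as a small Python function. The name `voc_to_normalized_xywh` and the argument order are illustrative assumptions, not part of any library, and the sketch assumes a top-left origin with pixel coordinates as in the annotation from the question.

```python
def voc_to_normalized_xywh(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corners to normalized (left, top, width, height)."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image width and height must be positive")
    x = xmin / img_w            # left edge as a fraction of image width
    y = ymin / img_h            # top edge as a fraction of image height
    w = (xmax - xmin) / img_w   # box width as a fraction of image width
    h = (ymax - ymin) / img_h   # box height as a fraction of image height
    return x, y, w, h


# Values taken from the sample_1.jpg annotation in the question
box = voc_to_normalized_xywh(159, 15, 396, 302, 410, 400)
print(box)  # approximately (0.3878, 0.0375, 0.5780, 0.7175)
```

All four values should fall between 0 and 1 as long as the box lies inside the image.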
Solution 2:[2]
Here's a function that converts the values and normalizes them for the image size:
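def convert(xmin, ymin, xmax, ymax, img_w, img_h):
    # Scale factors that map pixel values to fractions of the image size
    dw = 1./(img_w)
    dh = 1./(img_h)
    # Note: x, y here are the box center (as in the YOLO format), not the
    # top-left corner; use xmin and ymin instead if you need the top-left.
    # The "- 1" presumably compensates for 1-based pixel coordinates.
    x = (xmin + xmax)/2.0 - 1
    y = (ymin + ymax)/2.0 - 1
    # Box width and height in pixels
    w = xmax - xmin
    h = ymax - ymin
    # Normalize everything by the image dimensions
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)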
And for your example above:
my_xmin = 159
my_ymin = 15
my_xmax = 396
my_ymax = 302
my_img_w = 410
my_img_h = 400
convert(my_xmin, my_ymin, my_xmax, my_ymax, my_img_w, my_img_h)
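The values typed by hand above could also be read straight from the annotation file. The following is a minimal sketch using the standard library's `xml.etree.ElementTree`; the helper name `read_normalized_boxes` is hypothetical, the XML is inlined as a string for the sake of a self-contained example, and the sketch uses the top-left convention (left, top, width, height) rather than the box center.

```python
import xml.etree.ElementTree as ET

# The annotation from the question, inlined so the example is self-contained
ANNOTATION = """<annotation>
<filename>sample_1.jpg</filename>
<size><width>410</width><height>400</height><depth>3</depth></size>
<object>
<bndbox><xmin>159</xmin><ymin>15</ymin><xmax>396</xmax><ymax>302</ymax></bndbox>
</object>
</annotation>"""


def read_normalized_boxes(xml_text):
    """Return (filename, [(x, y, w, h), ...]) with values normalized to [0, 1]."""
    root = ET.fromstring(xml_text)
    img_w = int(root.findtext("size/width"))
    img_h = int(root.findtext("size/height"))
    boxes = []
    # An annotation may contain several <object> elements
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((xmin / img_w, ymin / img_h,
                      (xmax - xmin) / img_w, (ymax - ymin) / img_h))
    return root.findtext("filename"), boxes


filename, boxes = read_normalized_boxes(ANNOTATION)
print(filename, boxes)
```

To read from disk instead, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.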
Solution 3:[3]
There is a more straightforward way to do this with pybboxes. Install it with:
pip install pybboxes
In your case,
import pybboxes as pbx
voc_bbox = (159, 15, 396, 302)
W, H = 410, 400 # WxH of the image
pbx.convert_bbox(voc_bbox, from_type="voc", to_type="coco")
>>> (159, 15, 237, 287)
Note that the COCO result above is still in pixels, so it has to be divided by W and H to be normalized. Converting to YOLO format also requires the image width and height for scaling.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | N. Smith |
Solution 2 | mark_1985 |
Solution 3 | null |