How to convert bounding box (x1, y1, x2, y2) to YOLO Style (X, Y, W, H)

I'm training a YOLO model and have the bounding boxes in this format:

x1, y1, x2, y2 => ex (100, 100, 200, 200)

I need to convert them to YOLO format, something like:

X, Y, W, H => 0.436262 0.474010 0.383663 0.178218

I have already calculated the center point X, Y, the height H, and the width W, but I still need a way to convert them to floating-point numbers as shown above.



Solution 1:[1]

YOLO normalises the image space to run from 0 to 1 in both the x and y directions. To convert between your (x, y) coordinates and YOLO (u, v) coordinates, transform your data as u = x / XMAX and v = y / YMAX, where XMAX and YMAX are the maximum coordinates of the image array you are using.

This all depends on the image arrays being oriented the same way.

Here is a C function to perform the conversion:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <math.h>

struct yolo {
    float   u;
    float   v;
    };

struct yolo
convert (unsigned int x, unsigned int y, unsigned int XMAX, unsigned int YMAX)
{
    struct yolo point;

    if (XMAX && YMAX && (x <= XMAX) && (y <= YMAX))
    {
        point.u = (float)x / (float)XMAX;
        point.v = (float)y / (float)YMAX;
    }
    else
    {
        point.u = INFINITY;
        point.v = INFINITY;
        errno = ERANGE;
    }

    return point;
}/* convert */


int main()
{
    struct yolo P;

    P = convert (99, 201, 255, 324);

    printf ("Yolo coordinate = <%f, %f>\n", P.u, P.v);

    exit (EXIT_SUCCESS);
}/* main */

Solution 2:[2]

Here's a code snippet in Python to convert x, y coordinates to YOLO format:

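from PIL import Image

def convert(size, box):
    # size = (image_width, image_height); box = (xmin, xmax, ymin, ymax)
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0  # box center x, in pixels
    y = (box[2] + box[3])/2.0  # box center y, in pixels
    w = box[1] - box[0]        # box width, in pixels
    h = box[3] - box[2]        # box height, in pixels
    x = x*dw                   # normalise everything to the 0..1 range
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

im=Image.open(img_path)  # img_path: path to your image file
w= int(im.size[0])
h= int(im.size[1])


print(xmin, xmax, ymin, ymax) #define your x,y coordinates
b = (xmin, xmax, ymin, ymax)
bb = convert((w,h), b)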

Check my sample program to convert from LabelMe annotation tool format to Yolo format https://github.com/ivder/LabelMeYoloConverter

Solution 3:[3]

For those looking for the reverse of the question (YOLO format to normal bbox format):

def yolobbox2bbox(x,y,w,h):
    x1, y1 = x-w/2, y-h/2
    x2, y2 = x+w/2, y+h/2
    return x1, y1, x2, y2
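
The function above returns corners in the same units as its input, so normalised YOLO values give normalised corners. A minimal sketch of scaling them to pixels, assuming `img_w` and `img_h` (names introduced here for illustration) hold the image width and height:

```python
def yolobbox2bbox(x, y, w, h):
    """Center/size box to corner box, in the same units as the input."""
    x1, y1 = x - w / 2, y - h / 2
    x2, y2 = x + w / 2, y + h / 2
    return x1, y1, x2, y2


def to_pixels(norm_box, img_w, img_h):
    """Scale normalised corners to pixel coordinates.

    to_pixels, img_w and img_h are illustrative names, not part of any library.
    """
    x1, y1, x2, y2 = norm_box
    return x1 * img_w, y1 * img_h, x2 * img_w, y2 * img_h


corners = yolobbox2bbox(0.15, 0.15, 0.1, 0.1)
pixel_box = to_pixels(corners, 1000, 1000)
print(pixel_box)  # approximately (100, 100, 200, 200), up to float rounding
```

The result is left as floats; round or truncate as your downstream code requires.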

Solution 4:[4]

There is a more straightforward way to do this with pybboxes. Install it with:

pip install pybboxes

Use it as follows:

import pybboxes as pbx

voc_bbox = (100, 100, 200, 200)
W, H = 1000, 1000  # WxH of the image
pbx.convert_bbox(voc_bbox, from_type="voc", to_type="yolo", image_width=W, image_height=H)
>>> (0.15, 0.15, 0.1, 0.1)

Note that converting to YOLO format requires the image width and height for scaling.

Solution 5:[5]

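To go from YOLO format to (x1, y1, x2, y2) format, where dw and dh are the image width and height in pixels:

def yolobbox2bbox(x, y, w, h, dw, dh):
    # Scale the normalised center/size box to pixel corner coordinates.
    x1 = int((x - w / 2) * dw)
    x2 = int((x + w / 2) * dw)
    y1 = int((y - h / 2) * dh)
    y2 = int((y + h / 2) * dh)

    # Clamp the corners to the image bounds.
    if x1 < 0:
        x1 = 0
    if x2 > dw - 1:
        x2 = dw - 1
    if y1 < 0:
        y1 = 0
    if y2 > dh - 1:
        y2 = dh - 1

    return x1, y1, x2, y2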
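
A round trip is a cheap way to check a pair of conversions like these. The sketch below is self-contained and uses its own helper names (`to_yolo` and `from_yolo` are illustrative, not taken from any library). It rounds instead of truncating when going back to pixels, which is an assumption made here to avoid off-by-one drift from floating-point error.

```python
def to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Pixel corners to normalised (x_center, y_center, width, height)."""
    return (
        (x1 + x2) / 2 / img_w,
        (y1 + y2) / 2 / img_h,
        (x2 - x1) / img_w,
        (y2 - y1) / img_h,
    )


def from_yolo(x, y, w, h, img_w, img_h):
    """Normalised center/size box back to integer pixel corners, clamped to the image."""
    x1 = max(0, round((x - w / 2) * img_w))
    y1 = max(0, round((y - h / 2) * img_h))
    x2 = min(img_w - 1, round((x + w / 2) * img_w))
    y2 = min(img_h - 1, round((y + h / 2) * img_h))
    return x1, y1, x2, y2


box = (99, 201, 255, 324)   # example pixel box
img_w, img_h = 640, 480     # assumed image size for this example
restored = from_yolo(*to_yolo(*box, img_w, img_h), img_w, img_h)
print(restored == box)
```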

Solution 6:[6]

There are two potential solutions. First, you have to work out whether your original bounding box is in COCO or Pascal VOC format; otherwise you can't do the right math.

The formats are:

Coco Format: [x_min, y_min, width, height]
Pascal_VOC Format: [x_min, y_min, x_max, y_max]

Here is some Python code showing how to do the conversion:

Converting Coco to Yolo

# Convert Coco bb to Yolo
def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    return [((2*x1 + w)/(2*image_w)) , ((2*y1 + h)/(2*image_h)), w/image_w, h/image_h]

Converting Pascal_voc to Yolo

# Convert Pascal_Voc bb to Yolo
def pascal_voc_to_yolo(x1, y1, x2, y2, image_w, image_h):
    return [((x2 + x1)/(2*image_w)), ((y2 + y1)/(2*image_h)), (x2 - x1)/image_w, (y2 - y1)/image_h]
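
As a quick check of the two formulas, the sketch below restates them with true division (`/`) in every term and applies them to the question's example box, assuming a hypothetical 1000x1000 image. Both calls describe the same box, so they should agree.

```python
def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    """COCO [x_min, y_min, width, height] to normalised YOLO [x_c, y_c, w, h]."""
    return [(2 * x1 + w) / (2 * image_w), (2 * y1 + h) / (2 * image_h),
            w / image_w, h / image_h]


def pascal_voc_to_yolo(x1, y1, x2, y2, image_w, image_h):
    """Pascal VOC [x_min, y_min, x_max, y_max] to normalised YOLO [x_c, y_c, w, h]."""
    return [(x2 + x1) / (2 * image_w), (y2 + y1) / (2 * image_h),
            (x2 - x1) / image_w, (y2 - y1) / image_h]


# The same box expressed in both input formats; the image size is an assumption.
from_voc = pascal_voc_to_yolo(100, 100, 200, 200, 1000, 1000)
from_coco = coco_to_yolo(100, 100, 100, 100, 1000, 1000)
print(from_voc)   # [0.15, 0.15, 0.1, 0.1]
print(from_coco)  # [0.15, 0.15, 0.1, 0.1]
```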

If you need additional conversions, you can check my article on Medium: https://christianbernecker.medium.com/convert-bounding-boxes-from-coco-to-pascal-voc-to-yolo-and-back-660dc6178742

Solution 7:[7]

I was looking for the same thing and read the answers here, but I found the following explanation more informative about what happens behind the scenes. From here: Source

Assuming x/ymin and x/ymax are your bounding corners, top left and bottom right respectively. Then:

x = xmin
y = ymin
w = xmax - xmin
h = ymax - ymin

You then need to normalize these, which means expressing them as a proportion of the whole image, so simply divide each value above by the corresponding image dimension (width or height):

x = xmin / width
y = ymin / height
w = (xmax - xmin) / width
h = (ymax - ymin) / height

This assumes a top-left origin; you will have to apply a shift factor if this is not the case.
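
Note that the formulas above give the normalised top-left corner, whereas a YOLO label line stores the box center, so half the width and height must be added. A minimal sketch that does this and prints the values with six decimal places, as in the question's example; the function name, the leading class index, and the 1000x1000 image size are assumptions made for illustration:

```python
def corners_to_yolo_line(class_id, xmin, ymin, xmax, ymax, width, height):
    """Build one YOLO label line: 'class x_center y_center w h', all normalised."""
    w = (xmax - xmin) / width
    h = (ymax - ymin) / height
    x_center = xmin / width + w / 2    # shift from top-left corner to center
    y_center = ymin / height + h / 2
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}"


line = corners_to_yolo_line(0, 100, 100, 200, 200, 1000, 1000)
print(line)  # 0 0.150000 0.150000 0.100000 0.100000
```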

So the answer

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Matt Popovich
Solution 2 Matt Popovich
Solution 3 FarisHijazi
Solution 4 null
Solution 5 Matt Popovich
Solution 6 Matt Popovich
Solution 7 Engr Ali