How to get the coordinates of the bounding box in YOLO object detection?
Solution 1:[1]
A quick solution is to modify the image.c file to print out the bounding box information:
...
if(bot > im.h-1) bot = im.h-1;
// Print bounding box values
printf("Bounding Box: Left=%d, Top=%d, Right=%d, Bottom=%d\n", left, top, right, bot);
draw_box_width(im, left, top, right, bot, width, red, green, blue);
...
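Once that printf line is compiled in, each drawn box produces one line on standard output. The following is a minimal Python sketch for collecting those lines, assuming the exact format string shown above. The name parse_boxes and the sample text are illustrative and not part of darknet.

```python
import re

# Matches the line emitted by the printf added to image.c above.
BOX_RE = re.compile(
    r"Bounding Box: Left=(-?\d+), Top=(-?\d+), Right=(-?\d+), Bottom=(-?\d+)"
)

def parse_boxes(output):
    """Return a list of (left, top, right, bottom) tuples found in the text."""
    return [tuple(int(v) for v in m.groups()) for m in BOX_RE.finditer(output)]

# Illustrative text only, not real darknet output.
sample = (
    "some other line\n"
    "Bounding Box: Left=14, Top=209, Right=399, Bottom=282\n"
)
print(parse_boxes(sample))
```

The text passed to parse_boxes could come from a captured darknet run, for example via subprocess, but how darknet is launched is left out here on purpose.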
Solution 2:[2]
For Python users on Windows:
First, do some setup:
Add your darknet folder to the Python path through an environment variable:
PYTHONPATH = 'YOUR DARKNET FOLDER'
Then add PYTHONPATH to the Path value by appending:
%PYTHONPATH%
Edit the coco.data file in the cfg folder, changing the names variable to the location of your coco.names file. In my case:
names = D:/core/darknetAB/data/coco.names
With this setup, you can import darknet.py (from the AlexeyAB/darknet repository) as a Python module from any folder.
Start scripting:
from darknet import performDetect as scan  # import the 'performDetect' function from darknet.py
def detect(picpath):
    '''Use this function if you only want the box coordinates.'''
    cfg = 'D:/core/darknetAB/cfg/yolov3.cfg'  # change this to use a different config
    coco = 'D:/core/darknetAB/cfg/coco.data'  # you can change this too
    weights = 'D:/core/darknetAB/yolov3.weights'  # and this can be changed as well
    # Default call; I prefer showImage=False so no image is produced, which is faster.
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=weights, metaPath=coco, showImage=False, makeImageOnly=False, initOnly=False)
    # At this point you have the data in AlexeyAB's default format, as explained in the module.
    # Try help(scan); the result format is:
    # [(item_name, confidence_rate, (x_center_image, y_center_image, width_size_box, height_size_of_box))]
    # To convert it to the corner form generally used by PIL/OpenCV:
    newdata = []
    # One loop covers both a single detection and several detections.
    for item, confidence_rate, imagedata in test:
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size/2))
        y_start = round(y1 - (h_size/2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        newdata.append((item, confidence_rate, (x_start, y_start, x_end, y_end), w_size, h_size))
    return newdata  # empty list (falsy) when nothing was detected
How to use it:
table = 'D:/test/image/test1.jpg'
checking = detect(table)
To get the coordinates:
If there is only one result:
x1, y1, x2, y2 = checking[0][2]
If there are many results:
for x in checking:
    item = x[0]
    x1, y1, x2, y2 = x[2]
    print(item)
    print(x1, y1, x2, y2)
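The center-to-corner conversion used above can also be tried without darknet installed. The sketch below is a standalone helper under stated assumptions: the name center_to_corners and the optional img_w/img_h clamping arguments are illustrative additions, and each corner is rounded from the unrounded center, which can differ by one pixel from rounding the start corner first.

```python
def center_to_corners(cx, cy, w, h, img_w=None, img_h=None):
    """Convert (center_x, center_y, width, height) to (x1, y1, x2, y2) in pixels.

    If img_w / img_h are given, the corners are clamped to the image bounds.
    """
    x1 = round(cx - w / 2)
    y1 = round(cy - h / 2)
    x2 = round(cx + w / 2)
    y2 = round(cy + h / 2)
    if img_w is not None:
        x1 = max(0, min(x1, img_w - 1))
        x2 = max(0, min(x2, img_w - 1))
    if img_h is not None:
        y1 = max(0, min(y1, img_h - 1))
        y2 = max(0, min(y2, img_h - 1))
    return x1, y1, x2, y2

print(center_to_corners(100, 50, 40, 20))
print(center_to_corners(10, 10, 40, 40, img_w=100, img_h=100))
```

Clamping is useful when the corners are later used to crop the image, since a box near the edge can extend slightly outside it.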
Solution 3:[3]
If you are going to implement this in Python, there is a small Python wrapper that I have created here. Follow the ReadMe file to install it; installation is very easy.
After that, follow this example code to see how to detect objects.
If your detection is det:
top_left_x = det.bbox.x
top_left_y = det.bbox.y
width = det.bbox.w
height = det.bbox.h
If you need it, you can get the midpoint with:
mid_x, mid_y = det.bbox.get_point(pyyolo.BBox.Location.MID)
Hope this helps.
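This box format (top-left corner plus width and height) differs from the center-based format in the other answers. The sketch below shows the arithmetic for getting the opposite corner and the midpoint from it. The Box dataclass is a hypothetical stand-in that only mimics the x, y, w, h attributes shown above; it is not the wrapper's own class.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Hypothetical stand-in for det.bbox: top-left corner plus size."""
    x: float
    y: float
    w: float
    h: float

def corners(box):
    """Return (x1, y1, x2, y2): top-left and bottom-right corners."""
    return box.x, box.y, box.x + box.w, box.y + box.h

def midpoint(box):
    """Return the (x, y) center of the box."""
    return box.x + box.w / 2, box.y + box.h / 2

b = Box(x=10, y=20, w=100, h=50)
print(corners(b))
print(midpoint(b))
```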
Solution 4:[4]
Inspired by @Wahyu's answer above. It includes a few changes and bug fixes, and has been tested with both single-object and multiple-object detection.
# import the 'performDetect' function from darknet.py
from darknet import performDetect as scan
import math
def detect(img_path):
    '''Use this function if you only want the box coordinates.'''
    picpath = img_path
    # change this to use a different config
    cfg = '/home/saggi/Documents/saggi/prabin/darknet/cfg/yolo-obj.cfg'
    coco = '/home/saggi/Documents/saggi/prabin/darknet/obj.data'  # you can change this too
    # and this can be changed as well
    weights = '/home/saggi/Documents/saggi/prabin/darknet/backup/yolo-obj_last.weights'
    # Default call; showImage=False skips producing an image, which is faster.
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=weights, metaPath=coco, showImage=False, makeImageOnly=False,
                initOnly=False)
    # At this point you have the data in AlexeyAB's default format, as explained in the module.
    # Try help(scan); the result format is:
    # [(item_name, confidence_rate, (x_center_image, y_center_image, width_size_box, height_size_of_box))]
    # To convert it to the corner form generally used by PIL/OpenCV:
    newdata = []
    # One loop covers both single and multiple detections.
    for item, confidence_rate, imagedata in test:
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size/2))
        y_start = round(y1 - (h_size/2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        data = (item, confidence_rate,
                (x_start, y_start, x_end, y_end), (w_size, h_size))
        newdata.append(data)
    return newdata  # empty list (falsy) when nothing was detected
def rule():
    # Print the separator used between the sections of the report.
    print(' ')
    print('========================================================')
    print(' ')
if __name__ == "__main__":
    # Multiple detection image test
    # table = '/home/saggi/Documents/saggi/prabin/darknet/data/26.jpg'
    # Single detection image test
    table = '/home/saggi/Documents/saggi/prabin/darknet/data/1.jpg'
    detections = detect(table)
    if not detections:
        print('No objects detected')
    else:
        # The same loop handles a single detection and multiple detections.
        for detection in detections:
            rule()
            print('All parameters of detection: ', detection)
            rule()
            print('Detected label: ', detection[0])
            rule()
            print('Detected object confidence: ', detection[1])
            x1, y1, x2, y2 = detection[2]
            rule()
            print('Detected object top-left and bottom-right coordinates (x1, y1, x2, y2):')
            print('x1: ', x1)
            print('y1: ', y1)
            print('x2: ', x2)
            print('y2: ', y2)
            rule()
            print('Detected object width and height: ', detection[3])
            b_width, b_height = detection[3]
            print('Width of bounding box: ', math.ceil(b_width))
            print('Height of bounding box: ', math.ceil(b_height))
            print(' ')
            print('========================================================')
# Single detections output:
# test value [('movie_name', 0.9223029017448425, (206.79859924316406, 245.4672393798828, 384.83673095703125, 72.8630142211914))]
# Multiple detections output:
# test value [('movie_name', 0.9225175976753235, (92.47076416015625, 224.9121551513672, 147.2491912841797, 42.063255310058594)),
# ('movie_name', 0.4900225102901459, (90.5261459350586, 12.4061279296875, 182.5990447998047, 21.261077880859375))]
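Indexing results as detection[0], detection[2], and so on is easy to get wrong. One possible alternative, sketched here as an assumption rather than as part of the answer, is to wrap each raw tuple in a NamedTuple so fields are accessed by name. The Detection class and to_detection helper are hypothetical; the raw tuple is the single-detection sample value quoted above.

```python
from typing import NamedTuple, Tuple

class Detection(NamedTuple):
    label: str
    confidence: float
    corners: Tuple[int, int, int, int]  # (x1, y1, x2, y2)
    size: Tuple[float, float]           # (width, height)

def to_detection(raw):
    """Build a Detection from one raw (label, confidence, (cx, cy, w, h)) tuple."""
    label, confidence, (cx, cy, w, h) = raw
    x1 = round(cx - w / 2)
    y1 = round(cy - h / 2)
    return Detection(label, confidence, (x1, y1, round(x1 + w), round(y1 + h)), (w, h))

# Single-detection sample value from the comments above.
raw = ('movie_name', 0.9223029017448425,
       (206.79859924316406, 245.4672393798828, 384.83673095703125, 72.8630142211914))
det = to_detection(raw)
print(det.label, det.corners)
```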
Solution 5:[5]
If the accepted answer does not work for you, this might be because you are using AlexeyAB's fork of darknet instead of pjreddie's original darknet.
In that case, go to the image_opencv.cpp file in the src folder and uncomment the following section:
...
//int b_x_center = (left + right) / 2;
//int b_y_center = (top + bot) / 2;
//int b_width = right - left;
//int b_height = bot - top;
//sprintf(labelstr, "%d x %d - w: %d, h: %d", b_x_center, b_y_center, b_width, b_height);
This computes the center coordinates, width, and height of the bounding box and formats them into labelstr, the label string for that box. After making the changes, make sure to rebuild darknet (run make again) before running YOLO.
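For reference, the arithmetic in those commented lines can be mirrored in Python as a rough sketch. The function name center_label is illustrative; the integer division matches the C code for the non-negative pixel coordinates expected here.

```python
def center_label(left, top, right, bot):
    """Mirror the commented C lines: box center and size, formatted as the label text."""
    b_x_center = (left + right) // 2  # integer division, as with C ints
    b_y_center = (top + bot) // 2
    b_width = right - left
    b_height = bot - top
    return "%d x %d - w: %d, h: %d" % (b_x_center, b_y_center, b_width, b_height)

print(center_label(14, 209, 399, 282))
```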
Solution 6:[6]
If you are using yolov4 in the darknet framework (by which I mean the version compiled directly from the GitHub repo https://github.com/AlexeyAB/darknet) to run object detection on static images, a command like the following can be run at the command line to get the bounding boxes as relative coordinates:
.\darknet.exe detector test .\cfg\coco.data .\cfg\yolov4.cfg .\yolov4.weights -ext_output .\data\people1.jpg -out result.json
Note that the above uses Windows syntax, so you may have to change the backslashes into forward slashes for it to work on macOS or Linux. Also, please make sure the paths are accurate before running. In the command, the input is the people1.jpg file in the data directory under the repository root. The output will be stored in a file named result.json. Feel free to change this output file name, but keep the .json extension.
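Relative coordinates have to be scaled by the image size to get pixel values. The sketch below assumes the JSON file is a list of frames, each with an objects list whose entries carry name, confidence, and relative_coordinates (center_x, center_y, width, height, all between 0 and 1). That layout is an assumption to check against your own result.json, and the embedded sample is illustrative, not real output.

```python
import json

# Illustrative JSON only; the layout is an assumption about the -out file.
SAMPLE = """
[
  {
    "frame_id": 1,
    "filename": "data/people1.jpg",
    "objects": [
      {
        "class_id": 0,
        "name": "person",
        "relative_coordinates": {"center_x": 0.5, "center_y": 0.5, "width": 0.25, "height": 0.5},
        "confidence": 0.9
      }
    ]
  }
]
"""

def boxes_in_pixels(frames, img_w, img_h):
    """Return [(name, confidence, (x1, y1, x2, y2))] with corners in pixels."""
    result = []
    for frame in frames:
        for obj in frame.get("objects", []):
            rc = obj["relative_coordinates"]
            cx, cy = rc["center_x"] * img_w, rc["center_y"] * img_h
            w, h = rc["width"] * img_w, rc["height"] * img_h
            corners = (round(cx - w / 2), round(cy - h / 2),
                       round(cx + w / 2), round(cy + h / 2))
            result.append((obj["name"], obj["confidence"], corners))
    return result

frames = json.loads(SAMPLE)
print(boxes_in_pixels(frames, img_w=640, img_h=480))
```

With a real file, json.load on the opened result.json would replace json.loads(SAMPLE), and img_w and img_h would come from the input image.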
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | Ramesh-X |
Solution 4 | |
Solution 5 | Hassaan Awan |
Solution 6 | Kris Stern |