How to crop a face detected via MediaPipe in Python

I have a problem with MediaPipe coordinates. What I want to do is crop the bounding box of the detected face.

https://google.github.io/mediapipe/solutions/face_detection.html

EXAMPLE OF PROCEDURE

And I use the code below:

import cv2
import mediapipe as mp
import matplotlib.pyplot as plt

mp_face_detection = mp.solutions.face_detection
 
# Setup the face detection function.
face_detection = mp_face_detection.FaceDetection(model_selection=0, min_detection_confidence=0.5)
 
# Initialize the mediapipe drawing class.
mp_drawing = mp.solutions.drawing_utils

# Read an image from the specified path.
sample_img = cv2.imread('12345.jpg')
 
# Specify a size of the figure.
plt.figure(figsize = [10, 10])
 
# Display the sample image, also convert BGR to RGB for display. 
plt.title("Sample Image");plt.axis('off');plt.imshow(sample_img[:,:,::-1]);plt.show()

face_detection_results = face_detection.process(sample_img[:,:,::-1])
 
# Check if the face(s) in the image are found.
if face_detection_results.detections:
    
    # Iterate over the found faces.
    for face_no, face in enumerate(face_detection_results.detections):
        
        # Display the number of the face we are iterating over.
        print(f'FACE NUMBER: {face_no+1}')
        print('---------------------------------')
        
        # Display the face confidence.
        print(f'FACE CONFIDENCE: {round(face.score[0], 2)}')
        
        # Get the face bounding box and face key points coordinates.
        face_data = face.location_data
        
        # Display the face bounding box coordinates.
        print(f'\nFACE BOUNDING BOX:\n{face_data.relative_bounding_box}')
        
        # Iterate two times as we only want to display the first two key points of each detected face.
        for i in range(2):
 
            # Display the found normalized key points.
            print(f'{mp_face_detection.FaceKeyPoint(i).name}:')
            print(f'{face_data.relative_keypoints[mp_face_detection.FaceKeyPoint(i).value]}')

So the results are in this form:

FACE NUMBER: 1

FACE CONFIDENCE: 0.89

FACE BOUNDING BOX:
xmin: 0.2784463167190552
ymin: 0.3503175973892212
width: 0.1538110375404358
height: 0.23071599006652832

RIGHT_EYE:
x: 0.3447018265724182
y: 0.4222590923309326

LEFT_EYE:
x: 0.39114508032798767
y: 0.3888365626335144

And I want to CROP the image at the coordinates of the box. Like

face = Image.fromarray(image).crop(face_rect)

or any other crop procedure. My problem is that I can't get the coordinates of the detected box out of MediaPipe.

Any ideas?



Solution 1:[1]

Got the solution, guys:

import numpy as np
from PIL import Image

# sample_img and face_data come from the detection code in the question;
# the relative bounding box holds normalized [0, 1] coordinates.
data = face_data.relative_bounding_box

h, w, c = sample_img.shape
print('width:  ', w)
print('height: ', h)

# Scale the normalized coordinates to pixel values.
xleft = int(data.xmin * w)
xtop = int(data.ymin * h)
xright = int(data.width * w + xleft)
xbottom = int(data.height * h + xtop)

detected_faces = [(xleft, xtop, xright, xbottom)]

for n, face_rect in enumerate(detected_faces):
    # Convert from OpenCV's BGR order to RGB before handing the array to PIL.
    face = Image.fromarray(sample_img[:, :, ::-1]).crop(face_rect)
    face_np = np.asarray(face)
    plt.imshow(face_np)
    plt.show()
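
If you also want to write each crop to disk, a small follow-up works (the filename pattern here is just illustrative):

# Save each cropped face; the filename pattern is illustrative.
for n, face_rect in enumerate(detected_faces):
    face = Image.fromarray(sample_img[:, :, ::-1]).crop(face_rect)
    face.save(f'face_{n}.jpg')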

Solution 2:[2]

Assume the objective is to crop a single face detected by MediaPipe. Note the [0], indicating that we are only interested in a single face:

results = mp_face.process(image_input)
detection=results.detections[0]

By default MediaPipe returns detection data in normalized form, so we have to convert it back to the original size by multiplying x values by the width and y values by the height of the input image.
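
For illustration, a minimal sketch of that manual conversion (the names image_input and relative_bounding_box follow the complete code further down):

# Manual conversion of the normalized bounding box to pixel coordinates.
# Assumes image_input is the RGB image and relative_bounding_box comes from
# detection.location_data, as in the complete code below.
image_rows, image_cols, _ = image_input.shape
xleft = int(relative_bounding_box.xmin * image_cols)
ytop = int(relative_bounding_box.ymin * image_rows)
xright = int((relative_bounding_box.xmin + relative_bounding_box.width) * image_cols)
ybot = int((relative_bounding_box.ymin + relative_bounding_box.height) * image_rows)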

We can employ the _normalized_to_pixel_coordinates helper available in MediaPipe:

relative_bounding_box = location.relative_bounding_box
rect_start_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin, relative_bounding_box.ymin, image_cols,
    image_rows)
rect_end_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin + relative_bounding_box.width,
    relative_bounding_box.ymin + relative_bounding_box.height, image_cols,
    image_rows)

This essentially produces:

xleft,ytop=rect_start_point
xright,ybot=rect_end_point

In other words, ytop, ybot, xleft, and xright represent face_top, face_bottom, face_left, and face_right, respectively.

Since the image is simply a 3D NumPy array, we can crop it as below:

crop_img = image_input[ytop: ybot, xleft: xright]

The complete code is below:

import cv2
import mediapipe as mp
from mediapipe.python.solutions.drawing_utils import _normalized_to_pixel_coordinates



# load face detection model
mp_face = mp.solutions.face_detection.FaceDetection(
    model_selection=1, # model selection
    min_detection_confidence=0.5 # confidence threshold
)
dframe = cv2.imread('xx.png')  # read in BGR color so the 3-channel unpack below works
image_rows, image_cols, _ = dframe.shape
image_input = cv2.cvtColor(dframe, cv2.COLOR_BGR2RGB)
results = mp_face.process(image_input)
detection=results.detections[0]
location = detection.location_data

relative_bounding_box = location.relative_bounding_box
rect_start_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin, relative_bounding_box.ymin, image_cols,
    image_rows)
rect_end_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin + relative_bounding_box.width,
    relative_bounding_box.ymin + relative_bounding_box.height, image_cols,
    image_rows)


## Let's draw a bounding box
color = (255, 0, 0)
thickness = 2
cv2.rectangle(image_input, rect_start_point, rect_end_point, color, thickness)
xleft,ytop=rect_start_point
xright,ybot=rect_end_point

crop_img = image_input[ytop: ybot, xleft: xright]

cv2.imwrite('crop_image0.jpg', crop_img)
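
One caveat: _normalized_to_pixel_coordinates returns None when a normalized value falls outside [0, 1] (for example, when the detected box extends past the image edge), so a small guard before unpacking the points avoids a confusing error. A minimal sketch, using the variables from the complete code above:

# Guard against boxes that extend outside the frame; the helper returns None
# for normalized values outside [0, 1].
if rect_start_point is None or rect_end_point is None:
    raise ValueError('Detected bounding box extends outside the image frame')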

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: podikakos
Solution 2: rpb