Real-time detections not happening without creating a 5-second delay

I trained a deep learning model (EfficientNetB0) and now want to use it with OpenCV to make real-time predictions. However, I am unable to do so without introducing a 5-second delay between predictions.

One reason I came up with on my own is that the model architecture might be too large to compute quickly. But then why do other real-time object detection models run so efficiently even though they also have dense architectures?
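To test that guess, the prediction call could be timed on its own, separately from the camera loop. The sketch below is only an illustration: `measure_latency` and `fake_predict` are hypothetical names, and `fake_predict` is a stand-in that sleeps for 20 ms instead of calling the real model.

```python
import time


def measure_latency(fn, arg, runs=5, warmup=1):
    """Return the average wall-clock seconds per call of fn(arg)."""
    # Warm-up calls are excluded, since the first call is often slower
    for _ in range(warmup):
        fn(arg)
    start = time.perf_counter()
    for _ in range(runs):
        fn(arg)
    return (time.perf_counter() - start) / runs


def fake_predict(frame):
    """Hypothetical stand-in for the model call; sleeps instead of computing."""
    time.sleep(0.02)
    return frame


avg_seconds = measure_latency(fake_predict, None)
max_rate = 1.0 / avg_seconds  # upper bound on predictions per second
print(f"avg latency: {avg_seconds * 1000:.1f} ms, at most {max_rate:.1f} predictions/s")
```

With the real model, `fake_predict` would be swapped for the actual call (for example `model.predict` on an already preprocessed batch). If the measured latency is well under the time available per frame, the model size is probably not the main cause of the delay.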

The code is attached below. Please take a look and suggest how I can make progress.

import cv2
import time
import tensorflow as tf

# loading the model
model = tf.keras.models.load_model("./ASL-Model-1")

class_names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
               'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
               'del', 'nothing', 'space']

def load_and_preprocess(img):
    """Resize a frame to the 200x200 size passed to the model."""
    img = tf.image.resize(img, (200, 200))
    return img

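cap = cv2.VideoCapture(0)
# Request a 640x480 frame (property 3 is the width, property 4 is the height)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
cap.set(cv2.CAP_PROP_BRIGHTNESS, 130)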

prev_frame_time = 0  # timestamp of the previous frame, used for the FPS counter
previous = time.time()
delta = 0  # seconds elapsed since the last prediction
pred_show = ""
while True:
    success, img = cap.read()
    if not success:
        # Stop if the camera did not return a frame
        break

    # Get the current time, increase delta and update the previous variable
    current = time.time()
    delta += current - previous
    previous = current

    # Run the model only once every 5 seconds
    if delta > 5:
        # Reset the time counter
        delta = 0
        img_ = load_and_preprocess(img)
        pred_prob = model.predict(tf.expand_dims(img_, axis=0))
        pred_class = class_names[pred_prob.argmax()]
        # Show the raw maximum score; it is not scaled to a percentage
        pred_show = f"Pred: {pred_class}, Prob: {pred_prob.max():.2f}"

    if pred_show:
        cv2.putText(img, text=pred_show, org=(300, 33), fontFace=cv2.FONT_HERSHEY_PLAIN,
                    fontScale=1.8, color=(255, 0, 0), thickness=2)

    # FPS
    frame_time = time.time()
    if (frame_time - prev_frame_time) != 0:
        fps = 1 / (frame_time - prev_frame_time)
        prev_frame_time = frame_time

        cv2.putText(img, f"FPS: {int(fps)}", org=(7, 33), fontFace=cv2.FONT_HERSHEY_PLAIN,
                    fontScale=2, color=(0, 0, 0), thickness=2)

    cv2.imshow("Webcam", img)

    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# Free the camera and close the preview window
cap.release()
cv2.destroyAllWindows()
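One pattern that might remove the need for the 5-second timer is to run the prediction in a background thread that always works on the most recent frame, so the display loop never waits for the model. The sketch below only illustrates the idea and does not use the actual model: `LatestFramePredictor` and `slow_predict` are hypothetical names, and `slow_predict` is a stand-in that sleeps for 50 ms in place of `model.predict`.

```python
import threading
import time


class LatestFramePredictor:
    """Runs predict_fn in a background thread on the most recent frame only."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._lock = threading.Lock()
        self._new_frame = threading.Event()
        self._stop = threading.Event()
        self._frame = None
        self._result = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, frame):
        """Replace the pending frame; older unprocessed frames are dropped."""
        with self._lock:
            self._frame = frame
            self._new_frame.set()

    def latest_result(self):
        """Return the most recent prediction, or None if none is ready yet."""
        with self._lock:
            return self._result

    def stop(self):
        """Ask the worker to exit and wait for it."""
        self._stop.set()
        self._new_frame.set()  # wake the worker so it can see the stop flag
        self._thread.join()

    def _run(self):
        while True:
            self._new_frame.wait()
            if self._stop.is_set():
                break
            with self._lock:
                frame = self._frame
                self._new_frame.clear()
            result = self._predict_fn(frame)  # slow call runs outside the lock
            with self._lock:
                self._result = result


def slow_predict(frame):
    """Hypothetical stand-in for the model call; takes 50 ms per frame."""
    time.sleep(0.05)
    return f"label-for-frame-{frame}"


predictor = LatestFramePredictor(slow_predict)
loop_start = time.perf_counter()
for frame_id in range(20):      # stands in for the camera loop
    predictor.submit(frame_id)  # returns immediately, never waits for the model
    time.sleep(0.01)            # stands in for cap.read() and cv2.imshow()
loop_seconds = time.perf_counter() - loop_start
time.sleep(0.3)                 # let the worker finish the last submitted frame
final_result = predictor.latest_result()
predictor.stop()
print(f"loop took {loop_seconds:.2f} s, last result: {final_result}")
```

In the webcam loop, `submit(img)` would take the place of the `if delta > 5` block and `latest_result()` would supply the text for `cv2.putText`. Whether the real model is fast enough for this to feel real-time still depends on its per-frame latency.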

The model I trained is available here: Sign-Language-Recognition



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
