Object Detection with Voice Feedback — YOLO v3 + gTTS

API: The class prediction of the objects detected in every frame will be a string.

Output: We will also obtain the coordinates of the bounding box of every object detected in our frames, overlay the boxes on the detected objects, and return the stream of frames as a video playback.

We're predicting classes and bounding boxes for the whole image quickly in one run of the algorithm (just one look at the image's pixels), so the predictions are informed by the global context of the image.

Training Time

Object classes in the coco.names file are indexed. C represents the class index of the object we are trying to label. The labels are represented by tensors, a general form of a vector, which are fed into the model during training.

[33, 0.21, 0.58, 0.23, 0.42]: sports ball
[1, 0.67, 0.5, 0.5, 0.9]: myself

Prediction / Detection Time

Now we are feeding 1280 x 720 frames from our camera into YOLO at prediction time.

The dark-green box is the cell which contains the center of the object.

There are 5 values in a bounding box: (bx, by, bw, bh, BC)

If the center of an object (red dot) falls into a grid cell, only that grid cell (the dark-green cell) is responsible for detecting that object. Each bounding box has 5 values. The first 4 values, bx, by, bw and bh, represent the position of the box:
1) bx and by (the box center) are normalized using the coordinates of the top-left corner of the cell which contains the object's center.
2) bw and bh (the box width and height) are normalized using the dimensions of the entire image.

The 5th value is BC, the box confidence score:

BC = Pr(Object existing in box) * IOU (Intersection Over Union)

This measures how likely the box is to contain an object of any class and how accurate the predicted box is. BC = 0 if no object exists in that box, and we want BC = 1 when predicting the ground truth.

(Figure: a fairly high IOU.)

There are B bounding boxes predicted in each grid cell

YOLO v3 predicts B = 3 bounding boxes in each cell for the one object in that cell.

There are also C conditional class probabilities in each grid cell

There are 80 conditional class probabilities, Pr(Class i | Object), per cell when we use COCO. Each is the probability that the predicted object is of Class i, given that there is an object in the cell.

1 person 0.01
2 bicycle 0.004
..
33 sports ball 0.9
..
80 toothbrush 0.02

In our example above, Class 33 has the highest probability, so the object is predicted to be a sports ball.

To sum up the above

There are S x S cells, and each cell contains 2 things:
1) B bounding boxes, each with 5 values (bx, by, bw, bh, BC)
2) C conditional class probabilities

I also explored multi-threading, which in theory should create 1 process for processing every 30th frame and another process for the video playback. However, I am only able to produce the verbal description of detected objects in real time on my webcam, which is more important since blind users can't see bounding boxes anyway.

More to come!

Discuss further with me on LinkedIn or via jasonyip184@gmail.com!
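
As a footnote to the box confidence score BC = Pr(Object existing in box) * IOU and the conditional class probabilities described earlier, here is a minimal Python sketch of both ideas. It is an illustration only, not code from the project: the function names (iou, box_confidence, most_likely_class) and the corner-style box format (x_min, y_min, x_max, y_max) are assumptions made for this example, and the class probabilities are the four values quoted in the article's example.

```python
from typing import Dict, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection Over Union of two axis-aligned boxes."""
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


def box_confidence(p_object: float, predicted: Box, truth: Box) -> float:
    """BC = Pr(object existing in box) * IOU(predicted box, ground-truth box)."""
    return p_object * iou(predicted, truth)


def most_likely_class(class_probs: Dict[int, float]) -> int:
    """Return the class index with the highest conditional probability."""
    return max(class_probs, key=class_probs.get)


if __name__ == "__main__":
    # Hypothetical boxes chosen only to exercise the functions.
    predicted = (0.0, 0.0, 2.0, 2.0)
    truth = (1.0, 0.0, 3.0, 2.0)
    print("IOU:", iou(predicted, truth))
    print("BC :", box_confidence(0.9, predicted, truth))

    # Class index -> Pr(Class i | Object), using the values from the example.
    probs = {1: 0.01, 2: 0.004, 33: 0.9, 80: 0.02}
    names = {1: "person", 2: "bicycle", 33: "sports ball", 80: "toothbrush"}
    print("Predicted class:", names[most_likely_class(probs)])
```

BC reaches 1 only when the model is certain an object is present and the predicted box matches the ground truth exactly, which is the training target mentioned earlier.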
