But how? You Only Look Once (YOLO) is an algorithm that uses a single convolutional network for object detection.
Unlike other object detection algorithms that sweep the image bit by bit, YOLO takes in the whole image at once and "reframe(s) object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities."
To put it simply, without diving into the nitty-gritty details: you take an input image, split it into an SxS grid, and pass it through a neural network that predicts bounding boxes and class probabilities for each grid cell; these predictions are then combined into the final detection output.
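As a rough sketch of that grid layout, here is the size of the prediction tensor the network regresses. The values S=7, B=2, and C=20 follow the original YOLO paper's defaults for PASCAL VOC, used here purely for illustration:

```python
# Sketch of YOLO's output layout: the image is split into an S x S grid,
# and each grid cell predicts B boxes (x, y, w, h, confidence) plus C class scores.
S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

# Each cell's prediction vector: B boxes * 5 numbers each, plus C class probabilities
per_cell = B * 5 + C           # 2*5 + 20 = 30
output_shape = (S, S, per_cell)

print(output_shape)            # (7, 7, 30)
print(S * S * per_cell)        # 1470 numbers regressed per image
```

The key point is that one forward pass produces all of these numbers at once, which is what makes the approach so fast.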
The network is first trained over many passes through an entire dataset before being tested on real-life images and video.
To refine the predicted bounding boxes, YOLO relies on two key post-processing concepts: IoU (Intersection over Union) and NMS (Non-Maximum Suppression).
The IoU measures how well the machine's predicted bounding box matches up with the object's actual bounding box.
Take, for example, the image of the car below. The purple box is what the computer thinks is the car, while the red box is the actual bounding box of the car.
The area of overlap between the two boxes, divided by the area of their union, gives us our IoU.
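That ratio is simple to compute directly. Here is a minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; the example boxes are made-up numbers, not taken from the car image:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping (intersection) rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # If the boxes don't overlap at all, the intersection area is zero
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

    # Union = sum of both areas minus the double-counted overlap
    return inter / (area_a + area_b - inter)

# Predicted box vs ground-truth box: overlap area 4, union area 28
print(iou((0, 0, 4, 4), (2, 2, 6, 6)))  # ~0.143
```

An IoU of 1.0 means a perfect match; 0.0 means the boxes do not overlap at all.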
The shaded yellow is our IoU.
Below is a simple example of how NMS operates.
Object detection algorithms often have the issue of over-identifying a certain object, in this case, Audrey Hepburn’s face.
Non-maximum suppression (NMS) has been widely used in several key areas of computer vision and is an integral part of many proposed approaches to detection, be it edge, corner, or object detection [1–6].
Its necessity stems from the imperfect ability of detection algorithms to localize the concept of interest, resulting in groups of several detections near the real location.
NMS ensures we keep the optimal box among all the candidate detections of the face.
Rather than concluding that there are multiple instances of her face in the image, NMS keeps only the highest-probability box among those detecting the same object.
An example of NMS.
Utilizing both IoU and NMS, YOLO produces predictions of the various objects in an image extremely fast.
Because it sees the entire image during training and test time, it implicitly encodes contextual information about classes as well as their appearance in the images it sees.
However, one drawback of YOLO is its inability to detect multiple objects that are either too close together or too small, as in the example below, where the groups of people under the building structures go unclassified by YOLO.
via TechnoStacks
One great example of how this technology can be applied in real life is automobile vision! As a vehicle travels down a street, what it 'sees' is in constant flux, and because the YOLO algorithm is so fast, the car can quickly identify the cyclist below.
With other sensors to detect how far away that cyclist is, the car can take the necessary action to stop or steer clear of the cyclist, other cars, or obstacles and avoid a collision!
I hope this high-level overview of the YOLO algorithm has sparked your interest in the current state of computer vision technology.
If you are interested in learning more about further research, please check out Joseph Redmon and his continued work on YOLO and other computer vision projects!