A Practical Guide to Object Detection using the Popular YOLO Framework – Part III (with Python codes)

So far in our series of posts detailing object detection, we’ve seen the various algorithms that are used, and how we can detect objects in an image and predict bounding boxes using algorithms of the R-CNN family.

What is YOLO and Why is it Useful?

YOLO takes the entire image in a single instance and predicts the bounding box coordinates and class probabilities for these boxes.

YOLO first takes an input image. The framework then divides the input image into grids (say, a 3 X 3 grid), and image classification and localization are applied on each grid cell.

Suppose we have divided the image into a 3 X 3 grid and there are a total of 3 classes into which we want the objects to be classified. Then, for each grid cell, the label y will be an eight-dimensional vector:

y = [pc, bx, by, bh, bw, c1, c2, c3]

Here, pc is the probability that an object is present in the grid cell, bx, by, bh, bw specify the bounding box if there is an object, and c1, c2, c3 represent the classes. So, if the object is a car, c2 will be 1 and c1 and c3 will be 0, and so on.

Let’s say we select the first grid cell from the example. Since there is no object in this cell, pc will be zero and the y label for this cell will be:

y = [0, ?, ?, ?, ?, ?, ?, ?]

Here, ‘?’ means that it doesn’t matter what bx, by, bh, bw, c1, c2, and c3 contain, as there is no object in the cell.

Let’s take another grid cell, one in which we have a car (c2 = 1). Before we write the y label for this cell, it’s important to first understand how YOLO decides whether there actually is an object in a cell. In the example image, there are two objects (two cars). YOLO takes the mid-point of each object, and each object is assigned to the grid cell that contains its mid-point.

The y label for the centre-left grid cell with the car will be:

y = [1, bx, by, bh, bw, 0, 1, 0]

Since there is an object in this cell, pc will be equal to 1, and since the object is a car, c2 is 1 while c1 and c3 are 0.

When YOLO detects the same object more than once, the Non-Max Suppression technique cleans this up so that we get only a single detection per object. It selects the remaining box with the highest probability and suppresses the boxes that overlap it heavily (those with a high Intersection over Union with the selected box). We repeat these steps until all the boxes have either been selected or suppressed, and we get the final bounding boxes. This is what Non-Max Suppression is all about.

Anchor Boxes

We have seen that each grid cell can only identify one object.
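To make the labelling scheme described earlier concrete, here is a minimal sketch using NumPy that builds the full 3 X 3 X 8 target for one image by assigning each object to the cell containing its mid-point. The function name encode_targets and the pixel box format (x_min, y_min, x_max, y_max, class_index) are hypothetical. The normalisation of bx, by, bh, bw is also an assumption, since the text does not specify it: the sketch takes bx, by as the mid-point's position inside its cell and bh, bw in units of the cell size. Empty cells are left as zeros, because only pc matters for them. The error raised when two mid-points share a cell mirrors the one-object-per-cell limitation that motivates anchor boxes.

```python
import numpy as np

# Sizes from the example in the text: a 3 X 3 grid and 3 classes, giving
# an 8-value label per cell: [pc, bx, by, bh, bw, c1, c2, c3].
GRID_SIZE = 3
NUM_CLASSES = 3


def encode_targets(objects, image_w, image_h, grid_size=GRID_SIZE, num_classes=NUM_CLASSES):
    """Build a (grid_size, grid_size, 5 + num_classes) label array.

    objects: list of (x_min, y_min, x_max, y_max, class_index) in pixels
    (hypothetical input format). Empty cells stay all-zero: only pc = 0
    matters there, the other entries are the '?' values from the text.
    """
    y = np.zeros((grid_size, grid_size, 5 + num_classes), dtype=np.float32)
    cell_w = image_w / grid_size
    cell_h = image_h / grid_size
    for x_min, y_min, x_max, y_max, cls in objects:
        # The object's mid-point decides which cell it is assigned to.
        mid_x = (x_min + x_max) / 2
        mid_y = (y_min + y_max) / 2
        col = min(int(mid_x // cell_w), grid_size - 1)
        row = min(int(mid_y // cell_h), grid_size - 1)
        if y[row, col, 0] == 1:
            # Each cell can hold only one object; this is the limitation
            # that anchor boxes are meant to address.
            raise ValueError(f"cell ({row}, {col}) already holds an object")
        # Assumed convention: bx, by = mid-point offset inside the cell (0 to 1),
        # bh, bw = box height and width in units of the cell size.
        y[row, col, 0] = 1.0                       # pc
        y[row, col, 1] = mid_x / cell_w - col      # bx
        y[row, col, 2] = mid_y / cell_h - row      # by
        y[row, col, 3] = (y_max - y_min) / cell_h  # bh
        y[row, col, 4] = (x_max - x_min) / cell_w  # bw
        y[row, col, 5 + cls] = 1.0                 # one-hot class (c1, c2, c3)
    return y


# Example: a 300 X 300 image with two cars (class index 1, i.e. c2).
targets = encode_targets([(10, 110, 90, 190, 1), (210, 120, 290, 180, 1)], 300, 300)
print(targets[1, 0])  # label of the centre-left cell
```

Here the first car's mid-point (50, 150) falls in the centre-left cell (row 1, column 0), whose label works out to pc = 1, bx = by = 0.5, bh = bw = 0.8 and c2 = 1, while cells without a mid-point keep pc = 0.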

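The Non-Max Suppression loop described above can be sketched in plain Python as the standard greedy variant; this is an illustration, not necessarily the code used later in this series. The function names iou and non_max_suppression, the (x_min, y_min, x_max, y_max) box format, and both thresholds are illustrative assumptions: 0.6 for first discarding low-probability boxes (a common preliminary step) and 0.5 as the IoU above which a box is suppressed.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.6, iou_threshold=0.5):
    """Greedy NMS returning the indices of the boxes that survive.

    Both threshold defaults are illustrative assumptions.
    """
    # Drop low-probability boxes, then order the rest from highest to lowest probability.
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while candidates:
        best = candidates.pop(0)  # select the most probable remaining box
        keep.append(best)
        # Suppress every remaining box that overlaps the selected one too much.
        candidates = [i for i in candidates if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


# Three overlapping detections of one car plus one detection of a second car.
boxes = [(10, 110, 90, 190), (12, 112, 92, 192), (8, 108, 88, 188), (210, 120, 290, 180)]
scores = [0.9, 0.75, 0.8, 0.85]
print(non_max_suppression(boxes, scores))  # → [0, 3]
```

Sorting once and repeatedly taking the best remaining candidate keeps the loop easy to follow; with many boxes, a vectorised implementation would be faster.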