An introduction to implementing the YOLO algorithm for multi object detection in images

By Ann Mohan Kunnath, Apr 1

YOLO is an extremely fast, real-time, multi-object detection algorithm.

YOLO stands for “You Only Look Once”.

The original paper is available here: https://pjreddie.com/media/files/papers/YOLOv3.pdf

The algorithm applies a neural network to an entire image.

The network divides the image into an S x S grid and predicts bounding boxes (boxes drawn around the objects in the image) together with probabilities for each of these regions.
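To make the grid idea concrete, here is a minimal sketch that maps a point in the image (for example, the centre of an object) to the grid cell that contains it. The 416 x 416 input size matches the network printout in Step 5; the grid size of 13 and the function name `grid_cell_for_point` are assumptions chosen purely for illustration.

```python
def grid_cell_for_point(x, y, image_size=416, grid_size=13):
    """Return the (row, col) of the grid cell that contains pixel (x, y)."""
    if not (0 <= x < image_size and 0 <= y < image_size):
        raise ValueError("point lies outside the image")
    cell_size = image_size / grid_size  # 32 pixels per cell for 416 / 13
    col = int(x // cell_size)
    row = int(y // cell_size)
    return row, col


# An object centred at pixel (200, 100) falls into row 3, column 6 of the grid.
print(grid_cell_for_point(200, 100))
```

In this picture, each cell is responsible for the objects whose centre falls inside it.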

The method used to come up with these probabilities is logistic regression.

The bounding boxes are weighted by the associated probabilities.

For class prediction, independent logistic classifiers are used.
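The sketch below illustrates what "independent logistic classifiers" means in practice: every class gets its own sigmoid, so the class probabilities do not have to sum to 1. It also weights each class probability by an objectness probability, which is one common way of scoring a box. The raw values and the function names are made up for illustration and are not taken from the DarkNet code.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def class_scores(objectness_logit, class_logits):
    """Score each class independently, weighted by the objectness probability."""
    objectness = sigmoid(objectness_logit)
    return [objectness * sigmoid(z) for z in class_logits]


# Hypothetical raw network outputs for one box and three classes.
scores = class_scores(2.0, [3.0, -1.0, 0.5])
print(scores)
```

Because each class is scored on its own, a single box can receive a high score for more than one label.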

In this article, I am going to demonstrate how to implement the YOLO algorithm with a pre-trained model.

First, we need to install DarkNet.

DarkNet is an open-source neural network framework.

You can find more information about DarkNet at this link: https://pjreddie.com/darknet/

Step 1: We import the necessary libraries

    import cv2                       # computer vision library
    import matplotlib.pyplot as plt  # to plot
    from darknet import Darknet      # to use DarkNet

Step 2: We load the configuration file and pre-trained weights into variables

    config_file = './cfg/yolov3.cfg'
    pretrained_weights = './weights/yolov3.weights'

Step 3: We instantiate an object of the Darknet class

    net = Darknet(config_file)

Step 4: We load the pre-trained weights

    net.load_weights(pretrained_weights)

Step 5: We display the network and see how it looks

    net.print_network()

A small part of the output is shown below:

    layer     filters    size              input                output
        0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32
        1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64
        2 conv     32  1 x 1 / 1   208 x 208 x  64   ->   208 x 208 x  32
        3 conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64

In the full output, we can see that the network has 106 layers, including the classification / detection layers.
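The input and output sizes in the printout follow from the usual convolution arithmetic. Here is a small sketch that reproduces the spatial sizes of the four layers shown above. It assumes a padding of kernel_size // 2 on each side, which is consistent with the printed sizes but is an assumption on my part; the function name `conv_output_size` is also just for illustration.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Spatial output size of a convolution, assuming padding = kernel_size // 2."""
    padding = kernel_size // 2  # assumption: pad so that stride 1 keeps the size
    return (input_size + 2 * padding - kernel_size) // stride + 1


# (input size, kernel size, stride) for layers 0 to 3 of the printout
layers = [(416, 3, 1), (416, 3, 2), (208, 1, 1), (208, 3, 1)]
sizes = [conv_output_size(*layer) for layer in layers]
print(sizes)  # [416, 208, 208, 208]
```

The only layer that changes the spatial size is layer 1, whose stride of 2 halves 416 to 208.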

Step 6: Next, we read in an image and display it.

This is the image we are going to apply the YOLO algorithm on.

    plt.rcParams['figure.figsize'] = (15.0, 15.0)  # default figure size
    img = cv2.imread('./images/city_scene.jpg')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)     # OpenCV loads images as BGR; matplotlib expects RGB
    plt.imshow(img)

The output is as follows:

(image: city_scene.jpg)

Step 7: Next, we load the names of the classes that the pre-trained model was trained on.

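    # Note: load_class_names is not imported in Step 1. It is assumed here to be
    # a helper from a utils module that ships alongside darknet.py.
    class_names_file = 'data/coco.names'
    class_names = load_class_names(class_names_file)

Now, the variable class_names holds all the class names that the model was trained on.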
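A names file of this kind is usually a plain text file with one class name per line. The sketch below shows how such a file could be read. It is a hypothetical stand-in called `read_class_names`, not the actual load_class_names helper, and it works on a small temporary demo file rather than the real coco.names.

```python
import os
import tempfile


def read_class_names(names_file):
    """Read one class name per line, skipping blank lines."""
    with open(names_file, "r") as fp:
        return [line.strip() for line in fp if line.strip()]


# Write a tiny demo file so that the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = os.path.join(tmp_dir, "demo.names")
    with open(demo_file, "w") as fp:
        fp.write("person\nbicycle\ncar\n")
    demo_names = read_class_names(demo_file)

print(demo_names)
```

The position of a name in the list is what ties it to a class index predicted by the network.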

Let us display the values in class_names to get a better idea of what it contains.

Step 8: Display the class names

    print(class_names)

The output is as follows:

    ['person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck',
     'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench',
     'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
     'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
     'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
     'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
     'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
     'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'sofa',
     'pottedplant', 'bed', 'diningtable', 'toilet', 'tvmonitor', 'laptop', 'mouse',
     'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
     'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
     'hair drier', 'toothbrush']

Step 9: The image is resized to have the same width and height as the first layer of the neural network

    resized_image = cv2.resize(img, (net.width, net.height))

Step 10: We set the Intersection Over Union (IOU) threshold.

IOU is defined as Area Of Overlap / Area Of Union of the ground truth bounding box and the predicted bounding box.

    iou_threshold = 0.4

This means that a detection with an IOU greater than 0.4 is a true positive.
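The IOU definition above translates directly into a few lines of code. This is a minimal sketch that assumes boxes are given as (x_min, y_min, x_max, y_max) corner coordinates; the helper functions used in this article may store boxes differently, and the example boxes are made up.

```python
def intersection_over_union(box_a, box_b):
    """IOU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
predicted = (5, 0, 15, 10)
iou = intersection_over_union(ground_truth, predicted)
print(round(iou, 3))  # overlap 50 / union 150
```

With these two boxes the IOU is one third, which is below a threshold of 0.4.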

Step 11: We set the Non-Maximal Suppression (NMS) threshold.

NMS suppresses overlapping bounding boxes and only retains the bounding box that has the maximum probability of object detection associated with it.
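To show the idea, here is a minimal sketch of a common greedy formulation of NMS: keep the highest scoring box, drop every remaining box that overlaps it too much, and repeat. The box format (x_min, y_min, x_max, y_max), the overlap threshold of 0.5 and the example detections are assumptions for illustration, and the helper used in this article may implement NMS differently.

```python
def iou(a, b):
    """IOU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, overlap_threshold):
    """Greedy NMS over a list of (box, score) pairs; returns the kept pairs."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest scoring box that is still left
        kept.append(best)
        # Drop every box that overlaps the kept box by more than the threshold.
        remaining = [d for d in remaining if iou(best[0], d[0]) <= overlap_threshold]
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),    # overlaps heavily with the first box
    ((20, 20, 30, 30), 0.7),  # a separate object
]
kept = non_max_suppression(detections, overlap_threshold=0.5)
print(kept)
```

Here the second box is suppressed because it overlaps the first one heavily, while the third box survives because it covers a different region.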

    nms_threshold = 0.6

Step 12: Next, we detect the objects and display them with their probabilities.

    boxes = detect_objects(net, resized_image, iou_threshold, nms_threshold)
    print_objects(boxes, class_names)
    plot_boxes(img, boxes, class_names, plot_labels = True)

The output is as follows:

    It took 3.684 seconds to detect the objects in the image.

    Number of Objects Detected: 28

    Objects Found and Confidence Level:

    1. person: 0.999996
    2. person: 1.000000
    3. car: 0.707237
    4. truck: 0.933031
    5. car: 0.658086
    6. truck: 0.666982
    7. person: 1.000000
    8. traffic light: 1.000000
    9. person: 1.000000
    10. car: 0.997369
    11. bus: 0.998023
    12. person: 1.000000
    13. person: 1.000000
    14. person: 1.000000
    15. person: 1.000000
    16. person: 1.000000
    17. traffic light: 1.000000
    18. traffic light: 1.000000
    19. umbrella: 0.997282
    20. traffic light: 1.000000
    21. car: 0.989741
    22. traffic light: 1.000000
    23. traffic light: 0.999999
    24. person: 0.999999
    25. truck: 0.715035
    26. traffic light: 1.000000
    27. person: 0.999993
    28. person: 0.999996

We can see the objects that have been detected along with their probabilities in the output above.
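A list like this is easier to read once it is summarised. The sketch below copies the labels and confidence values from the output above into a plain Python list, counts the detections per class, and keeps only the confident ones. The 0.9 cut-off is an arbitrary value picked for illustration and is not part of the original pipeline.

```python
from collections import Counter

# (label, confidence) pairs copied from the printed output above
detections = [
    ("person", 0.999996), ("person", 1.000000), ("car", 0.707237),
    ("truck", 0.933031), ("car", 0.658086), ("truck", 0.666982),
    ("person", 1.000000), ("traffic light", 1.000000), ("person", 1.000000),
    ("car", 0.997369), ("bus", 0.998023), ("person", 1.000000),
    ("person", 1.000000), ("person", 1.000000), ("person", 1.000000),
    ("person", 1.000000), ("traffic light", 1.000000), ("traffic light", 1.000000),
    ("umbrella", 0.997282), ("traffic light", 1.000000), ("car", 0.989741),
    ("traffic light", 1.000000), ("traffic light", 0.999999), ("person", 0.999999),
    ("truck", 0.715035), ("traffic light", 1.000000), ("person", 0.999993),
    ("person", 0.999996),
]

# Number of detections per class, most frequent first.
counts = Counter(label for label, _ in detections)
print(counts.most_common())

# Keep only detections at or above an (arbitrary) confidence cut-off.
confident = [(label, conf) for label, conf in detections if conf >= 0.9]
print(len(confident), "of", len(detections), "detections have confidence >= 0.9")
```

Most of the detections are people and traffic lights, and the few low-confidence ones are cars and trucks.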

References:

The Original Paper on YOLOv3, https://pjreddie.com/media/files/papers/YOLOv3.pdf

Udacity Computer Vision Nanodegree, https://www.udacity.com/course/computer-vision-nanodegree--nd891

Real Time Object Detection with YOLO, YOLOv2 and now YOLOv3, https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088
