What is the YOLO algorithm? Introduction to Real-Time Object Detection


YOLO, short for You Only Look Once, is one of the most powerful real-time object detection algorithms. It gets its name because, unlike earlier object detectors such as R-CNN or its successor Faster R-CNN, it needs to pass the image (or video frame) through its network only once.

https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/

These earlier methods examined several regions of the image one after another to find the objects present in it. YOLO changed that by reasoning over the whole image at once. To do so, YOLO uses a single neural network that takes features from the entire image and predicts multiple bounding boxes, each containing a specific object, all simultaneously.

https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/

To achieve this, the image is divided into an ‘S’ x ‘S’ grid of cells. If the center of an object falls in one of these cells, that cell is responsible for detecting the object. Each cell predicts ‘B’ bounding boxes, along with a confidence score for each box that reflects how likely the box is to contain an object and how well it fits that object. If there is no object in the cell, this score should be zero. Otherwise, the score should equal the intersection over union (IoU) between the predicted box and the ground-truth box.
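The grid assignment and the confidence target described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the actual YOLO code: the function names are hypothetical, boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates in pixels, and the default S = 7 is the grid size used in the original YOLO paper rather than something stated in this article.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell containing the object center.

    S=7 is an assumption taken from the original YOLO paper.
    """
    # Scale the center to grid units, then clamp so a center lying
    # exactly on the right or bottom edge still maps to the last cell.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (if there is one).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence_target(pred_box, truth_box, cell_has_object):
    """Target confidence: IoU if the cell holds an object, else zero."""
    return iou(pred_box, truth_box) if cell_has_object else 0.0
```

For example, in a 448 x 448 image, an object whose center is at (200, 240) is assigned to the cell in row 3, column 3 of a 7 x 7 grid, and a predicted box that matches the ground truth exactly gets a confidence target of 1.0.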

https://stackoverflow.com/questions/50575301/yolo-object-detection-how-does-the-algorithm-predict-bounding-boxes-larger-than

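Then, we need the class-specific confidence scores for each box. Each cell also predicts a set of class probabilities, and multiplying these by a box's confidence gives that box's score for every class. All of these predictions come from a single convolutional neural network whose architecture is inspired by GoogLeNet. The output of the algorithm is the input image (or video) with each object localized and labeled with its class.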
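As a rough illustration of the class-specific confidence scores mentioned above: in the original YOLO paper, each cell predicts conditional class probabilities, and a box's class-specific score is its confidence multiplied by those probabilities. The sketch below assumes that rule; the function names are hypothetical, and the values S = 7, B = 2, and C = 20 in the usage note come from the paper's PASCAL VOC setup, not from this article.

```python
def class_specific_scores(box_confidence, class_probs):
    """Multiply one box's confidence by the cell's class probabilities."""
    return [box_confidence * p for p in class_probs]


def best_class(box_confidence, class_probs, class_names):
    """Return the (label, score) pair with the highest class-specific score."""
    scores = class_specific_scores(box_confidence, class_probs)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best], scores[best]


def output_size(S, B, C):
    """Number of values the network predicts for one image.

    Each of the S x S cells predicts B boxes with 5 values each
    (x, y, w, h, confidence) plus C class probabilities.
    """
    return S * S * (B * 5 + C)
```

With a box confidence of 0.5 and class probabilities of 0.2 for "dog" and 0.8 for "cat", the box is labeled "cat" with a score of about 0.4. With S = 7, B = 2, and C = 20, `output_size` gives 7 x 7 x 30 = 1470 predicted values per image.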

https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/

As previously discussed, YOLO reasons over the whole picture rather than examining several regions one after another.
This allows a huge increase in detection speed but causes a small decrease in detection accuracy compared with the region-based methods mentioned earlier. It is currently one of the most powerful and widely used object detection algorithms, in fields such as autonomous vehicles, poker cheat detection, and more.

If you want to learn more about this algorithm, check out the papers linked below!

References

Original YOLO paper: https://arxiv.org/abs/1506.02640

YOLOv4 paper: https://arxiv.org/abs/2004.10934

YOLOv4 code: https://github.com/AlexeyAB/darknet