YOLO stands for "You Only Look Once". It is an approach in computer vision used to detect the objects in a given picture. The problem involves building methods for object detection, object classification and object localization. There are three versions of YOLO discussed here: YOLOv1, YOLOv2 and YOLOv3, with YOLOv3 being the most recent of the three. The first is the general architecture, and the second uses anchor boxes to improve the bounding boxes. In outline, YOLO applies a single neural network to the image, divides the image into grid cells, produces class probabilities for each cell, and predicts bounding boxes.
ARCHITECTURE OF YOLO V3 (describes the structure of layers)
The backbone, Darknet-53, has 53 convolutional layers. It is stacked with 53 more layers, giving YOLOv3 a total of 106 layers. Detections are made at layers 82, 94 and 106. The essential elements are residual blocks, skip connections and upsampling.
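The three detection layers work on grids of different resolution. The following is a minimal sketch of how the grid sizes could be worked out. The 416x416 input, the strides of 32, 16 and 8, the 3 boxes per cell and the 80 classes are assumptions taken from the commonly used YOLOv3 configuration, not from the text above, and the function name is only illustrative.

```python
# Assumed values (not stated in the text above): 416x416 input, strides of
# 32, 16 and 8 at the three detection layers, 3 boxes per cell, 80 classes.
DETECTION_LAYERS = {82: 32, 94: 16, 106: 8}  # layer index -> assumed stride


def detection_shapes(input_size=416, boxes_per_cell=3, num_classes=80):
    """Return {layer: (grid, grid, channels)} for each detection layer."""
    # Each box carries x, y, w, h, an objectness score and the class scores.
    channels = boxes_per_cell * (5 + num_classes)
    shapes = {}
    for layer, stride in DETECTION_LAYERS.items():
        grid = input_size // stride  # number of cells along one side
        shapes[layer] = (grid, grid, channels)
    return shapes


shapes = detection_shapes()
for layer, shape in shapes.items():
    print(f"layer {layer}: {shape}")
```

Under these assumptions, the earliest detection layer uses the coarsest grid and the last one the finest grid.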
The YOLO framework looks at the entire image as a whole, predicts the bounding boxes, and calculates the class probabilities used to label the boxes. The YOLO family is very fast compared with other CNN-based detectors, so it is used to detect objects in real time. YOLO predicts only a limited number of bounding boxes to achieve this. The latest of the three versions is YOLOv3.
It classifies objects using logistic classification, and it can detect multiple objects inside a single picture. For training, we also give the model the bounding box coordinates. The image is sent through the convolutional network, then through the remaining layers of the deep neural network, and finally to the output layer. Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and also how accurate it thinks the predicted box is.
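A common way to measure how accurate a predicted box is against a ground-truth box is intersection over union (IOU). The sketch below assumes boxes given as (x_min, y_min, x_max, y_max) corners, which is a choice made for this example. The confidence target follows the usual formulation of object probability times IOU, and the helper names are illustrative only.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence_target(has_object, predicted_box, truth_box):
    """Assumed training target: Pr(object) * IOU, so zero when no object."""
    return iou(predicted_box, truth_box) if has_object else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```

A cell with no object gets a target of zero, while a cell with an object gets a target equal to the overlap between its prediction and the ground truth.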
Each bounding box consists of 5 predictions: x, y, w, h, and confidence. The (x, y) coordinates represent the center of the box relative to the bounds of the grid cell. The width and height are predicted relative to the whole image. Finally, the confidence prediction represents the intersection over union (IOU) between the predicted box and any ground truth box. Each grid cell also predicts C conditional class probabilities.
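To make the five predictions concrete, here is a rough sketch that turns (x, y, w, h) for one grid cell into pixel coordinates and counts the values the network outputs. It assumes the values are already in the range 0 to 1. The 7x7 grid, 448x448 image, B = 2 and C = 20 in the example call are assumptions for illustration, and the function names are hypothetical.

```python
def decode_box(x, y, w, h, row, col, grid_size, image_w, image_h):
    """Convert one prediction to pixel corners (x_min, y_min, x_max, y_max).

    x, y: box center as offsets in [0, 1] inside grid cell (row, col).
    w, h: box size in [0, 1] relative to the whole image.
    """
    center_x = (col + x) / grid_size * image_w
    center_y = (row + y) / grid_size * image_h
    box_w = w * image_w
    box_h = h * image_h
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)


def output_size(grid_size, boxes_per_cell, num_classes):
    """Values predicted per image: S * S * (B * 5 + C)."""
    return grid_size * grid_size * (boxes_per_cell * 5 + num_classes)


# A box centered in the middle cell of an assumed 7x7 grid on a 448x448 image.
box = decode_box(0.5, 0.5, 0.25, 0.5, row=3, col=3,
                 grid_size=7, image_w=448, image_h=448)
print(box)
print(output_size(grid_size=7, boxes_per_cell=2, num_classes=20))
```

Note that the class probabilities are counted once per cell, not once per box, which is why C is added outside the B * 5 term.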
These probabilities are conditioned on the grid cell containing an object. We only predict one set of class probabilities per grid cell, regardless of the number of boxes B. YOLO predicts multiple bounding boxes per grid cell. At training time, we only want one bounding box predictor to be responsible for each object. We assign one predictor to be “responsible” for predicting an object based on which prediction has the highest current IOU with the ground truth. This leads to specialization between the bounding box predictors. Each predictor gets better at predicting certain sizes, aspect ratios, or classes of object, improving overall recall.
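The assignment of the "responsible" predictor can be sketched as picking the box with the highest IOU against the ground truth. This is only an illustration: the (center_x, center_y, width, height) box format, the example numbers and the function names are assumptions, not part of any particular YOLO implementation.

```python
def iou_center(box_a, box_b):
    """IOU of two boxes given as (center_x, center_y, width, height)."""
    ax0, ax1 = box_a[0] - box_a[2] / 2, box_a[0] + box_a[2] / 2
    ay0, ay1 = box_a[1] - box_a[3] / 2, box_a[1] + box_a[3] / 2
    bx0, bx1 = box_b[0] - box_b[2] / 2, box_b[0] + box_b[2] / 2
    by0, by1 = box_b[1] - box_b[3] / 2, box_b[1] + box_b[3] / 2

    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def responsible_predictor(predicted_boxes, truth_box):
    """Return (index of the box with the highest IOU, list of all IOUs)."""
    ious = [iou_center(p, truth_box) for p in predicted_boxes]
    best = max(range(len(ious)), key=ious.__getitem__)
    return best, ious


# Two hypothetical predictions from the same grid cell (B = 2).
truth = (0.5, 0.5, 0.4, 0.4)
predictions = [(0.5, 0.5, 0.2, 0.8),   # tall, narrow box
               (0.5, 0.5, 0.4, 0.5)]   # close to the ground truth
best, ious = responsible_predictor(predictions, truth)
print(best, ious)
```

Only the selected predictor would be trained on the box coordinates for that object, which is what pushes the predictors to specialize.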
In this way, we can detect objects in images using YOLO. For every detected object, it also gives the probability that the detection is correct.