YOLOv5 Object Detection Model on Flo Edge One

If you have been keeping up with advancements in AI technology, you’ve almost certainly heard terms like “YOLO” and “object detection”. They have become buzzwords in the field. So what is YOLO? “You Only Look Once” is an AI algorithm that detects and classifies the objects in a given image in a single pass.

In this blog, we will look at a simple use case of YOLO on a real-time traffic dataset to briefly understand the architecture of the model and why it’s so hyped.

Before we get into that, let’s take a quick look at the Flo Edge One, a must-have in every AI and robotics engineer’s toolbox. Here are some of the specs and benchmarks that make it a strong competitor among edge devices:

  • Pre-installed with Ubuntu 22.04 and tools like ROS2, OpenCV, TFLite, etc.
  • Qualcomm Adreno 630 GPU.
  • 12 MP 4K camera at up to 30 fps.
  • Runs YOLOv5 inference in about 47 milliseconds per frame, giving a smooth output of around 21 fps.

And there are many more awesome features. Check out the Flo Edge for more details and even some example models. Keep reading!

Introduction:

YOLO, or “You Only Look Once,” is an algorithm loved by AI engineers because it’s all about detecting things in real time. YOLOv5 stands out because it is the first YOLO version built natively on PyTorch. That makes it part of PyTorch’s large ecosystem and accessible to a vast research community. It is fast and accurate, and its weight files are almost 90 percent smaller than those of its predecessors, which means it can run on embedded devices with ease! In this blog, we will look at a simple use case of car and truck detection and classification to understand how the model works.

Now, where are we running this model? This question brings us to something exciting for AI enthusiasts like yourself! Single-board computers (SBCs) are a must in your toolbox, I’m sure. You can build so many different applications that run independently on that tiny device. But shortages and supply issues these days make them scarce and expensive. Guess what? We’ve got something even cooler called the Flo Edge One! It’s an SBC with a built-in IMU and a 12 MP camera. How awesome is that? It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools and packages, making it a great choice for your robotics products. It’s powered by a Snapdragon 845 chip and has tons of cutting-edge features. And the best part? It’s affordable too! So you can explore and create without breaking the bank.

Dataset and Model:

The YOLOv5 model is trained on the Common Objects in Context (COCO) dataset, which contains more than 330,000 images with over 2.5 million object instances labeled across 80 object categories. This dataset is widely used for object detection models and provides a diverse range of objects for the model to detect, such as people, bicycles, cars, trucks, and other common objects found in daily life. Isn’t it insane that it takes 330,000 images for a machine to learn something as simple as a car, while we can recognize one after seeing it 2-3 times?

90+ labels in COCO 2017 train/validation browser – Image from https://cocodataset.org/#explore

The YOLOv5 model is a high-performance model with great accuracy and fast inference. However, which detections you actually see depends on the confidence threshold set for the output. With a low threshold, false positives slip through. A higher threshold removes many of them, but it also discards correct detections that the model was less sure about, so more objects are missed. The threshold itself does not meaningfully change inference speed. Inaccuracies may also occur when the model comes across unconventional forms of the same object. So that fancy customized Ferrari you’re looking at? Yeah, YOLO may think it’s not worth calling it a car, but that’s absolutely right, because it’s a BEAST.
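
To make the threshold trade-off concrete, here is a minimal sketch in plain Python. The detections, their scores, and the `is_correct` flag are made up for illustration (they are not measurements from the Flo Edge One), and `filter_by_confidence` and `summarize` are hypothetical helper names, not part of any YOLOv5 API.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float       # model confidence in [0, 1]
    is_correct: bool   # hypothetical ground-truth flag, for illustration only


# Made-up detections for one frame; not real benchmark data.
detections = [
    Detection("car", 0.92, True),
    Detection("truck", 0.81, True),
    Detection("car", 0.60, False),
    Detection("car", 0.55, True),
    Detection("truck", 0.30, False),
]


def filter_by_confidence(dets, threshold):
    """Keep only detections whose score is at or above the threshold."""
    return [d for d in dets if d.score >= threshold]


def summarize(dets, threshold):
    """Return (kept, false_positives, missed) for a given threshold."""
    kept = filter_by_confidence(dets, threshold)
    false_positives = sum(not d.is_correct for d in kept)
    missed = sum(d.is_correct for d in dets if d.score < threshold)
    return len(kept), false_positives, missed


for t in (0.25, 0.50, 0.75):
    kept, fp, missed = summarize(detections, t)
    print(f"threshold={t:.2f} kept={kept} false_positives={fp} missed={missed}")
```

In this toy data, raising the threshold from 0.25 to 0.75 removes both false positives but also drops one correct car detection, which is exactly the balance you have to strike.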

Performance Analysis:

With an inference time of around 47 milliseconds per frame, which works out to roughly 21 fps, the YOLOv5 model performs well on the Flo Edge One GPU and is fast enough for smooth real-time output. As mentioned earlier, raising the confidence threshold reduces false positives, but it also increases false negatives (missed objects). That is why it is essential to pick a threshold that strikes a reasonable balance between the two.
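
The relationship between the two numbers is simple arithmetic, sketched below. `fps_from_latency` is a hypothetical helper name, and the result is an upper bound that assumes inference is the only per-frame cost; camera capture, pre-processing, and rendering would lower the real figure.

```python
def fps_from_latency(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


inference_ms = 47.0  # per-frame inference time quoted in this post
print(f"{fps_from_latency(inference_ms):.1f} fps")  # prints "21.3 fps"
```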

Model Architecture:

So how does the model work? YOLO, as in “You Only Look Once,” does not refer to the human ability to learn and recognize an object in one look. It refers to the ability of the model to detect and classify all the objects in an image/frame in a single pass through the network.

Input and Output parameters:

The model takes frame after frame from the real-time data as input, with each frame supplied as float32 values. Each frame is passed through the network to detect and classify all possible objects in it. When this model is run on the Flo Edge One, real-time data can be obtained from the built-in 12 MP camera.

The output provided by the model is an array that carries four pieces of information for each detection: the position of the bounding box drawn around the detected object, the dimensions of that box, the class/label of the object, and a score that corresponds to the model’s confidence in that detection.
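
As a rough illustration of how such an output could be turned into drawable boxes, here is a small Python sketch. The row layout `[center_x, center_y, width, height, score, class_id]` in pixels is an assumption made for this example (the exact layout depends on how the model is exported), and `Box`, `LABELS`, and `decode_row` are hypothetical names. The class indices 2 and 7 follow the usual 80-class COCO ordering used by YOLOv5 for car and truck.

```python
from typing import NamedTuple

# Subset of the 80 COCO classes relevant to this demo (index -> label).
LABELS = {2: "car", 7: "truck"}


class Box(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str
    score: float


def decode_row(row):
    """Convert one assumed [cx, cy, w, h, score, class_id] row to corner form."""
    cx, cy, w, h, score, class_id = row
    label = LABELS.get(int(class_id), f"class_{int(class_id)}")
    return Box(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, label, score)


# Made-up rows standing in for the model output on one frame.
rows = [
    [320.0, 240.0, 100.0, 60.0, 0.91, 2],
    [500.0, 300.0, 200.0, 120.0, 0.78, 7],
]

boxes = [decode_row(r) for r in rows]
for b in boxes:
    print(f"{b.label} {b.score:.2f} ({b.x_min}, {b.y_min}) -> ({b.x_max}, {b.y_max})")
```

Corner coordinates are used here because that is the form most drawing routines expect when rendering a rectangle with its label and score.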

This output is then rendered and seen on the screen as boxes around the objects detected in the input image/frame, along with the label and score.

Conclusion:

This YOLOv5 model performs very well in this simple object detection and classification demonstration, which uses a dataset of cars and trucks moving in traffic. Because vehicles have fairly standard, distinctive features, the model runs with high accuracy and speed. The Flo Edge One is well suited to such applications because it has a low-power GPU and is equipped with an onboard camera. It can easily collect and analyze real-time data, making it a good fit for applications like traffic analysis.