Let’s take a look at semantic segmentation today, a deep learning task that can be seen as an extension of object detection: instead of drawing a box around the rough region containing an object, it classifies each pixel as either part of the object or not. This effectively divides the entire image into segments such as background, person, car, and so on. Still a little cloudy? Read ahead!
Before we get into that, let’s take a quick look at the Flo Edge One, a must-have in every AI and robotics engineer’s toolbox. Here are some specs and benchmarks that make it a top competitor among edge devices!
- Pre-installed with Ubuntu 22.04 and tools like ROS2, OpenCV, TFLite, etc.
- Qualcomm Adreno 630 GPU.
- 12 MP 4K camera at up to 30 fps.
- DeepLab v3 with a MobileNet backbone runs inference in around 50 milliseconds, producing a smooth output of around 10 fps.
Introduction:
As discussed earlier, semantic segmentation is a deep learning task in which a model divides a given image into regions by classifying each pixel of the image.
It has incredibly versatile use cases like image manipulation, 3D modelling, RoI detection, and many more. It produces a dense pixel-wise segmentation map that captures all the different regions in an image. In this blog we’re looking at a DeepLab v3 model that uses MobileNet as the base architecture. DeepLab v3 is a popular choice for segmentation problems, and the MobileNet backbone keeps it lightweight. Let’s look at this architecture more closely.
Dataset and Model:
DeepLab v3 is available as a pretrained model from TensorFlow. It was trained on the COCO and Pascal VOC datasets, which together have over 100,000 images and 20 categories. The network backbone in this case is MobileNet, which makes the model easy to deploy on Android, iOS, edge devices, and the web. ResNet is also commonly used as the backbone in place of MobileNet; which of the two is chosen depends on the application.
The model takes an input image and returns a tensor with the same height and width as the input, holding 20 masks, one per class. So the output size is [W, H, 20]. Each of the 20 masks maps the pixels of a region of interest (RoI) to its corresponding class label. These masks are then collapsed into a single mask, typically by assigning each pixel the class whose mask scores highest, and that single mask can be overlaid on the image to show each segment separately.
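To make the mask-collapsing step concrete, here is a minimal NumPy sketch. It assumes the per-class scores arrive as a height-first [H, W, C] array (the usual NumPy image layout) and that the single mask is obtained with an argmax over the class axis. The function names and the random colour palette are illustrative only and are not part of any DeepLab release.

```python
import numpy as np

def collapse_masks(scores: np.ndarray) -> np.ndarray:
    """Turn an [H, W, C] score tensor into an [H, W] map of class indices."""
    return np.argmax(scores, axis=-1).astype(np.uint8)

def make_palette(num_classes: int, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: one random RGB colour per class."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 256, size=(num_classes, 3)).astype(np.uint8)

def overlay(image: np.ndarray, class_map: np.ndarray,
            palette: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Alpha-blend the colour-coded class map onto an RGB image."""
    colours = palette[class_map]  # [H, W, 3], one colour per pixel
    blended = (1.0 - alpha) * image.astype(np.float32) \
        + alpha * colours.astype(np.float32)
    return blended.round().astype(np.uint8)

# Toy example: a 4x6 image whose top half scores highest for class 3
# and whose bottom half scores highest for class 7.
scores = np.zeros((4, 6, 20), dtype=np.float32)
scores[:2, :, 3] = 1.0
scores[2:, :, 7] = 1.0

class_map = collapse_masks(scores)
image = np.full((4, 6, 3), 128, dtype=np.uint8)
result = overlay(image, class_map, make_palette(20))
```

Because the argmax keeps exactly one label per pixel, the overlay shows each segment in a single colour regardless of how many classes the model emits.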
Usage:
Now let’s take a look at how this model can be run efficiently on the Flo Edge One. What is the Flo Edge One, you ask? It is a device with a light GPU that delivers smooth results while maintaining a high level of accuracy. With its onboard camera, inbuilt IMU, and GNSS capabilities, this device has it all. It comes pre-installed with Ubuntu 22.04, ROS2, OpenCV, and various other tools, making it a great choice for your robotics ventures. The best part? Amidst the semiconductor crisis, the Flo Edge One is affordable and ready to ship! So you can get started ASAP, without breaking the bank.
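As a rough idea of what running the model looks like in code, the sketch below uses the TensorFlow Lite `Interpreter` API. Several things here are assumptions rather than details from this post: the model file path is a placeholder, the model is assumed to take a float32 input scaled to [-1, 1], the default input size of 257 is only a fallback, and no GPU delegate is configured. The resize is a simple nearest-neighbour version written in NumPy so that the preprocessing step runs without extra dependencies.

```python
import numpy as np

def preprocess(frame: np.ndarray, size: int = 257) -> np.ndarray:
    """Nearest-neighbour resize to size x size, scale to [-1, 1], add a batch axis."""
    h, w = frame.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    resized = frame[rows[:, None], cols]
    scaled = resized.astype(np.float32) / 127.5 - 1.0
    return scaled[np.newaxis, ...]

def segment(model_path: str, frame: np.ndarray) -> np.ndarray:
    """Run one frame through a .tflite model and return an [H, W] class map."""
    import tensorflow as tf  # imported lazily so preprocess() works without it

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_info = interpreter.get_input_details()[0]
    output_info = interpreter.get_output_details()[0]

    size = int(input_info["shape"][1])   # read the expected input size from the model
    interpreter.set_tensor(input_info["index"], preprocess(frame, size))
    interpreter.invoke()

    scores = interpreter.get_tensor(output_info["index"])[0]  # drop the batch axis
    return np.argmax(scores, axis=-1)

# Preprocessing alone can be tried without a model file:
batch = preprocess(np.zeros((480, 640, 3), dtype=np.uint8))
```

With a model file on disk, `segment("path/to/model.tflite", frame)` would return the per-pixel class map; in a real application the interpreter should be created once and reused across frames rather than rebuilt on every call.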
Starting out a new retail venture? Wanna give the e-commerce guys a run for their money and prove that shopping at stores can be just as hassle-free? Equip your store with a virtual try-on station! Run the semantic segmentation model as a base on the Flo Edge One and build a manipulation model that will generate an image of the customer wearing the product in their hand. Use the Flo Edge One’s 12 MP camera to capture the product and the customer at the virtual try-on station and simply pass those images as inputs to your model. And there you have it! Efficient and easy shopping, no long trial room lines, and satisfied customers!
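For a try-on pipeline like this, the first thing the segmentation output gives you is a clean cutout of the customer. The sketch below shows that one step only. It assumes the class map uses the standard Pascal VOC label order with background at index 0, where "person" is index 15; check the label file shipped with your model before relying on that. The function name is illustrative.

```python
import numpy as np

# Assumption: "person" in the Pascal VOC label order with background at 0.
PERSON_CLASS = 15

def cut_out_class(image: np.ndarray, class_map: np.ndarray,
                  class_id: int, fill=(255, 255, 255)) -> np.ndarray:
    """Keep only the pixels labelled class_id; paint everything else with fill."""
    keep = class_map == class_id            # boolean [H, W] mask
    result = np.empty_like(image)
    result[...] = np.array(fill, dtype=image.dtype)
    result[keep] = image[keep]
    return result

# Toy example: a 4x4 grey image with a 2x2 "person" in the centre.
image = np.full((4, 4, 3), 10, dtype=np.uint8)
class_map = np.zeros((4, 4), dtype=np.uint8)
class_map[1:3, 1:3] = PERSON_CLASS

cutout = cut_out_class(image, class_map, PERSON_CLASS)
```

The cutout can then be handed to whatever image manipulation model generates the final try-on picture.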
Performance Analysis:
The inference time of this model was around 50 milliseconds, which shows the model is light enough to run comfortably on the Flo Edge One. End to end, the pipeline produces an output of around 8-10 FPS. Even so, the output, as seen in the video above, is quite accurate at the region level, and the model captures different types of vehicles at different distances from the camera really well.
While the general region of each object was about right, the pixel-wise classification that the model aims for is not very precise, and it could be better.
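Another popular model in the same family, DeepLab v3+, extends DeepLab v3 with a decoder module and is typically paired with a heavier backbone such as Xception; PSPNet, another well-known segmentation model, is a separate architecture rather than a DeepLab backbone. These heavier models are more demanding to run but produce much more accurate pixel masks.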
That being said, the model architecture used varies from case to case and DeepLab v3 is your best friend if you need to build an easily deployable application around your model.
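The timing figures quoted at the start of this section are worth a quick sanity check. A 50 millisecond inference on its own would allow 20 FPS, so an observed 8-10 FPS implies that each frame spends additional time outside the model. The split below is simple arithmetic on the numbers in this post; attributing the remainder to capture, pre-processing, and post-processing is an assumption, not a measurement.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second if each frame takes latency_ms in total."""
    return 1000.0 / latency_ms

def frame_time_ms(fps: float) -> float:
    """Total time per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

inference_ms = 50.0                                # figure quoted in this post
ceiling_fps = fps_from_latency_ms(inference_ms)    # upper bound from inference alone

# Time per frame not accounted for by inference, at 10 FPS and at 8 FPS.
overhead_ms = [frame_time_ms(fps) - inference_ms for fps in (10.0, 8.0)]

print(f"Inference-only ceiling: {ceiling_fps:.0f} FPS")
print(f"Non-inference time per frame: {overhead_ms[0]:.0f}-{overhead_ms[1]:.0f} ms")
```

If that estimate holds, roughly half or more of each frame is spent outside the model, so trimming the surrounding pipeline may matter as much as choosing a lighter network.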
Conclusion:
The semantic segmentation model yields accurate region-level predictions when run on the Flo Edge One’s GPU, even after being compressed into a .tflite model. Coupled with the 12 MP onboard camera, it opens up a wide range of systems for use cases like 3D modelling, image manipulation, surveillance, marketing, and sales.