Nowadays, every major city faces numerous challenges regarding public transport, waste management, and public security, among others. That’s why concepts such as Smart Cities, Internet of Things (IoT), 5G, and Artificial Intelligence (AI) —and how they relate to each other— have gained momentum.
Figuring out how to create synergy between these fields is key to building the best solutions to the challenges big cities pose on the community, and we think Cogniflow AI has much to contribute to the effort.
Building on the work done in Containers in Montevideo: a Multi-source Image Dataset (Laguna and Moncecchi, 2021), we decided to assess how Cogniflow could deliver a stronger solution to the challenge of detecting garbage containers in an image and recognizing whether they are clean or dirty.
The main goal of the work accomplished by Laguna and Moncecchi was to create an image dataset composed of pictures of garbage containers located in Montevideo, the capital city of Uruguay, a small South American country.
With this dataset, Object Detection (OD) models were trained to detect containers in the images, and classifiers were also trained to decide whether a container was clean or dirty, so that maintenance could be requested automatically. Today, the neighbors, the montevideanos, can send a picture and the location of a garbage container to a WhatsApp number so the municipality can receive the information and take action. At the moment this is done manually, but it could easily be automated with a trained AI model capable of recognizing whether the container is clean or dirty, as we will see next.

Moreover, the CCTV cameras many cities have already deployed could be used to monitor the garbage containers’ status as a preventive measure and accelerate maintenance before any citizen has to ask for it. This could positively impact the efficiency of the whole garbage collection system and, of course, the city's cleanliness.
The pictures were collected from different sources, such as Google Street View, individual contributors, social networks, and pormibarrio.uy. The photos were taken with different camera types and from different perspectives, including images taken from a moving vehicle, so the same container could appear more than once, from several angles or on different days. The collection totaled 3,352 pictures, distributed as shown in the following table:
Some example images are shown below:
After the CDCM dataset was completed, the labeling process started for the two tasks at hand: OD and image recognition. The first produced a label with a bounding box for each container found in an image, while the second consisted of placing each picture in its respective folder, clean or dirty (i.e. the two categories used to train the classifier). An example of four labeled images for the OD task is shown below.
Although the main goal of Laguna and Moncecchi's work was to generate a dataset, some research was conducted to quickly explore how useful this dataset could be for tasks such as OD and image recognition. The results achieved are shown next.
For this task, a YOLOv3 architecture was used, trained for thirty epochs with default parameters. The resulting model achieved a mean average precision (mAP@0.5) of 56% on the test set.
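The "@0.5" in mAP@0.5 refers to the Intersection-over-Union (IoU) threshold: a predicted box counts as a correct detection only when it overlaps the ground-truth box with IoU ≥ 0.5. A minimal sketch of the IoU computation, with boxes given as `(xmin, ymin, xmax, ymax)`:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes
    given as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two boxes of area 100 that overlap over half their width give `iou((0, 0, 10, 10), (5, 0, 15, 10)) = 50 / 150 ≈ 0.33`, which would not count as a detection at the 0.5 threshold.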
A binary classifier was trained to recognize whether a garbage container was clean or dirty. This baseline model consisted of a MobileNetV2 architecture, implemented with Keras. The pretrained weights were used only as a feature extractor, and a fully connected layer with 1024 neurons was stacked on top, with ReLU as the activation function.
Dropout was applied with a probability of 0.5, and the output layer consisted of a single sigmoid-activated neuron. Data augmentation was applied for training and testing, and early stopping was used to avoid overfitting. With this setup, an accuracy of 82.51% was achieved on the test set.
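The baseline just described (frozen MobileNetV2 backbone, a 1024-unit ReLU layer, dropout at 0.5, and a sigmoid output neuron) can be sketched in Keras roughly as below. This is an illustrative reconstruction, not the authors' exact code; note we pass `weights=None` to avoid downloading the ImageNet weights here, whereas the original work used the pretrained weights as the feature extractor:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Frozen MobileNetV2 backbone used as a feature extractor.
# (Use weights="imagenet" to reproduce the transfer-learning setup;
#  weights=None is used here only to keep the sketch offline.)
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None
)
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),          # flatten features per image
    layers.Dense(1024, activation="relu"),    # fully connected layer on top
    layers.Dropout(0.5),                      # dropout with probability 0.5
    layers.Dense(1, activation="sigmoid"),    # single neuron: clean vs. dirty
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

Training would then use augmented image batches and a Keras `EarlyStopping` callback, matching the early-stopping setup mentioned above.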
The experiments shown below were run with exactly the same data, and the same dataset partition, as the ones already detailed, so that every result could be compared fairly.
With Cogniflow, training an image recognition model is really easy. Just zip the ‘clean’ and ‘dirty’ images inside their respective train and validation folders, upload them, and let Cogniflow do its magic 🪄.
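As a quick sketch, packaging the folder tree into a zip can be done with Python's standard `zipfile` module. The folder names below (`train`/`validation` with `clean`/`dirty` subfolders) follow the layout described above; adapt them to your own dataset:

```python
import os
import zipfile

def make_dataset_zip(root, out_path):
    """Package the folder tree under `root` (e.g. train/clean, train/dirty,
    validation/clean, validation/dirty) into a zip, preserving the
    relative paths so each image stays inside its category folder."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                # Store the path relative to the dataset root.
                zf.write(full, os.path.relpath(full, root))
```

Calling `make_dataset_zip("dataset", "dataset.zip")` produces a single archive ready to upload, with entries such as `train/clean/img001.jpg`.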
Note: To see more details about how to create an image classification solution in Cogniflow, just click here, go to “How to create an AI-based solution for content moderation using Cogniflow?” and follow the steps.
The results achieved with Cogniflow were markedly superior, reaching an accuracy and F1-score of 90% (an 8% improvement), with a MobileNetV2 vectorizer and a Feed-Forward Neural Network (FFNN) classifier on top. The top four trained models are shown in the image below:
The best vectorizers and classifiers turned out to be the same as the ones selected by Laguna and Moncecchi. It was Cogniflow's ability to select the best hyperparameters and perform enriching data augmentation that enabled the improvement of almost 8% in accuracy. It's worth mentioning that all of this is done automatically by the tool, saving valuable time (i.e. money) and preventing the user from following misleading model design paths and poorer data pre-processing techniques.
Before starting the training, the data was converted from the Pascal VOC annotation format to the YOLOv5 one. Afterward, an OD model* was trained using Cogniflow with default parameters for one hundred epochs; it early-stopped at seventy-two, achieving a mAP@0.5 of 98.5%, a 43% improvement.
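Pascal VOC stores one XML file per image with absolute corner coordinates (`xmin`, `ymin`, `xmax`, `ymax`), while the YOLO format expects one text line per object: a class index followed by the box center, width, and height, all normalized to [0, 1]. A minimal conversion sketch (the single `container` class is an assumption for this dataset):

```python
import xml.etree.ElementTree as ET

CLASSES = ["container"]  # assumed class list for this dataset

def voc_to_yolo(xml_text):
    """Convert one Pascal VOC annotation (XML string) into YOLO lines:
    'class_idx cx cy w h', with all coordinates normalized to [0, 1]."""
    root = ET.fromstring(xml_text)
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.find("name").text)
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        # Absolute corners -> normalized center + size.
        cx = (xmin + xmax) / 2.0 / img_w
        cy = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each image's resulting lines are written to a `.txt` file with the same base name as the image, which is the layout YOLOv5-style trainers expect.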
*Note: the OD task is available at Cogniflow for Professional plans and beyond. See more here.
The use case described in this post is a good example of how AI can team up with IoT, and soon with 5G as more networks are rolled out, to develop efficient, end-to-end Smart Cities solutions, thanks to Cogniflow's ease of use. Not a single line of code was written to achieve outstanding results, no work-intensive iterations were needed, and no special infrastructure had to be set up.
By leveraging state-of-the-art object detectors and carefully studied default parameter settings for each kind of dataset, Cogniflow can attain superior results, again without any exhaustive effort. Just upload your data and let it do the heavy lifting.
Cogniflow achieved an 8% improvement in the Image Recognition task and a remarkable 43% improvement in the Object Detection one.
The ways Cogniflow could help solve challenges in the Smart Cities field are almost endless, in areas such as image, audio (e.g. recognizing environmental sounds), and natural language processing (e.g. a question answering tool for tourists). The work shown here could also be easily adapted to detect damaged or vandalized garbage containers, a problem that regrettably is familiar to every major city.
Cogniflow is not only a tool to build end-to-end solutions, save time and money, and efficiently reallocate human talent to more abstract tasks; it can also be key to the most important goal: improving citizens’ quality of life.