Smart garbage classification system
Nguyen Ngoc Bao - Nguyen Sieu High School, Ha Noi - Viet Nam
E-mail: [email protected], Phone: +84986915640
Nguyen Ngoc Le Minh - Vinschool the Harmony High School, Ha Noi - Viet Nam
E-mail: [email protected] , Phone: +84967548724
Supervisor: Dr. Le Quang Thao - VNU, University of Science, Ha Noi - Viet Nam
E-mail: [email protected], Phone: +84.983.712.941
Abstract
Pollution is a huge problem that we must face in the 21st century, and it blocks
our path to sustainable development. As the human population continues to rise, more and
more garbage is produced. It not only seriously affects the environment but also damages
human health, leading to salmonella infection, food poisoning, fever, etc.
This project aims to present a practical solution to the waste management problem.
Within waste management, sorting garbage by type is one of the most essential steps. Using
a camera, a convolutional neural network (CNN), and a dataset of training and testing
pictures, detection and classification can be performed with an accuracy of 99%. A robotic
arm is added to grab the garbage and carry it to a pre-defined bin. All hardware
communicates over a wireless network. This will help improve human health, especially for
garbage collection workers, increase work efficiency, and replace human labor on the path
to sustainable development.
Keywords: garbage, detection and classification, wireless network, robotic arm,
machine learning, convolutional neural network
1. Introduction
Most of the waste humanity produces comes from industry, medical purposes, or
everyday life. According to the Blue Environment report, Australia releases over 50 million
tonnes of core waste into the environment every year, and the number has been increasing
dramatically. Another statistic is that hazardous waste amounted to 6.3 million tonnes, or
259 kg per capita, and almost 60% of it was buried [1]. The collection of garbage is usually
manpowered. Therefore, our aim is to automatically classify the type of garbage, in our case
bottles, nylon and scrap paper. Increasing the efficiency of collection is also one of the most
vital targets. Automatic trash collection brings many advantages, such as recovering another
source of resources and reducing the cost of labour to collect and reclassify trash.
The population of the world is increasing dramatically; therefore, more and more
garbage is released into the environment. Moreover, the garbage that is released is often
non-biodegradable. This has made classification important: when garbage is classified, it
reduces the cost of making new materials, requires much less energy, and preserves the
environment.
Recognising and categorizing garbage is a long-standing problem, and in many
countries people are encouraged to find a solution to it. In developed countries such as
America or England, artificial intelligence has been used for categorizing trash. For
example, a well-known solution in America comes from an environmental company that
created a robot to help residents categorize garbage using sensors. The robot can sense the
type of garbage, remove the trash, drop it into the correct bin, and process the trash directly
in the bin [2]. However, the sensors used might not be the best solution, because their
accuracy is low and they can easily make mistakes. In addition, there are many types of
garbage, and sensors are unable to sort and recognise all of them.
Artificial intelligence (AI) refers to computer programs whose purposes are
automation and learning behaviours similar to those of human beings. Much research has
shown that artificial intelligence can substitute for humans in learning and performing tasks
that are dangerous for the human body, including categorizing trash. In this project, we use
a machine learning method, in particular convolutional neural networks, to train on image
datasets.
With the breakthroughs of artificial intelligence, there have been many projects on
sorting garbage using AI. For instance, Bernando S. Costa used artificial intelligence, in
particular the pre-trained architectures VGG-16 and AlexNet and other advanced
algorithms, to classify garbage into different types [3]. Cenk Bircanoglu designed a project
named RecycleNet, which employs deep neural networks to detect and classify garbage.
The deep neural network this group used is DenseNet121, which gave an accuracy of up to
95% [4]. With such projects, however, light, temperature and humidity might affect the
detection result enormously. In addition, these projects cannot be used widely, as they are
very expensive.
Machine learning algorithms have immense advantages, such as efficient working
power and high-accuracy detection. In this project, SSD (Single Shot Multibox Detector)
will be discussed further. The workflow of this model is divided into two main parts:
SSD runs the convolutional network over the image only once, computing an
activation map. A 3x3 kernel is then run over the map to predict bounding boxes and class
probabilities.
Anchor boxes of various shapes are also used, and the network learns offsets from
them. After this, a bounding box is drawn with a percentage of confidence.
2. Models and experimental methodology
Machine learning
Machine learning is one of the rising fields that mark the fourth industrial
revolution. Machine learning helps solve problems such as object detection, speech
recognition, etc. In addition, machine learning includes smaller subjects such as neural
networks, unsupervised learning, supervised learning and reinforcement learning [5].
Data preparation
The aim of this project is to detect and classify the type of garbage. To accomplish
this target, 500 training images and 100 testing images are incorporated. In this project, four
object classes are used to train the SSD model, as listed in Table 1. The images of the
dataset have a green background with a picture of garbage (Fig 1).
Figure 1. Sample images of dataset collected in the laboratory
The garbage is oriented in different conditions and proportions. Each image has a
dimension of 440 x 330 pixels; the size of the training dataset is 13.6 MB and the size of the
testing dataset is 2.72 MB.
Table 1. Information of the dataset collected
Objects Quantities
Bottle 105
Nylon 98
Scrap Paper 89
Mixture 308
The reason why separate training and testing datasets were used is that the training
dataset is used to produce the feature map, while the testing dataset is incorporated to
provide an unbiased evaluation of the feature map that is extracted [6].
After the pictures have been captured, a labelling process is needed. Each image
contains one or more pieces of garbage, so each piece must be labelled to record its location
in the image. Each label is a bounding box of varying dimensions, and each label represents
only one class. The number of classes is counted for the training process. The labelling
software that this project uses is LabelImg (Fig 2).
Figure 2. Labelling image process
After the labelling process is complete, a .xml file is generated. This .xml file
contains the starting coordinate (x1, y1) and the ending coordinate (x2, y2) of each box; the
height and width of the bounding box are also included. This information is needed for the
server to train on the images and extract the feature map. However, the .xml file must be
converted into .csv, which in turn is converted into a TensorFlow record (.tfrecord) file for
training.
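The .xml-to-.csv step can be sketched as below. The element names (filename, object, bndbox, xmin, ...) follow LabelImg's Pascal VOC output format; the function names are our own:

```python
import csv
import xml.etree.ElementTree as ET

def xml_to_rows(xml_path):
    """Parse one LabelImg (Pascal VOC) .xml file into CSV-ready rows,
    one row per labelled bounding box."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": filename,
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows

def write_csv(rows, csv_path):
    """Write the collected rows to the .csv that the .tfrecord is built from."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["filename", "class", "xmin", "ymin", "xmax", "ymax"])
        writer.writeheader()
        writer.writerows(rows)
```

The .csv is then converted to .tfrecord with the generate_tfrecord script that the TensorFlow object-detection workflow provides.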
Convolutional Neural Network (CNN) architecture
A convolutional neural network is a multilayer neural network used to detect
objects. A CNN has two parts: hidden layers that perform feature extraction, and fully
connected layers for classification. The architecture of this network is shown in Fig 3.
Figure 3. Convolutional Neural Network architecture
Firstly, the image is converted into a 3-dimensional (3D) array. Then, a kernel is
applied to convolve the 3D array: the convolution is done by flipping the kernel and sliding
it over the array, multiplying the overlapping elements and adding the products at each
position. After the convolution is done, the results are collected into the feature layer. There
are many kernels, and each kernel is responsible for one detection pattern, such as line or
edge detection; each of these contributes to a higher form of detection, object detection.
The pooling layer decreases the dimension of the layer and decreases the learning time
significantly: a max pooling layer is added after each convolution, and only the largest
value in each window is kept. Finally, a bounding box is drawn around the object.
The last part is the classification network. The feature map is flattened into a single
1D vector, which is sent to the fully connected layers. These fully connected layers
determine the class and confidence score of the bounding box.
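A minimal sketch of the convolution and max-pooling steps described above, in plain NumPy (the real network of course uses many learned kernels, not a single hand-written loop):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: flip the kernel (true convolution), slide it
    over the image, and sum the element-wise products at each position."""
    kernel = np.flipud(np.fliplr(kernel))
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Max pooling: keep only the largest value in each size x size window,
    shrinking the feature map and the later learning time."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/cols that do not fit
    return (feature_map[:h, :w]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))
```

For example, convolving a 4x4 image with a 3x3 kernel gives a 2x2 feature map, and 2x2 max pooling reduces that to a single value.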
SSD model
SSD is a detection model consisting of a MobileNet backbone and several
convolutional layers to detect objects. In this project, SSD starts with MobileNet, a CNN
architecture; the combination is called SSD-MobileNet [7], and the architecture is shown in
Fig 4.
Figure 4. SSD-MobileNet architecture
MobileNet is a base network architecture that is pre-trained to classify objects on
huge image datasets. This architecture converts images into feature matrices. MobileNet is
built from two kinds of convolution: depthwise convolution and pointwise convolution.
Depthwise convolution filters the image with a single kernel per input channel, while
pointwise convolution (a 1x1 convolution) builds new features by combining the input
channels. After that, the features are imported into the SSD model.
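The depthwise and pointwise steps can be sketched as below; the kernel shapes are illustrative, not MobileNet's actual configuration (no stride, padding, or batch normalization):

```python
import numpy as np

def depthwise_separable(x, dw_kernels, pw_weights):
    """Depthwise separable convolution sketch.
    x: (H, W, C) input; dw_kernels: (k, k, C), one filter per input channel
    (depthwise step); pw_weights: (C, C_out), a 1x1 convolution that mixes
    channels (pointwise step)."""
    k = dw_kernels.shape[0]
    h, w, c = x.shape
    oh, ow = h - k + 1, w - k + 1
    dw = np.zeros((oh, ow, c))
    for ch in range(c):  # depthwise: each channel is filtered independently
        for i in range(oh):
            for j in range(ow):
                dw[i, j, ch] = np.sum(x[i:i + k, j:j + k, ch] * dw_kernels[:, :, ch])
    # pointwise: a 1x1 convolution is just a matrix product over the channels
    return dw @ pw_weights
```

Splitting a standard convolution into these two steps is what makes MobileNet cheap enough to run on small boards: the per-channel filtering and the channel mixing each cost far fewer multiplications than one full 3-D convolution.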
TensorFlow pre-trained models
Our project uses the SSD-MobileNetV2 pre-trained model, adapted from
TensorFlow. TensorFlow is an open-source library built by the Google Brain Team for
training neural network models. TensorFlow can run on Graphics Processing Units (GPUs)
or mobile devices such as phones and tablets. Many projects have been built with
TensorFlow, such as speech recognition, information steganography, computer vision, etc.
Loss functions
To evaluate our training, loss functions are used in this project. A loss function
determines the error between the outputs of the algorithm and the target values; its value is
directly proportional to the inaccuracy of the model. It is computed by comparing the
model's predictions against the labelled data. In all algorithms, and especially in this
project, the loss function must be reduced to a minimum; ideally, it equals zero.
In this project, cross-entropy classification loss is used. Cross-entropy loss is a
measure based on entropy, calculating the difference between two probability distributions
[8]. One important point to emphasize is that this loss function heavily penalizes
predictions that are confident but wrong. The cross-entropy classification loss value can be
determined by the formula below:
Cross-entropy(D, L) = −Σᵢ Lᵢ log(Dᵢ) (1)
where D is the predicted distribution and L is the label distribution.
In addition to the classification loss, a regression loss is added; the type of
regression loss presented is smooth L1 [9]:
SmoothL1(x) = 0.5x², if |x| < 1
SmoothL1(x) = |x| − 0.5, otherwise (2)
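Both loss terms can be written out directly; the sketch below implements equations (1) and (2) for single examples (a real training loop averages them over a batch):

```python
import numpy as np

def cross_entropy(pred, label):
    """Eq. (1): Cross-entropy(D, L) = -sum_i L_i * log(D_i).
    pred is the predicted distribution D, label the target distribution L."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(-np.sum(label * np.log(pred)))

def smooth_l1(x):
    """Eq. (2): 0.5 * x**2 when |x| < 1, otherwise |x| - 0.5.
    Quadratic near zero, linear for large errors (robust to outliers)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)
```

The "confident but wrong" penalty is visible directly: a prediction of 0.01 for the true class costs −log(0.01) ≈ 4.6, far more than the 0.69 cost of an uncertain 0.5.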
4 Degree of Freedom robotic arm (4 DoF)
To replace human labor, a 4 DoF robotic arm [10] is added to grab the garbage and
put it into the correct pre-defined bin. Using inverse kinematics, the angles of movement
can be determined from the coordinates of the bounding box after detection. Once the
angles have been found, the pulse for each motor is calculated from the angle value.
Robot kinematics is the study of the motion of robots without regard to the forces
involved. There are two types of kinematics: forward kinematics and inverse kinematics
(Fig 5).
Figure 5. Schematic diagram of inverse kinematics
Forward kinematics is often easy and always has a solution. However, inverse
kinematics is harder and requires heavy computation and complex mathematical equations.
The 4 DoF schematic diagram can be represented as below:
Figure 6. Inverse kinematics of 4 DOF robotic arm model
To calculate the exact angles of rotation for the robotic arm to reach the object, a
system of equations is described as follows:
θ₁ = 2·atan2(270a − √(−a⁴ − 2a²b² + 79668a² − b⁴ + 79668b² − 11451456), a² + b² + 270a − 3384)
θ₂ = 2·atan2(294b − √(−a⁴ − 2a²b² + 79668a² − b⁴ + 79668b² − 11451456), a² + b² + 294b − 3384)
where a = d − 85 and b = d − 80, with d = −333.333 × (rect['x1'] + rect['x2']) / 2 + 396.666
the distance computed from the bounding box, and rect['x1'], rect['x2'] the horizontal
positions of the garbage in the image.
Servo 3 controls the angle about the z-axis, and servo 4 opens and closes the
gripper. The authors will not detail this part of the robotic arm, because it is not the main
part of this project.
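To illustrate the kind of computation involved, the snippet below solves the generic two-link planar inverse kinematics problem with the law of cosines; the link lengths are illustrative placeholders, not the actual arm's constants, and the closed-form expressions above would replace this generic solution in the real system:

```python
import math

def two_link_ik(x, y, l1=135.0, l2=147.0):
    """Inverse kinematics for a two-link planar arm (shoulder + elbow).
    l1, l2 are illustrative link lengths in mm, NOT the real arm's values.
    Returns (theta1, theta2) in radians for the target point (x, y)."""
    r2 = x * x + y * y
    # law of cosines gives the elbow angle
    c2 = (r2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))  # clamp so unreachable targets do not crash
    theta2 = math.acos(c2)
    # shoulder angle: direction to the target minus the elbow's contribution
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2
```

A quick sanity check is to feed the resulting angles back through forward kinematics and confirm the end effector lands on the target point.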
Message Queuing Telemetry Transport (MQTT)
MQTT is a lightweight and simple messaging protocol designed for constrained
devices [11]. The MQTT protocol consists of two main parts: the server and the clients.
Clients are divided into two roles: the sender, which publishes information, and the
receiver, which gets feedback. In this project, the server is the 4 DoF robotic arm, the
sender is the camera, and the receivers are the 3 trash bins. The reliability of MQTT is
managed by 3 Quality of Service levels: at level 0, the message is sent at most once and no
acknowledgement of reception is required; at level 1, the message is sent at least once and
acknowledgement of reception is required; at level 2, a four-way handshake is used so the
message is delivered exactly once [12].
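The camera-to-arm exchange can be sketched as below. The topic names and payload fields are hypothetical, and actual publishing would use an MQTT client library such as paho-mqtt (shown only as a comment):

```python
import json

# hypothetical topic layout, one topic per pre-defined bin
BIN_TOPICS = {
    "bottle": "garbage/bin/bottle",
    "nylon": "garbage/bin/nylon",
    "scrap_paper": "garbage/bin/paper",
}

def make_message(detected_class, bbox, confidence):
    """Build the topic and JSON payload the camera node would publish;
    the robotic-arm node subscribed to that topic decodes the payload
    and moves the garbage to the matching bin."""
    topic = BIN_TOPICS[detected_class]
    payload = json.dumps({"class": detected_class,
                          "bbox": bbox,
                          "confidence": confidence})
    return topic, payload

# With a client such as paho-mqtt, the camera would then call:
#   client.publish(topic, payload, qos=1)  # QoS 1: at least once, with ack
```

QoS 1 is a reasonable choice here: a lost detection message would leave garbage unsorted, while an occasional duplicate only sends the arm to a bin it has already served.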
Algorithm charts
a. Diagram of training
b. Diagram of testing
Figure 7. Algorithm diagram of system
Our system has two parts, recognising and sorting. In the first part, we train the
model by feeding the labelled images into it (Fig 7a); training is stopped when the loss
function reaches 0.5. In the sorting process, the robotic arm is controlled to move and drop
the garbage, bottle, nylon or paper, into the corresponding bin (Fig 7b).
3. Results and discussions
This section evaluates the method for garbage recognition and analyses the loss
functions in detail. The results and discussion are divided into parts: the classification loss,
the smooth L1 regression loss, manual control of the 4 DoF arm, and the detection and
classification results.
Cross-entropy in classification loss function
For the trash dataset, the reported error was high until about the 1000th iteration,
then started to go down. At the start of training, more and more features of the images are
extracted by simple filters, so the loss function decreases dramatically. As further features
are extracted, the object can be identified and classified, and the loss function decreases
more slowly. The graph shows a clear trend, so the machine can be considered to have
found a solution. After over 90000 steps, the final loss value is 1.15 (Fig 8).
Figure 8. Classification loss graph
Smooth L1 loss in localization loss function
Localization is a special topic for SSD, because this model can recognise large
objects accurately but the accuracy decreases as the object gets smaller. As the machine
learns over a greater number of steps, the localisation loss decreases. This is because the
predicted boxes at first mismatch the ground-truth bounding boxes (the labelled boxes)
considerably, leading to a low IoU value. As more kernels are used to extract features, the
object is detected more accurately and the localisation loss decreases, as shown in Fig 9.
Figure 9. Localization loss graph
Manual control of 4 DoF robotic arm
Our project refers to the model EEZYbotARM_MK2 from EEZYrobots [13],
modified so that it is suitable for this project. The 4 DoF arm was created using 3D printing
technology, as shown in Fig 10.
a. Model 4 DoF
b. Printed 4 DoF
Figure 10. 4 DoF robot model
To control and calibrate the robotic arm, a slider-based control program was created
in the Python programming language. The sliders use the QT Creator program installed on
a Raspberry Pi board. For pulse periods from 1 ms to 2 ms, we create corresponding pulse
controller values from 500 to 2500, as shown in Fig 11.
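The slider-to-pulse mapping can be sketched as a simple linear function; the 180-degree servo range is an assumption on our part, while the 500 to 2500 pulse values follow the text above:

```python
def angle_to_pulse(angle_deg, pulse_min=500, pulse_max=2500, angle_range=180.0):
    """Linearly map a servo angle (0 to angle_range degrees) onto the
    controller's pulse values 500 - 2500, matching the 1 ms - 2 ms pulse
    widths used by the sliders. The 180-degree range is an assumption."""
    angle_deg = max(0.0, min(angle_range, angle_deg))  # clamp to valid range
    span = pulse_max - pulse_min
    return int(round(pulse_min + span * angle_deg / angle_range))
```

For example, 0, 90 and 180 degrees map to pulse values 500, 1500 and 2500 respectively; out-of-range requests are clamped so the servo is never over-driven.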
Figure 11. Manual control
Detection and classification
With the training and testing datasets, SSD has been applied, and the result of
training is shown in Fig 12.
Figure 12. Detection and classification
The model can detect objects with an accuracy of up to 99% at 0.97 frames per
second. Bounding boxes are drawn around all detected objects. After testing, the model can
recognize garbage as soon as it is placed in the recognition area. The final prototype of our
project is shown in Fig 13.
Figure 13. Prototype of the garbage classification system
4. Conclusion and future work
A method for the quick and accurate recognition and classification of garbage based
on a convolutional neural network was proposed. In this project, by using
SSD-MobileNetV2, the server can identify and classify 3 types of garbage: bottle, nylon
and scrap paper. All garbage is labelled, and its position is returned to the 4 DoF arm,
which picks up the garbage and drops it into the correct trash bin. With SSD-MobileNetV2,
the recognition and classification of garbage achieve a high level of accuracy, but the
authors still wish to find better training methods.
The performance of the project can be improved in several ways:
More training data should be collected and investigated; this will improve the
performance of the classification.
The regions in the labelled images must be drawn carefully for more accurate
training.
Because the model currently recognises 3 types of garbage, more types can be
added to compare and evaluate the training model.
References
[1]. Blue Environment, "National Waste Report 2018", Randell Environmental Consulting,
19 November 2018, page 8, section 2.1.
[2]. Lori Ioannou, https://www.cnbc.com/2019/07/26/meet-the-robots-being-used-to-help-
solve-americas-recycling-crisis.html, accessed 27 July 2019.
[3]. Bernando S. Costa, "Artificial Intelligence in Automated Sorting in Trash Recycling".
[4]. Cenk Bircanoglu, M. Atay, F. Beser, "RecycleNet: Intelligent Waste Sorting Using
Deep Neural Networks", July 2018.
[5]. Encyclopedia of Machine Learning, pages 3-4.
[6]. Frank R. Burden, Richard G. Brereton and Peter T. Walsh, "Cross-validatory Selection
of Test and Validation Sets in Multivariate Calibration and Neural Networks as Applied to
Spectroscopy".
[7]. Cemil S., "Real-Time Diseases Detection of Grape and Grape Leaves using Faster
R-CNN and SSD MobileNet Architectures", April 2019.
[8]. S. Panchapagesan, M. Sun, A. Khare, "Multi-task Learning and Weighted
Cross-entropy for DNN-based Keyword Spotting", Causal Productions.
[9]. Ibrahim Onaran, "Sparse spatial filter via a novel objective function minimization with
smooth l1 regularization", Elsevier, 8 November 2012.
[10]. Serdar Kucuk and Zafer Bingul, "Robot Kinematics: Forward and Inverse
Kinematics", IntechOpen.
[11]. MQTT.org, "MQ Telemetry Transport", http://mqtt.org/, 2013, accessed 18 January
2020.
[12]. D. Thangavel, X. Ma, A. Valera and H. Tan, "Performance Evaluation of MQTT and
CoAP via a Common Middleware".
[13]. Carlo Franciscone, http://www.eezyrobots.it/eba_mk2.html, accessed 18 January
2020.