© Springer-Verlag Berlin Heidelberg 2011
Cloud-based Object Recognition for Robots
Daniel LORENCIK1, Jaroslav ONDO1, Peter SINCAK1, Hiroaki WAGATSUMA2
1Department of Cybernetics and Artificial Intelligence, Technical University of Kosice
{daniel.lorencik, jaroslav.ondo, peter.sincak}@tuke.sk
2Department of Human Intelligence Systems, Kyushu Institute of Technology,
Kitakyushu, Japan
Abstract. This paper deals with a Cloud Robotics approach, which is strongly supported by new technologies in the area of cloud computing. We present an early implementation of a system for cloud-based object recognition. The primary purpose of the system is to provide object recognition as a service for a wide range of devices. The main benefits of using the cloud as a platform are easy future scalability and, above all, the sharing of already collected knowledge among all devices using the system. The system consists of a feature extraction part and a classification part. SIFT and SURF are used for feature extraction, and MF ArtMap is used for classification. In this paper, the implementation of both parts is presented in more detail, together with preliminary results. We assume that in the near future, cloud robotics and brain research for robots will merge into a functional system able to share and utilize common knowledge while also supporting personalization.
Keywords: cloud computing, cloud robotics, SIFT, SURF, MF ArtMap, brain-like systems
1 Introduction
Cloud computing was introduced in the IT domain many years ago. Its impact on intelligent robotics came only recently, when the concept of Cloud Robotics entered the field [1], [2]. We believe that Cloud Robotics should include the implementation of Artificial Intelligence on the cloud, and that this technology can bring major changes to core AI tasks such as pattern recognition, moving towards a continuously changing training set. The learning approach then becomes incremental, and brain-like inspirations can play an important role in the resulting system. Crowdsourcing, together with multi-source information about brain functioning, can improve the resulting accuracy of robotic intelligence.
2 Cloud-based Framework for Cloud Robotics
We have discussed the system proposal in greater detail in [3]. The proposed system is based on the notion of an AI Brick [4] – a well-defined system suited for one task, in this case object recognition. Since the system uses Microsoft Azure as its cloud platform, the inherited cloud capabilities allow for easy scaling in case of increased demand on the service, for easy deployment of new versions (as the main logic is provided as a cloud service) and, most importantly, for knowledge acquisition and sharing from all connected clients.
An important feature of the system is that it places no special requirements on the devices that use it. The only requirements are the ability to capture images and to send them over an internet connection to the service.
2.1 Cloud Computing Platform and Technological Aspects
As already mentioned, the system is built on the PaaS (Platform as a Service) model [5] provided by Microsoft Azure. Since our system is intended to be a cloud service, we adopted the modular architecture of Azure cloud services, where user interfaces are created as web roles hosted on virtual machines of variable computing power using ASP.NET, and background jobs are created as worker roles hosted on dedicated virtual servers of variable computing power. These roles interact through the message bus and queues.
The image data, as well as the descriptors extracted from them, are stored in blob storage.
To create a truly cloud-based service, we use the NoSQL Azure Tables [6] instead of SQL-like databases for cross-referencing the image data, the extracted descriptors and the classification data.
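To illustrate what such cross-referencing can look like, the sketch below builds an Azure-Tables-style entity linking an image blob to its descriptor blob. The partition/row key scheme and property names are our illustrative assumptions, not the actual schema of the service.

```python
# Sketch: cross-referencing an image, its descriptors and classification data
# in a NoSQL-table-style entity. Key scheme and property names are assumed
# for illustration only.
import uuid

def make_image_entity(extractor: str, image_id: str) -> dict:
    """Build a table entity linking an image blob to its descriptor blob."""
    return {
        # Partitioning by extractor keeps each extractor's entities together
        "PartitionKey": extractor,          # e.g. "SIFT" or "SURF"
        "RowKey": image_id,                 # unique image identifier
        "ImageBlob": f"images/{image_id}",  # blob with the preprocessed image
        "DescriptorBlob": f"{extractor.lower()}-{image_id}",  # extracted features
        "ClassLabel": "",                   # filled in after classification
    }

image_id = str(uuid.uuid4())
entity = make_image_entity("SIFT", image_id)
print(entity["PartitionKey"], entity["DescriptorBlob"])
```

In a real Azure Tables deployment, `PartitionKey` and `RowKey` together form the entity's unique key, which is why a per-image identifier is used as the row key.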
From the high-level architecture proposal in Fig. 1, it can be seen that only the image is sent over the Internet to the cloud service as input data. The required preprocessing and feature extraction are done in the cloud. This approach certainly creates a problem in terms of speed, as uploading an image is a time-consuming operation.
However, it is necessary in order to achieve the normalized feature space required for object classification, and it also makes the resulting service more widely available, as we do not require any special software on the device for the communication. In the final stage of the service development, we will implement a REST-like API for use in other scenarios (in line with the AI Brick notion).
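A client call to such a REST-like API could look like the sketch below, which only constructs the HTTP request (the endpoint URL and the query parameter are hypothetical; the actual API is not yet published). Since the device only needs to send raw image bytes, the request body is simply the JPEG payload.

```python
# Sketch: preparing a POST request carrying one captured image to a
# hypothetical recognition endpoint. The URL and the "extractor" query
# parameter are assumptions for illustration.
import urllib.request

def build_recognition_request(image_bytes: bytes, extractor: str = "SIFT"):
    """Prepare (but do not send) a POST request with one image payload."""
    url = f"https://example.cloudapp.net/api/recognize?extractor={extractor}"
    req = urllib.request.Request(
        url,
        data=image_bytes,                        # raw image as request body
        headers={"Content-Type": "image/jpeg"},
        method="POST",
    )
    return req

req = build_recognition_request(b"\xff\xd8fake-jpeg-bytes", "SURF")
print(req.get_method(), req.get_full_url())
```

This matches the design goal that any device able to capture an image and reach the Internet can use the service, with no special client software.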
Such a service can then be utilized in many applications, most notably in cloud robotics. An example is the RoboEarth project [7], which is able to use existing cloud image recognition services such as Google Goggles [8].
Fig. 1. High level architecture of the proposed system [3]
2.2 Research Approaches Used in the Proposal
Image processing is an important part of information acquisition for robots. In image processing, a feature space can take many forms. We have chosen spectral and derived descriptors as features for the pattern recognition procedure. We use SIFT (Scale-Invariant Feature Transform) [9] and SURF (Speeded-Up Robust Features) [10] for feature extraction, and Membership Function ArtMap [11]–[13] and a Gaussian classifier for the classification of objects. The main research goal of our work is to adapt these approaches to the cloud environment and to find out which extractor–classifier combination provides the best results.
We chose these two classifiers because MF ArtMap represents a model-free classifier, whereas the Gaussian classifier is model-dependent. One of our research goals is to compare these two methods.
We anticipate challenges in adapting the classification methods to the cloud environment. The goal of the proposed system is to provide a stable service for all connected devices regardless of their actual number; in other words, the service has to be scalable. In terms of cloud computing, this means that the virtual machines forming the underlying infrastructure of the service can be rebooted, shut down or started at any time, so the system itself has to be built to reflect these conditions.
We also compare these classification methods to simple matching, which can prove faster under certain conditions (up to a certain number of entries in the table storage).
Another anticipated challenge is how to work effectively with the large data sets we expect to amass during the experiments and the eventual release of the service for public use.
As the classifier to be used in the proof-of-concept experiment, we considered the ArtMap [14], [15] subgroup of ART neural networks, due to our previous experience with them. These networks can be trained using supervised learning. Finally, the MF (Membership Function) ArtMap [13], [16] neural network was chosen as the classifier. This type of neural network combines the theory of fuzzy sets with ART theory. The consequence of this combination is a structured output consisting of the computed membership values of the input to every found fuzzy cluster of every known class. In this way, it is possible to compute how much the input belongs to every class. The input is classified into the class represented by the winner fuzzy cluster – the cluster in the output vector whose membership value is maximal.
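The winner-cluster selection described above can be sketched as follows. We assume Gaussian-shaped membership functions per fuzzy cluster as a simplified stand-in; the exact MF ArtMap membership formula is given in [13].

```python
# Minimal sketch of winner-fuzzy-cluster classification with a simplified
# (Gaussian-shaped) membership function standing in for the MF ArtMap one.
import math

def membership(x, center, width):
    """Membership of input x in a fuzzy cluster (1 at the center, falls off)."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist2 / (2 * width ** 2))

def classify(x, clusters):
    """clusters: list of (class_label, center, width). Returns the winner
    class and the structured output (membership to every cluster)."""
    output = [(label, membership(x, center, width))
              for label, center, width in clusters]
    winner = max(output, key=lambda lm: lm[1])   # winner fuzzy cluster
    return winner[0], output

clusters = [("cup",  [0.0, 0.0], 1.0),
            ("ball", [3.0, 3.0], 1.0)]
label, output = classify([0.2, -0.1], clusters)
print(label)  # -> cup
```

Note that the full structured output is kept alongside the winner, mirroring the MF ArtMap property that the degree of membership to every class remains available.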
3 Cloud-based Image Classification – Software as a Service
We divided the service into two parts: Cloud-based Feature Extraction (CFE) and Cloud-based Classification (CCL). In this section, we describe the feature extraction part, which we have already implemented as a service. In the following text, we will use the abbreviation CFE for the Cloud-based Feature Extraction service.
Fig. 2. CFE architecture overview. On the left (a) is architecture version 1, on the right (b) architecture version 2
3.1 CFE Architecture Version 1
Our first architecture design used dedicated roles for extraction and for image preprocessing. The idea was that the image preprocessing is the same for both extractors, so it seemed fitting to have it scaled automatically based on the actual load. Separate worker roles were created for each extractor, for the same reason. The communication with the user was done through a separate web role.
The inter-role communication was implemented with Azure Queues, which, compared to Service Bus queues, have less overhead and are faster. In the queue message, we send the unique identifier of the image.
The workflow in this architecture was as follows:
1. The user uploads the image via the web page and chooses the extractor (SIFT, SURF or both)
2. The image is stored in blob storage, and the unique Id of the image is put into the queue for image preprocessing
3. The image preprocessing role accesses the image and rewrites it with a normalized image (scaled down if too big, and converted to shades of gray). The Id of the preprocessed image is then put into the queues of the selected extractor services
4. The extractor role accesses the image in the storage by its unique Id and extracts local features, which are then stored in blob storage with an extractor prefix and the image Id. The image and its extracted features are also written to the Azure Table, in which the relations between objects are kept
5. The web page with the result table is updated and shows the uploaded preprocessed image along with the extracted features (available as an XML formatted document)
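The steps above can be sketched as a small pipeline, with Python queues standing in for Azure Queues and a dict for blob storage. The grayscale conversion mimics step 3 (the standard luminance weights are our assumption); the feature extraction itself is omitted.

```python
# Sketch of the architecture-1 pipeline: web role -> preprocessing role ->
# extractor roles, connected by queues. Storage and queues are faked locally.
import queue

blob_storage = {}                       # image_id -> image data
preprocess_q = queue.Queue()            # queue feeding the preprocessing role
extractor_qs = {"SIFT": queue.Queue(), "SURF": queue.Queue()}

def upload(image_id, pixels, extractors):          # steps 1-2: web role
    blob_storage[image_id] = pixels
    preprocess_q.put((image_id, extractors))

def preprocess_role():                             # step 3
    image_id, extractors = preprocess_q.get()
    rgb = blob_storage[image_id]
    gray = [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb]
    blob_storage[image_id] = gray                  # rewrite with normalized image
    for name in extractors:
        extractor_qs[name].put(image_id)

def extractor_role(name):                          # step 4 (extraction omitted)
    image_id = extractor_qs[name].get()
    blob_storage[f"{name.lower()}-{image_id}"] = ["<descriptors>"]

upload("img-1", [(255, 0, 0), (0, 255, 0)], ["SIFT", "SURF"])
preprocess_role()
extractor_role("SIFT")
extractor_role("SURF")
print(sorted(blob_storage))
```

The single preprocessing queue feeding two extractor queues is exactly the structure that architecture 2 later collapses.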
The schema of the architecture can be seen on the left side of Fig. 2.
This architecture had a drawback in terms of speed, as can be seen in Table 1 and Table 2.
3.2 CFE Architecture Version 2
In the second architecture design, we made changes to speed up the feature extraction process. As can be seen from Table 1 and Table 2, there is a significant amount of time during which the service is literally doing nothing; it just waits for a sleep cycle to complete before checking the queue for new messages. Since architecture 1 used three queues (one feeding the other two through the image preprocessing role), we decided to move the image preprocessing into the extraction roles, thereby eliminating the first queue and one worker role. This decision was also supported by the fact that image preprocessing was the least time-consuming operation in the cycle.
With the elimination of one worker role, the workflow in architecture 2 changed:
1. The user uploads the image via the web page and chooses the extractor (SIFT, SURF or both)
2. The image is stored in blob storage, and the unique Id of the image is put into the queue of the selected extractor
3. The extractor role accesses the image in the storage by its unique Id and extracts local features, which are then stored in blob storage with an extractor prefix and the image Id. The image and its extracted features are also written to the Azure Table, in which the relations between objects are kept
4. The web page with the result table is updated and shows the uploaded preprocessed image along with the extracted features (available as an XML formatted document)
The schema of the architecture can be seen on the right side of Fig. 2.
This architecture was quicker than the first; the measurement results can be seen in Table 3 and Table 4. The speed-up is between 18 and 32%. We are currently optimizing the code to further speed up the extraction process.
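The quoted speed-up range can be checked against the "Sum of time taken by tasks" medians and averages in Tables 1–4 (the exact per-image speed-up figures were presumably computed over individual runs; this is only a sanity check on the summary statistics):

```python
# Recomputing the architecture-2-over-architecture-1 speed-up from the
# "Sum of time taken by tasks" statistics in Tables 1-4.
def speedup(t_old: float, t_new: float) -> float:
    """Relative speed-up in percent."""
    return (1.0 - t_new / t_old) * 100.0

print(round(speedup(3.3764, 2.3446), 1))  # local emulator, median -> 30.6
print(round(speedup(1.7334, 1.3111), 1))  # cloud, median          -> 24.4
print(round(speedup(2.6114, 1.9533), 1))  # cloud, average         -> 25.2
```

These values fall inside the reported 18–32% band.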
3.3 Measured Speed Results for the CFE Architectures
For testing, we used 20 images of varying size and complexity; the smallest had a resolution of 0.16 MPx (megapixels) and the biggest 10.84 MPx. Five of the images were above Full HD resolution. The batch of images can be considered small, but at this stage we use it only for validating the design and for rough speed tuning of the service. After deployment, the testing will be more rigorous, with a bigger sample size.
We also measured the cloud service running in the local emulator, so that we can compare the two environments. Even in the local emulator, however, we were using live (unemulated) cloud storage, so only the roles ran locally.
The infrastructure used Small compute instances for all roles, and the sleep cycle for worker roles was set to 2 seconds. We will also experiment with these settings in later stages of the research.
The following tables show the measured values of the time taken by the service. The "Time for user" column shows the time between clicking the upload button and the result appearing on the page. The "Sum of time taken by tasks" column shows the time actually consumed by the roles to compute the result. The last two columns show the time for extracting the local features and storing them.
Table 1. Measurements of the CFE architecture 1 - speed on the local emulator

          Time for   Sum of time taken   SIFT extraction   SURF extraction
          user       by tasks [s]        [ms]              [ms]
  min     0:00:02     2.0450             435.1242          710.4513
  max     0:04:39    21.4839             8196.7643         12731.7996
  Average 0:00:20     5.0472             1860.4650         2287.6808
  Median  0:00:05     3.3764             896.0543          1229.4251

Table 2. Measurements of the CFE architecture 1 - speed on cloud environment

          Time for   Sum of time taken   SIFT extraction   SURF extraction
          user       by tasks [s]        [ms]              [ms]
  min     0:00:01     0.9759             197.9281          354.9850
  max     0:00:15    11.9751             5967.6276         8374.6194
  Average 0:00:04     2.6114             1007.3908         1690.6716
  Median  0:00:03     1.7334             473.1524          1074.7474

Table 3. Measurements of the CFE architecture 2 - speed on the local emulator

          Time for   Sum of time taken   SIFT extraction   SURF extraction
          user       by tasks [s]        [ms]              [ms]
  min     0:00:00     1.3733             170.0301          211.0119
  max     0:00:10    12.2926             3121.1854         5058.4078
  Average 0:00:03     3.0056             632.8390          942.6149
  Median  0:00:02     2.3446             369.7710          586.2928

Table 4. Measurements of the CFE architecture 2 - speed on cloud environment

          Time for   Sum of time taken   SIFT extraction   SURF extraction
          user       by tasks [s]        [ms]              [ms]
  min     0:00:00     0.7403             156.2778          169.7089
  max     0:00:11    10.0929             3578.1217         7811.9474
  Average 0:00:03     1.9533             686.7257          1358.1596
  Median  0:00:02     1.3111             349.4731          788.2077
4 Cloud-based MF ArtMap Classifier
The second part of the proposed system is the classifier (CCL), implemented as Software as a Service. Once the image's descriptors are extracted, they are propagated into the classifier. The classifier classifies the object in the picture into one of the known classes, or creates a new one if the object does not fit any of the known classes.
As described in Section 2.2, we chose the MF ArtMap [13], [16] neural network, from the supervised ArtMap [14], [15] subgroup of ART networks, as the classifier. Its structured output contains the computed membership value of the input to every found fuzzy cluster of every known class, so it is possible to compute how much the input belongs to every class.
We implemented the MF ArtMap neural network classifier as a separate cloud service. This makes the proposed system more modular and allows any combination of classifier and image descriptor extractor to be used to reach the best results. The MF ArtMap neural network is implemented as a data structure: all values of the MF ArtMap classifier, as well as the trained classes, the relevant clusters and their settings, are stored in a cloud table in the cloud data store.
Fig. 3. Graphical description of training problem
During the experiments, we encountered a problem with training on new images (Fig. 3). The extractor service (CFE) extracts a different number of descriptors for every input image. This number depends on factors such as the size of the input image, the number of detected key points in the image, etc. At the same time, the MF ArtMap neural network expects an input vector of constant dimension. Therefore, we decided to train the MF ArtMap network sequentially, treating every descriptor as a separate input. Once all descriptors of the input image have been propagated through the MF ArtMap neural network, we obtain a vector of membership values of the input descriptors to all clusters and all classes. At this point, we are able to statistically classify the input image into one of the known classes, or create a new class if no match was found.
Fig. 4. Modification of MF ArtMap topology for sequential input
The described solution required a modification of the MF ArtMap topology, presented in Fig. 4. A layer called the stack of winner fuzzy clusters has been added. As a consequence, the output of the neural network is not just one winning fuzzy cluster determining the input class, but a set of winner fuzzy clusters. After all descriptors are propagated through the first three layers, the content of the stack is propagated to the output layer, where the winner fuzzy clusters are statistically evaluated and the class of the input set of descriptors is determined.
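The sequential scheme with the stack of winners can be sketched as follows: each descriptor is propagated separately, its winner fuzzy cluster is pushed onto a stack, and the image class is decided by a statistical (here, majority) vote. The per-descriptor winner selection is abstracted into a nearest-cluster stand-in for the real MF ArtMap membership computation.

```python
# Sketch of the modified-topology workflow: per-descriptor winners are
# stacked, then statistically evaluated to classify a variable-length set
# of descriptors. The winner rule is a simplified stand-in.
from collections import Counter

def winner_class(descriptor, clusters):
    """Class label of the fuzzy cluster nearest to one descriptor."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(clusters, key=lambda c: dist2(descriptor, c[1]))[0]

def classify_image(descriptors, clusters):
    stack = [winner_class(d, clusters) for d in descriptors]  # stack of winners
    return Counter(stack).most_common(1)[0][0]                # statistical vote

clusters = [("cup", [0.0, 0.0]), ("ball", [4.0, 4.0])]
descriptors = [[0.1, 0.2], [3.9, 4.1], [0.3, -0.2]]  # variable count per image
print(classify_image(descriptors, clusters))  # -> cup
```

This decouples the fixed input dimension of the network (one descriptor at a time) from the variable number of descriptors per image, which is the core of the problem described above.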
4.1 Proof of Concept Experiment
For our experiments with MF ArtMap on the cloud, we created the architecture shown in Fig. 5. The Nao robot captures an image and sends it to the control application on a computer. This Windows Forms application relays the image to the cloud service for processing. The image is then processed in the cloud: the features are extracted by the CFE service and passed to the MF ArtMap classifier. The result of the classification is sent back to the control application on the computer, which relays the data to the robot. After successful classification, the robot says the resulting class of the object in the captured image.
Fig. 5. High-level architecture of the system used as a proof of concept
The experiments were done on two sets of images. Set 1 consisted of logos and simple objects; set 2 contained images of more complex objects. Both sets were divided 60/40 between the learning and testing phases. For comparison, we used different types of features: SIFT, SURF and spectral RGB features of the image. The results of the experiments are shown in Table 5. The basic intention was to observe the behavior of the CFE on different types of images and how the classification accuracy could be influenced by the number of features identified in those images. The number of clusters and the generalization ability of the MF ArtMap classifier were also observed and taken into consideration. The incrementality of the MF ArtMap classification approach is a considerable advantage, since additional classes do not require retraining of the neural network but are simply processed incrementally in the feature space.
Table 5. Proof of concept - results of the classification using two sets of data

                                     SET 1                     SET 2
                              SIFT    SURF    RGB       SIFT    SURF    RGB
Classification precision:
  Training set               100.0%  100.0%  90.0%     100.0%  100.0%  91.2%
  Testing set                 70.0%   65.0%  70.0%      65.2%   65.2%  56.5%
  Representative set          85.0%   82.5%  80.0%      82.6%   82.6%  73.9%
Number of found clusters       2161    3075   798        2165    2895   681
Generalization of Neural Net  0.223   0.491  0.999      0.149   0.423  0.998
The classification results above represent average classification rates, which were previously evaluated in more detail using contingency tables. The number of clusters and the generalization are correlated, since the ideal case is to have few clusters in the classification, but this also depends on the processed data.
5 Cloud-based Robotics with Brain-like Approaches
Our intermediate goal is to use the gained knowledge to implement MF ArtMap as a service. The proof of concept presented in this paper used the basic MF ArtMap structure, which was not modified for the cloud infrastructure. For the cloud version of MF ArtMap, the synaptic weights will have to be stored separately. This will allow easy duplication of a trained neural network, or moving the application to a more powerful cloud server if there is demand for it. The scaling will then be done by the platform without human intervention, thereby providing robustness to the object recognition service. MF ArtMap will need to be adapted further for the task of object recognition using feature descriptors, as the number of descriptors varies with the object. The proof of concept used batch learning, which provided rather unsatisfying results. Therefore, we are working on a modification of the MF ArtMap input layer that will allow all the descriptors to be input at once.
In the near future, we plan to add to this framework brain-like approaches, mainly from the repository maintained by the PhysioDesigner project [17]. We believe that the implementation of hybrid approaches, combining selected methods of Artificial or Computational Intelligence with brain-like and more biologically inspired systems, can lead to more accurate results in the cloud-based framework for robots. The current testing platform is the NAO humanoid robot, and we expect to extend this activity to the Pepper humanoid platform next year.
6 Conclusion
We have presented results of a cloud-based system for object recognition usable with the NAO humanoid robot. We believe that further work on this approach can be useful for multi-robot platforms, and we also expect hybridization of classical AI approaches with brain-like approaches for the benefit of cloud-based robotic intelligence. We expect problems with the standardization of knowledge databases, including the fact that domain-oriented knowledge will be preferable and easier to implement than universal knowledge. The learning procedure is also expected to be incremental and domain-oriented, and we do not think that a universal learning approach will succeed in the near future.
Acknowledgment: This paper is the result of the Project implementation: University
Science Park TECHNICOM for Innovation Applications Supported by Knowledge
Technology, ITMS: 26220220182, supported by the Research & Development Opera-
tional Programme funded by the ERDF.
References
[1] J. J. Kuffner, “Cloud-enabled robots,” in IEEE-RAS International Conference on
Humanoid Robotics, 2010.
[2] G. Mohanarajah, D. Hunziker, R. D’Andrea, and M. Waibel, “Rapyuta: A Cloud
Robotics Platform,” IEEE Trans. Autom. Sci. Eng., pp. 1–13, 2014.
[3] D. Lorenčík, M. Tarhaničová, and P. Sinčák, “Cloud-Based Object Recognition: A
System Proposal,” in Robot Intelligence Technology and Applications 2, vol. 274, J.-H.
Kim, E. T. Matson, H. Myung, P. Xu, and F. Karray, Eds. Cham: Springer
International Publishing, 2014, pp. 707–715.
[4] T. Ferraté, “Cloud Robotics - new paradigm is near,” Robotica Educativa y Personal,
20-Jan-2013.
[5] P. Mell and T. Grance, “The NIST Definition of Cloud Computing Recommendations
of the National Institute of Standards and Technology,” Nist Spec. Publ., vol. 145, p. 7,
2011.
[6] J. Giardino, J. Haridas, and B. Calder, “How to get most out of Windows Azure
Tables.” [Online]. Available:
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-
out-of-windows-azure-tables.aspx.
[7] “RoboEarth Project.” [Online]. Available: http://www.roboearth.org/. [Accessed: 20-
Mar-2014].
[8] “Google Goggles.” [Online]. Available: http://www.google.com/mobile/goggles/#text.
[Accessed: 20-Mar-2014].
[9] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of
the Seventh IEEE International Conference on Computer Vision, 1999, pp. 1150–1157
vol.2.
[10] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded Up Robust Features,” in
European Conference on Computer Vision, 2006, pp. 404–417.
[11] A. Bodnárová, “The MF-ARTMAP neural network,” in Latest Trends in Applied
informatics and Computing, 2012, pp. 264–269.
[12] P. Smolár, “Object Categorization using ART Neural Networks,” Technical University
of Kosice, 2012.
[13] P. Sinčák, M. Hric, and J. Vaščák, “Membership Function-ARTMAP Neural
Networks,” TASK Q., vol. 7, no. 1, pp. 43–52, 2003.
[14] G. A. Carpenter, “Default ARTMAP,” Boston, 2003.
[15] N. Kopco, P. Sincak, and S. Kaleta, “ARTMAP Neural Networks for Multispectral
Image Classification,” J. Adv. Comput. Intell., vol. 4, no. 4, pp. 240–245, 2000.
[16] P. Sincak, M. Hric, and J. Vascak, “Neural Network Classifiers based on Membership
Function ARTMAP,” in Systematic organisation of information in fuzzy systems, P.
Melo-Pinto, H.-N. Teodorescu, and T. Fukuda, Eds. IOS Press, 2003, pp. 321–333.
[17] “PhysioDesigner.” [Online]. Available: http://physiodesigner.org/.