Cloud-based Control and vSLAM through Cooperative Mapping...

Cloud-based Control and vSLAM throughCooperative Mapping and Localization*

Berat A. Erol, Satish Vaishnav, Joaquin D. Labrado, Patrick Benavidez, and Mo JamshidiDepartment of Electrical and Computer Engineering

The University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USAEmail: [email protected], [email protected], [email protected],

[email protected], [email protected]

Abstract—Simultaneous Localization and Mapping (SLAM) isone of the most widely popular and applied methods designedfor more accurate localization and navigation operations. Ourexperiments showed that vision based mapping helps agentsnavigate in unknown environments using feature based mappingand localization. Instead of using a classical monocular cameraas a vision source, we have decided to use RGB-D (Red, Green,Blue, Depth) cameras for better feature detection, 3D mapping,and localization. This is due to the fact that the RGB-D camerareturns depth data as well as the normal RGB data. Moreover,we have applied this method on multiple robots using the conceptof cooperative SLAM. This paper illustrates our current researchfindings and proposes a new architecture based on gathered datafrom RGB-D cameras, which are the Microsoft Kinect and theASUS Xtion Pro for 3D mapping and localization.

Index Terms—Cooperative SLAM, vSLAM, Robotics, RGB-D,Cloud Computing

I. INTRODUCTION

GPS-denied localization can be a computational bottleneckfor autonomous robotic systems navigating an unknown envi-ronment, locating objects of interest, performing automationoperations, etc. This problem requires a system to understandthe work environment, be aware the obstacles, and map theenvironment. This is commonly referred as ”the chicken andegg problem” in the literature, since calculating the locationrequires a map, and to map the area the system should know itscurrent location. Therefore, a system needs to initialize itselfbefore and during the mapping process as well as localize itselfsimultaneously. Hence, the problem defined as SimultaneousLocalization and Mapping (SLAM).

The ability to sense the environment is pivotal for anyautonomous mobile system that intends to navigate througha known or unknown area. In the past, this was done throughthe aid of sensors which is returned data that allows the mobilesystems to know where they were in the world [1]. Nowadays,there is a easier method to accomplish this task with devicessuch as Xbox Kinect. a camera provides RGB-D informationto the mobile system [2]–[4]. RGB-D sensors are now beingused to help mobile systems navigate by creating a map oftheir work environment [1], [5]–[8].

* This work was supported by Grant number FA8750-15-2-0116 from AirForce Research Laboratory and OSD through a contract with North CarolinaAgricultural and Technical State University.

Cooperative SLAM is one operational approach that aims tobuild a common map of entire work environment, and based ona shared data to calculate the locations of the system’s agentsat the same time. If SLAM operations by one system involveshigh computational complexity and high power consumption,then the level of the computational complexity and power con-sumption with multiple systems will be undeniably increased.Therefore, this calls for a solution that will act as a masterplatform which builds a general map that can be accessedby any system currently in the operation. Additionally, it willprocess all the sensor data provided by the multiple systems,or any system added for that operation later on.

The purpose of this paper is to propose a vision basedSLAM system (vSLAM) includes RGB-D cameras with cloudcomputing back-end. That is ideal for multiple unmannedground vehicles (UGV) and aerial vehicles (UAV), and co-operative work flow among them. The algorithm is discussedhere in order to better identify key features in the world thatcould be used in each agents’ own map, or one that is sharedacross multiple agents.

The complexity of the problem depends on the system andthe operation to be carried. Computation of vSLAM algorithmson a robot is typically not done as it comes with a relativelyhigh cost in computational complexity which in turn affectsthe power consumption and the positional data update rate.Storage is also an issue as large amounts of data can beacquired for mapping tasks in vSLAM. Both the storage andcomputational limits can prohibit extended use of systemsrelying on vSLAM.

The paper is organized as follows; Section II is the litera-ture survey, which is followed by Section III with proposedarchitecture. Section IV covers results of the experiments thatwere discussed, followed by the conclusions, Section V.

II. LITERATURE SURVEY

In the literature one can see several implementations oncooperative SLAM that use stereo vision camera systemsacquiring the data for fusing the cooperative data [5]. In-formation filters have suitable and convenient characteristics,especially for the multi-sensor data fusing application task.

In [6], a vision based SLAM approach is presented usingthe feedback from RGB-D camera. The features from the

color images have been extracted and projected onto thepoint cloud which was obtained from the depth images. 3Dcorrespondences were used to find correspondences betweenconsecutive images which helps in finding the transformationbetween the two images. A matrix is used to obtain the mapdata and hence localization process is done using this mapdata. In addition to these approaches and research findings,[7] presents a method of loop closing using visual featuresin conjunction with the laser scan data. The algorithm usesthe pose and time information when the image at any givenmoment matches with the previous images and the scannedpatch by the laser at those two moments are compared. Usingthis information, a covariance is obtained which providesappropriate information for loop closing.

For our experiments the implemented algorithm was a GridBased Fast SLAM by using: a single node, two nodes, fournodes, and eight nodes of a cluster. The execution time wasdecreased by several orders of magnitude with increasing num-ber of nodes. On the other hand, there were some limitationsdue to this approach, such as using a separate hardware onserver implementations, manual implementation of nodes, aswell as neglecting network delays/latencies.

A. Feature Detection and Fusion into Map Data

The most common method for feature detection in SLAMsystems has been SIFT [9], [10], ORB [10], and SURF [1].These algorithms are widely used for visual odometry, for thebuilding of 3D maps, and to detect objects or points of interest.All three of these techniques are useful in their own right forobject recognition. The only difficultly with these algorithmsis that each of them require a large amount of images to bestored. This could be handled by the cloud computing by off-loading images and running a SIFT or SURF algorithm on thecloud. This is a method that will be tested once the test bedfor the system is created.

For an encouraging approach, DAVinCi is a Cloud Com-puting Framework for Service Robots. It acts as Platform as aService (PaaS), which is designed to perform secondary taskslike mapping, localization and others [11]. Also it works asan active interface between all the heterogeneous agents, theback end computation, and storage through the use of ROSand Hadoop Distributed File System (HDFS). Its role is toact as a master node which runs the ROS name services andmaintains the list of publishers and subscribers, as well asreceive messages either from the HDFS back end or fromother agents/robots. On the other hand, it is important to statethat DaVinCi does not provide a real-time implementation.

Another method to find features in the mobile systemsenvironment is done through the use of Point Clouds and theIterative Closest Point (ICP) algorithm [12], [13]. Point cloudsallow for systems to use depth information from the RGB-Ddata stream that the sensor returns. The ICP method is meantto find the smallest distance between two points in a pointcloud. This can be done by generating a point cloud and thencomparing the individual points.

Covariance Intersection (CI) is a method of data fusionwhich combines two or more states estimates. In addition tothis, CI gathers sensor measurements from different platformshaving an unknown correlation among them. Improvements inthe location determination and mapping process is achieved byusing collaborative estimation which is implemented with aninformation filter and a CI technique. [5] shows less navigationerrors by using CI, coping with the memory and computationcost that arises from the increasing size of the augmentedstate vector matrix, and its corresponding uncertainty, whichaims to improve the robustness of Cooperative visual SLAM(vSLAM).

B. Proposed Method for Map Merging

For the aim of our experiments, the map points should betransformed from the world coordinate frame to the camera-centered coordinate frame. In other words, we project mappoints into the image plane. This is done by left-multiplicationwith a ECW matrix, which represents camera pose [14], ECW

stands for a 4x4 transformation matrix from Camera frameC, to World frame W. pjC is the pose of camera in camera-centered coordinate frame, and pjW is the pose of camera inworld coordinate frame. Then,

pjC = ECW ∗ pjW (1)

For visual observation, after tracking a video frame suc-cessfully, by using parallel tracking and mapping (PTAM),the pose estimate is scaled by the current estimate for a factorof λ∗ and it is transformed from the front camera coordinatesystem to the quadrocopters coordinate system, which leads toa direct observation of its pose given by [15], where (xt, yt, zt)denote the position of the quadcopter in meters in the worldcoordinates, and (φt, θt, ωt) denote the roll, pitch and yawangles in degrees, respectively.

hp(Xt) := (xt, yt, zt, φt, θt, ωt)T (2)

Zp,T := f(EDC .EC, t) (3)

Where, EC,t is the estimated camera pose (scaled with λ),EDC is the constant transformation from the camera to thequadcopter, and f is the transformation from the roll-pitch-yaw transformation. More details for this stage can be seen in[6], [7], [14]–[16]

For our experimental studies, two local maps, m1 and m2,need to be merged. For this purpose we used point cloud dataprocessing techniques [12], [13]. The first step is to define aglobal map where this local maps will be placed at appropriateposition with properly defined transformations. Then, sincetwo quadrocopters create two local maps, the global map Mconsists of those two local maps, along with the transformationmatrix that containing rotational and translational informationof the initial poses of the two quadrocopters. If we constructthe T1 matrix, the transformation Matrix for the first quad-copters, X01 = (x01, y01, z01, φ01, θ01, ω01) defines the initialpose of first agent in Global map M. (x01, y01, z01) defines theinitial position of the first quadcopter in X, Y and Z directions

and (φ01, θ01, ω01) is the initial orientation of it with respectto Y, X and Z axis i.e. roll, pitch and yaw. Therefore,

T1 = Tx1.Ty1.Tz1 (4)

Where,

Tx1 =

1 0 0 x010 cosθ01 −sinθ01 00 sinθ01 cosθ01 00 0 0 1

Ty1 =

cosφ01 0 −sinφ01 0

0 1 0 y01sinφ01 0 cosφ01 0

0 0 0 1

Ty1 =

cosω01 −sinω01 0 0sinω01 cosω01 0 0

0 0 1 z010 0 0 1

Similar to equation (4), T2 = Tx2.Ty2.Tz2 is the transforma-

tion matrix for the second quadrocopter, and the initial pose ofit in the Global map M is X02 = (x02, y02, z02, φ02, θ02, ω02).Tx2, Ty2 and Tz2 are transformation matrices, in similar orderas previous ones with respect to x, y and z axis, respectively.

On the other hand, the transformation matrix TM [17] is inthe form of:

TM =

[R T0 1

](5)

R defines the rotation about the three axis and T defines thetranslation with respect to the three axis. The Rotation Matrixconstitutes of the following elements:

R =

R11 R12 R13

R21 R22 R23

R31 R32 R33

Where,

R11 = cosθcosφ

R12 = sinωsinθcosφ− cosωsinφR13 = cosωsinθcosφ+ sinωsinφ

R21 = cosθsinφ

R22 = sinωsinθsinφ+ cosωcosφ

R23 = cosωsinθsinφ− sinωcosφR31 = sinθ

R32 = sinωcosθ

R33 = cosωcosθ

The roll angle can be computed as θ=-sin−1(R31). More-over, based on the two arguments, the yaw angle can be foundby ω=tan−1(R32/R33) or ω=tan−1[(R32/cosθ)/(R33/cosθ)]for cosθ < 0 or cosθ > 0, respectively. Lastly, the pitch anglecan be calculated from φ=tan−1[(R21/cosθ)/(R11/cosθ)].

Furthermore, the final equation of the global map consistingthe union of the local maps along with their transformationsis

M = T1.m1 ∪ T2.m2 (6)

III. PROPOSED ARCHITECTURE

vSLAM method will be used in order to detect and identifyfeatures in the images that are grabbed by the RGB-D camera.We have used vSLAM on the cloud to help agents navigatein their environment [4], [18]. This was done through theuse of RG-Chromaticity to process and find features in theenvironment. RG-Chromaticity was used to inspect each pixelfor RGB intensity and match it to images stored in thedatabase. This was implemented and tested on a Pioneer2UGV. The system was guided around the UTSA BSE building2nd floor and returned to the initial starting position. Usingwhat we have already done we have looked into other methodsof feature detection so the system is able to understand itsenvironment efficiently. This technique has some drawbacksdue to lighting not being consistent.

Fig. 1, shows the general architecture of a cloud VSLAMnode using different packages in ROS, such as OpenCV, orthe Point Cloud Library (PCL) [12]. ROS is a very useful toolin the creation of this algorithm, as it is designed to handle allthe sensor data for the system [16]. The work flow for mergingthe local maps and creating a global map can be explained as:

• Obtain local maps by using ROS package.• Apply PC smoothing on maps to remove the noisy data.• Define a global map M.• Apply transformation matrix on local maps.• Use the map forming equation (6) to get M.• Apply point cloud registration for the two local maps

including transformations to obtain the merged map.• Apply point cloud registration technique between the

maps to merge the two local maps into a global map.

Fig. 1: General architecture of a cloud VSLAM node.

A. Hardware

The RGB-D camera selected for this research is an ASUSXtion Pro Live sensor. An ODROID-XU4 ARM-based octa-core mini computer running Ubuntu 14.04 XU4 RoboticsEdition is used to interface with the RGB-D camera [19].

Fig. 2: Cloud infrastructure of the System of Systems

Fig. 3: System integration and architecture for the Robots’point of view illustrated with ROS messages.

A Kobuki TurtleBot2 UGV provides the mobile base for theexperiments. offload images to a cloud node, which can beseen in Fig. 2. Our system will consist of a UGV, KobukiTurtleBot2, equipped with an RGB-D sensor, and a ROS readyODROID-XU4. ROS will allow us to control the TurtleBot2and process all RGB-D data as well as sensor data, Fig. 3 .

This setup will allow the UGV to use a vSLAM algorithmfor navigation. Next, we will find features for the vSLAM al-gorithm, and pass any features detected to a program designedfor localization. Thus, the system can know where it is in itswork environment. Our algorithm will decide if an old featureis the same as a newly detected feature. This is importantfor the system, since it will have to generate a map of theenvironment for future use. This process will be repeated asthe system navigates towards the specified goal to create amore complete map.

IV. EXPERIMENTAL RESULTS

This section presents experimental results from a VSLAMalgorithm that uses a monocular RGB camera for vision input.

A. Cooperative 3D Mapping

Images captured by the monocular cameras of the twoquadcopters, and placing them into a global map. Later on,these features are used to localize the robots as illustrated inFig. 4. Then, localization technique uses feature matchingbetween the point cloud (PC) data of the current view and theglobal map as shown in Fig. 5.

Fig. 4: Overview of Cooperative Mapping and Localizationalgorithm

Fig. 5: The Feature matching technique used for localization

The results in Fig. 6 shows the mapped areas by thetwo cameras. In this case, one of the quadrotors initial posecoincides with the origin of Global Frame while the initial poseof the other quadrotor is (0,-1,0,0,0,180). Map1 and Map2 arefiltered and transformed based on their initial poses and sincethe first cameras initial pose coincides with the origin, thereis no transformation observed.

Fig. 7 shows the results of the global map when they arecombined and then merged.

Fig. 6: Map1(red) and Map2(white) before (above) and afterfiltering in Case 1.

Fig. 7: Map1 and Map2 are merged in Case 1.

For instance, in Case 2, the initial pose of the first quad-copter is (0,0,0,0,45,0) and the initial pose of the secondquadcopter is (0,-0.5,0,0,0,90). In Fig. 8, the two maps showsthe process of mapping followed by the transformation. Later,these two maps are combined and merged.

B. Self Localization

Fig. 9 shows mapping results of the robot before andafter filtering out the bad correspondences. The positioningexplains the location of the robot, and the 3D pose estimatesare the outputs of the Kalman Filter, when the sensor readingsincorporates all the correspondences.

After neglecting the bad correspondences, the localization

Fig. 8: Global Map finalized after transformation for Case 2.

Fig. 9: Feature Matching for localization of the quadcoptor be-fore (above) and after filtering. Filtering applied with removalof bad matches.

of the robot has resulted in better approximation. As wediscussed previously considering the yaw angle, the goodestimates for correspondences were the outputs of the KalmanFiltering, when the sensor readings incorporates only the goodcorrespondences. Our results can be seen in Fig. 10.

Fig. 10: Kalman Filter estimation for the x-coordinates ofthe quadcopter (above), and the yaw angle estimation afterfiltering out the bad correspondences.

V. CONCLUSION

The work done till now builds the map after gathering thedata by performing the data processing operations of smooth-ing and registration i.e. the map is built offline. Also, thelocalization module estimates the location of the quadcopterin the global map built offline. Hence, one of the immediatefuture work involves building a framework to perform theoperation of cooperative mapping and localization simultane-ously. The next future work will be to test the algorithm ofcooperative mapping with a RGB-D sensor like the Kinectwhich might signify the importance of the algorithm in abetter way with faster map building operations and much betteraccuracy. Also, the localization operation will be much fasterand more accurate.

Cooperative SLAM operations involves dealing with highamount of data due to the huge number of point cloudsinvolved in building the Global Map using the local mapsobtained from the quadcopters. As the mapping area increases,using the cooperative operation becomes more sensible so asthe map the area faster and also with better accuracy withdata fusion.Also, better sensors leads to better accuracy. Thisimplies that a cooperative SLAM operation with better sensorswill lead to tremendous amount of point cloud data whichwill practically involve huge amounts of computation and highprocessing power. This calls in for the use of Cloud Computingwhich takes care of the high computation and processingpower requirements.

REFERENCES

[1] M. Du, J. Wang, J. Li, H. Cao, G. Cui, J. Fang, J. Lv, and X. Chen,“Robot robust object recognition based on fast surf feature matching,” inChinese Automation Congress (CAC), 2013. IEEE, 2013, pp. 581–586.

[2] F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard, “3-d mappingwith an rgb-d camera,” Robotics, IEEE Transactions on, vol. 30, no. 1,pp. 177–187, 2014.

[3] M. Paton and J. Kosecka, “Adaptive rgb-d localization,” in Computerand Robot Vision (CRV), 2012 Ninth Conference on. IEEE, 2012, pp.24–31.

[4] P. Benavidez, M. Muppidi, P. Rad, J. Prevost, M. Jamshidi, andL. Brown, “Cloud-based realtime robotic visual slam,” in SystemsConference (SysCon), 2015 9th Annual IEEE International, April 2015,pp. 773–777.

[5] X. Li and N. Aouf, “Experimental research on cooperative vslamfor uavs,” in Computational Intelligence, Communication Systems andNetworks (CICSyN), 2013 Fifth International Conference on. IEEE,2013, pp. 385–390.

[6] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard,“An evaluation of the rgb-d slam system,” in Robotics and Automation(ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp.1691–1696.

[7] P. Newman and K. Ho, “Slam-loop closing with visually salient fea-tures,” in Robotics and Automation, 2005. ICRA 2005. Proceedings ofthe 2005 IEEE International Conference on. IEEE, 2005, pp. 635–642.

[8] G. Bradski et al., “The opencv library,” Doctor Dobbs Journal, vol. 25,no. 11, pp. 120–126, 2000.

[9] D. Wang and D. Liu, “Sift-preserving compression of mobile-capturedlicense plate images for recognition,” in Wireless Communications andSignal Processing (WCSP), 2014 Sixth International Conference on.IEEE, 2014, pp. 1–5.

[10] Y. Qin, H. Xu, and H. Chen, “Image feature points matching viaimproved orb,” in Progress in Informatics and Computing (PIC), 2014International Conference on. IEEE, 2014, pp. 204–208.

[11] R. Arumugam, V. R. Enti, L. Bingbing, W. Xiaojun, K. Baskaran, F. F.Kong, K. D. Meng, G. W. Kit et al., “Davinci: A cloud computingframework for service robots,” in Robotics and Automation (ICRA), 2010IEEE International Conference on. IEEE, 2010, pp. 3084–3089.

[12] R. B. Rusu and S. Cousins, “3D is here: Point Cloud Library (PCL),”in IEEE International Conference on Robotics and Automation (ICRA),Shanghai, China, May 9-13 2011.

[13] B. Bellekens, V. Spruyt, R. B. M. Weyn, and R. Berkvens, “A surveyof rigid 3d pointcloud registration algorithms,” in The Fourth Interna-tional Conference on Ambient Computing, Applications, Services andTechnologies. Citeseer, 2014.

[14] G. Klein and D. Murray, “Parallel tracking and mapping for small arworkspaces,” in Mixed and Augmented Reality, 2007. ISMAR 2007. 6thIEEE and ACM International Symposium on. IEEE, 2007, pp. 225–234.

[15] J. Engel, J. Sturm, and D. Cremers, “Camera-based navigation of alow-cost quadrocopter,” in Intelligent Robots and Systems (IROS), 2012IEEE/RSJ International Conference on. IEEE, 2012, pp. 2815–2821.

[16] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs,R. Wheeler, and A. Y. Ng, “Ros: an open-source robot operating system,”in ICRA workshop on open source software, vol. 3, no. 3.2, 2009, p. 5.

[17] G. G. Slabaugh, “Computing euler anglesfrom a rotation matrix.” [Online]. Available:http://www.staff.city.ac.uk/ sbbh653/publications/euler.pdf

[18] P. Benavidez, M. Muppidi, and M. Jamshidi, “Improving visual slamalgorithms for use in realtime robotic applications,” in World AutomationCongress (WAC), 2014, Aug 2014, pp. 1–6.

[19] “ODROID Forum View topic - Ubuntu 14.04 Robotics Editionfor ODROID-XU3 (ROS+OpenCV+PCL).” [Online]. Available:http://forum.odroid.com/viewtopic.php?f=95&t=7446

Date post:	17-Aug-2020
Category:	Documents
Upload:	others
View:	4 times
Download:	0 times

Cloud-based Control and vSLAM through Cooperative Mapping...

Documents