Cloud-based Control and vSLAM through Cooperative Mapping and Localization*
Berat A. Erol, Satish Vaishnav, Joaquin D. Labrado, Patrick Benavidez, and Mo Jamshidi Department of Electrical and Computer Engineering
Abstract—Simultaneous Localization and Mapping (SLAM) is one of the most widely popular and applied methods designed for more accurate localization and navigation operations. Our experiments showed that vision based mapping helps agents navigate in unknown environments using feature based mapping and localization. Instead of using a classical monocular camera as a vision source, we have decided to use RGB-D (Red, Green, Blue, Depth) cameras for better feature detection, 3D mapping, and localization. This is due to the fact that the RGB-D camera returns depth data as well as the normal RGB data. Moreover, we have applied this method on multiple robots using the concept of cooperative SLAM. This paper illustrates our current research findings and proposes a new architecture based on gathered data from RGB-D cameras, which are the Microsoft Kinect and the ASUS Xtion Pro for 3D mapping and localization.
Index Terms—Cooperative SLAM, vSLAM, Robotics, RGB-D, Cloud Computing
GPS-denied localization can be a computational bottleneck for autonomous robotic systems navigating an unknown envi- ronment, locating objects of interest, performing automation operations, etc. This problem requires a system to understand the work environment, be aware the obstacles, and map the environment. This is commonly referred as ”the chicken and egg problem” in the literature, since calculating the location requires a map, and to map the area the system should know its current location. Therefore, a system needs to initialize itself before and during the mapping process as well as localize itself simultaneously. Hence, the problem defined as Simultaneous Localization and Mapping (SLAM).
The ability to sense the environment is pivotal for any autonomous mobile system that intends to navigate through a known or unknown area. In the past, this was done through the aid of sensors which is returned data that allows the mobile systems to know where they were in the world . Nowadays, there is a easier method to accomplish this task with devices such as Xbox Kinect. a camera provides RGB-D information to the mobile system –. RGB-D sensors are now being used to help mobile systems navigate by creating a map of their work environment , –.
* This work was supported by Grant number FA8750-15-2-0116 from Air Force Research Laboratory and OSD through a contract with North Carolina Agricultural and Technical State University.
Cooperative SLAM is one operational approach that aims to build a common map of entire work environment, and based on a shared data to calculate the locations of the system’s agents at the same time. If SLAM operations by one system involves high computational complexity and high power consumption, then the level of the computational complexity and power con- sumption with multiple systems will be undeniably increased. Therefore, this calls for a solution that will act as a master platform which builds a general map that can be accessed by any system currently in the operation. Additionally, it will process all the sensor data provided by the multiple systems, or any system added for that operation later on.
The purpose of this paper is to propose a vision based SLAM system (vSLAM) includes RGB-D cameras with cloud computing back-end. That is ideal for multiple unmanned ground vehicles (UGV) and aerial vehicles (UAV), and co- operative work flow among them. The algorithm is discussed here in order to better identify key features in the world that could be used in each agents’ own map, or one that is shared across multiple agents.
The complexity of the problem depends on the system and the operation to be carried. Computation of vSLAM algorithms on a robot is typically not done as it comes with a relatively high cost in computational complexity which in turn affects the power consumption and the positional data update rate. Storage is also an issue as large amounts of data can be acquired for mapping tasks in vSLAM. Both the storage and computational limits can prohibit extended use of systems relying on vSLAM.
The paper is organized as follows; Section II is the litera- ture survey, which is followed by Section III with proposed architecture. Section IV covers results of the experiments that were discussed, followed by the conclusions, Section V.
II. LITERATURE SURVEY
In the literature one can see several implementations on cooperative SLAM that use stereo vision camera systems acquiring the data for fusing the cooperative data . In- formation filters have suitable and convenient characteristics, especially for the multi-sensor data fusing application task.
In , a vision based SLAM approach is presented using the feedback from RGB-D camera. The features from the
color images have been extracted and projected onto the point cloud which was obtained from the depth images. 3D correspondences were used to find correspondences between consecutive images which helps in finding the transformation between the two images. A matrix is used to obtain the map data and hence localization process is done using this map data. In addition to these approaches and research findings,  presents a method of loop closing using visual features in conjunction with the laser scan data. The algorithm uses the pose and time information when the image at any given moment matches with the previous images and the scanned patch by the laser at those two moments are compared. Using this information, a covariance is obtained which provides appropriate information for loop closing.
For our experiments the implemented algorithm was a Grid Based Fast SLAM by using: a single node, two nodes, four nodes, and eight nodes of a cluster. The execution time was decreased by several orders of magnitude with increasing num- ber of nodes. On the other hand, there were some limitations due to this approach, such as using a separate hardware on server implementations, manual implementation of nodes, as well as neglecting network delays/latencies.
A. Feature Detection and Fusion into Map Data
The most common method for feature detection in SLAM systems has been SIFT , , ORB , and SURF . These algorithms are widely used for visual odometry, for the building of 3D maps, and to detect objects or points of interest. All three of these techniques are useful in their own right for object recognition. The only difficultly with these algorithms is that each of them require a large amount of images to be stored. This could be handled by the cloud computing by off- loading images and running a SIFT or SURF algorithm on the cloud. This is a method that will be tested once the test bed for the system is created.
For an encouraging approach, DAVinCi is a Cloud Com- puting Framework for Service Robots. It acts as Platform as a Service (PaaS), which is designed to perform secondary tasks like mapping, localization and others . Also it works as an active interface between all the heterogeneous agents, the back end computation, and storage through the use of ROS and Hadoop Distributed File System (HDFS). Its role is to act as a master node which runs the ROS name services and maintains the list of publishers and subscribers, as well as receive messages either from the HDFS back end or from other agents/robots. On the other hand, it is important to state that DaVinCi does not provide a real-time implementation.
Another method to find features in the mobile systems environment is done through the use of Point Clouds and the Iterative Closest Point (ICP) algorithm , . Point clouds allow for systems to use depth information from the RGB-D data stream that the sensor returns. The ICP method is meant to find the smallest distance between two points in a point cloud. This can be done by generating a point cloud and then comparing the individual points.
Covariance Intersection (CI) is a method of data fusion which combines two or more states estimates. In addition to this, CI gathers sensor measurements from different platforms having an unknown correlation among them. Improvements in the location determination and mapping process is achieved by using collaborative estimation which is implemented with an information filter and a CI technique.  shows less navigation errors by using CI, coping with the memory and computation cost that arises from the increasing size of the augmented state vector matrix, and its corresponding uncertainty, which aims to improve the robustness of Cooperative visual SLAM (vSLAM).
B. Proposed Method for Map Merging
For the aim of our experiments, the map points should be transformed from the world coordinate frame to the camera- centered coordinate frame. In other words, we project map points into the image plane. This is done by left-multiplication with a ECW matrix, which represents camera pose , ECW stands for a 4x4 transformation matrix from Camera frame C, to World frame W. pjC is the pose of camera in camera- centered coordinate frame, and pjW is the pose of camera in world coordinate frame. Then,
pjC = ECW ∗ pjW (1)
For visual observation, after tracking a video frame suc- cessfully, by using parallel tracking and mapping (PTAM), the pose estimate is scaled by the current estimate for a factor of λ∗ and it is transformed
Click here to load reader