To navigate in an unknown environment, a mobile robot needs to build a map of the environment and localize itself in the map at the same time. The process addressing this dual problem is called Simultaneous Localization wand Mapping (SLAM). In an outdoor environment, this can generally be solved by a GPS which provides a good accuracy for the tasks the robot can take on. However, when moving indoor or in places where the GPS data is not available, or not reliable enough, it can become difficult to estimate the robot’s position precisely and other solutions have to be found.
The main problem raised with SLAM comes from the uncertainty of the measurements, due to the sensory noise or technical limitations. Probabilistic models are widely used to reduce the inherent errors and provide satisfying estimations. While this process is generally based on data provided by sensors such as laser scanners, combined with the odometry, Visual Simultaneous Localization and Mapping (VSLAM) focuses on the use of camera.
However, recently there has been a great interest in processing data acquired using depth measuring sensors due to the availability of cheap and efficient RGB-D cameras. For instance, the Kinect Camera developed by Prime Sense and Microsoft has considerably changed the situation, providing a 3D camera at a very affordable price. Primarily designed for entertainment, it has received a warm welcome in the research community, especially in robotics.
An approach in SLAM algorithms refined in 2014 is the RGB-D SLAM. The RGB-D SLAM consists of a location mapping algorithm faced to RGB-D cameras. The VSLAM process can be described as estimating the poses of the camera from its data stream (video and depth), in order to reconstruct the entire environment while the camera is moving. As the sensory noise leads to deviations in the estimations of each camera poses with respect to the real motion, the goal is to build a 3D map which is close, as much as possible, to the real environment. When using RGBD based visual SLAM algorithms that takes advantage of a partial knowledge of the scene (partial 3D model of the environment). This information is integrated in the bundle adjustment process to provide accurate localization and mapping.
In this work, we address the problem of localization and mapping by developing a visual SLAM solution for systems with RGBD sensors. The core contribution is the strategy to track visual features, which has the ability of tracking and adding key points on an as-needed basis. Though a VSLAM system is intended to be used in real time.
The rest of the document is structured as follows:
Chapter 2 presents the background and the concepts that are used in this area, the features and methods used in visual SLAM systems and how the transformation is computed.
Chapter 3 presents the feature matching between a couple of frames, and how a 3D transformation can be computed from these associations using the depth data.
Chapter 4 describes the implementation of visual SLAM system, by estimating the camera poses through the use of a pose graph, refining them through the detection of loop closures, and finally performing a 3D reconstruction.
Chapter 5 presents the experimental results by showing how the data was acquired, software implementation, and the data outputs with examples of maps generated from different datasets.
Chapter 6 presents the conclusions