SLAM-aware, self-supervised perception in mobile robots

Bibliographic Details
Main Author: Pillai, Sudeep
Other Authors: John J. Leonard.
Format: Others
Language:English
Published: Massachusetts Institute of Technology 2018
Online Access:http://hdl.handle.net/1721.1/114054
Description
Summary: Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-submitted PDF version of thesis. === Includes bibliographical references (pages 152-171).

Simultaneous Localization and Mapping (SLAM) is a fundamental capability in mobile robots, and has typically been considered in the context of aiding mapping and navigation tasks. In this thesis, we advocate for the use of SLAM as a supervisory signal to further the perceptual capabilities of robots. Through the concept of SLAM-supported object recognition, we enable robots equipped with a single camera to leverage their SLAM-awareness (via monocular visual-SLAM) to better inform object recognition within their immediate environment. Additionally, by maintaining a spatially cognizant view of the world, we find our SLAM-aware approach to be particularly amenable to few-shot object learning. We show that a SLAM-aware, few-shot object learning strategy can be especially advantageous for mobile robots, as it is able to learn object detectors from a reduced set of training examples.

Implicit in realizing modern visual-SLAM systems is the choice of map representation. It is imperative that the map representation be usable by multiple components in the robot's decision-making stack while being constantly optimized as more measurements become available. Motivated by the need for a unified map representation in vision-based mapping, navigation, and planning, we develop an iterative and high-performance mesh-reconstruction algorithm for stereo imagery. We envision that, in the future, these tunable mesh representations can enable robots to quickly reconstruct their immediate surroundings while being able to plan directly in them and maneuver at high speeds.

While most visual-SLAM front-ends explicitly encode application-specific constraints for accurate and robust operation, we advocate for an automated solution to developing these systems. By bootstrapping the robot's ability to perform GPS-aided SLAM, we develop a self-supervised visual-SLAM front-end capable of performing visual ego-motion estimation and vision-based loop-closure recognition in mobile robots. We propose a novel generative-model solution that is able to predict ego-motion estimates from optical flow, while also allowing for the prediction of the induced scene flow conditioned on the ego-motion. Following a similar bootstrapped learning strategy, we explore the ability to self-supervise place recognition in mobile robots and cast it as a metric learning problem, with a GPS-aided SLAM solution providing the relevant supervision. Furthermore, we show that the newly learned embedding can be particularly powerful in discriminating visual scene instances from each other for the purpose of loop-closure detection. We envision that such self-supervised solutions to vision-based task learning will have far-reaching implications in several domains, especially facilitating life-long learning in autonomous systems. === by Sudeep Pillai. === Ph. D.
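
The abstract describes a generative model that predicts ego-motion from optical flow and, conversely, the scene flow induced by a given ego-motion. As a rough illustration of that two-way relationship (not the thesis's learned model), the sketch below uses the classical differential motion-field equations under a known per-pixel inverse depth; the function names and conventions are illustrative assumptions.

```python
# A minimal, classical sketch of the two directions named above, using the
# standard differential motion-field model rather than the thesis's learned
# generative model: (1) the flow induced by a given ego-motion, and (2) a
# least-squares ego-motion estimate from observed flow. Inverse depth is
# assumed known per pixel; all names here are illustrative.
import numpy as np

def induced_flow(t, omega, pts, inv_depth):
    """Flow (u, v) at normalized image points `pts` (N, 2) induced by camera
    translation t = (tx, ty, tz) and rotation omega = (wx, wy, wz)."""
    x, y = pts[:, 0], pts[:, 1]
    tx, ty, tz = t
    wx, wy, wz = omega
    u = inv_depth * (x * tz - tx) + x * y * wx - (1 + x**2) * wy + y * wz
    v = inv_depth * (y * tz - ty) + (1 + y**2) * wx - x * y * wy - x * wz
    return np.stack([u, v], axis=1)

def estimate_ego_motion(flow, pts, inv_depth):
    """Recover (t, omega) from observed flow by linear least squares, since the
    motion-field model above is linear in the six ego-motion parameters."""
    x, y = pts[:, 0], pts[:, 1]
    z = inv_depth
    zero = np.zeros_like(x)
    rows_u = np.stack([-z, zero, x * z, x * y, -(1 + x**2), y], axis=1)
    rows_v = np.stack([zero, -z, y * z, 1 + y**2, -x * y, -x], axis=1)
    A = np.concatenate([rows_u, rows_v], axis=0)
    b = np.concatenate([flow[:, 0], flow[:, 1]], axis=0)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta[:3], theta[3:]
```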
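
The abstract also casts self-supervised place recognition as metric learning, with a GPS-aided SLAM solution supplying the supervision. A minimal sketch of that idea, assuming a hypothetical `embed` function and illustrative distance thresholds, is to mine triplets from the metrically consistent trajectory and apply a triplet (hinge) loss so that same-place frames embed closer together than different-place frames:

```python
# A minimal sketch (not the thesis implementation) of SLAM-supervised metric
# learning for place recognition: poses from a GPS-aided SLAM solution label
# frame pairs as "same place" or "different place", and a triplet loss scores
# the current embedding. `embed` and the radii below are illustrative.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on embedding distances: d(a, p) + margin < d(a, n)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def mine_and_score(poses, features, embed, pos_radius=5.0, neg_radius=20.0, n=256):
    """Sample triplets using SLAM/GPS poses: frames within `pos_radius` meters
    of the anchor are positives, frames beyond `neg_radius` are negatives."""
    rng = np.random.default_rng(0)
    losses = []
    for _ in range(n):
        i = rng.integers(len(poses))
        dists = np.linalg.norm(poses - poses[i], axis=1)
        pos_idx = np.flatnonzero((dists > 0) & (dists < pos_radius))
        neg_idx = np.flatnonzero(dists > neg_radius)
        if len(pos_idx) == 0 or len(neg_idx) == 0:
            continue
        a = embed(features[i])
        p = embed(features[rng.choice(pos_idx)])
        neg = embed(features[rng.choice(neg_idx)])
        losses.append(triplet_loss(a, p, neg))
    return float(np.mean(losses)) if losses else 0.0
```

In a learning setting this loss would drive updates to the embedding; at query time, loop-closure candidates are simply the nearest neighbors of the current frame in the learned embedding space.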