Summary: Simultaneous Localisation and Mapping (SLAM) began as a technique to enable real-time robotic navigation in previously unexplored environments. The maps it created, however, were designed for the sole purpose of localising the robot (i.e. estimating the robot's position and orientation relative to the map). Successive systems have demonstrated the increasing descriptive power of map representations: in vision-only SLAM these progressed from simple sparse corner-like features to edges, planes and, most recently, fully dense surfaces that abandon the notion of sparse structures altogether. Early sparse representations enjoyed the benefit of being simple to maintain, as features could be added, optimised and removed independently, while remaining memory- and compute-efficient, making them suitable for robust real-time camera tracking that relies on a consistent map. However, sparse representations are limiting when it comes to interaction: for example, a robot aiming to navigate safely in an environment needs to sense complete surfaces in addition to empty space. Furthermore, sparse features can only be detected in highly textured areas and during slow motion. Recent dense methods overcome these limitations: they can work in situations where corner features would fail to be detected, such as in the blurry images produced by rapid camera motion, and they enable correct reasoning about occlusions and complete 3D surfaces, raising interaction capabilities to new levels. This is only possible thanks to the advent of commodity parallel processing power and large amounts of memory on Graphics Processing Units (GPUs), which require careful consideration during algorithm design. However, increasing the map density makes creating consistent structures more challenging due to the vast number of parameters to optimise and the interdependencies amongst them.
More importantly, our interest is in making interaction even more sophisticated by abandoning the idea that an environment is a dense monolithic structure in favour of one composed of discrete detachable objects and bounded regions with physical properties and metadata. This work explores the development of a new type of visual SLAM system, which we call Dense Semantic SLAM, that represents the map with semantically meaningful objects and planar regions, enabling new types of interaction where applications can go beyond asking "where am I?" towards "what is around me and what can I do with it?". In a way it can be seen as a return to lightweight sparse-based representations while keeping the predictive power of dense methods, with added scene understanding at the object and region levels.