Self-Supervised Audio-Visual Feature Learning for Single-Modal Incremental Terrain Type Clustering

The key to an accurate understanding of terrain is to extract informative features from multi-modal data obtained from different devices. Sensors such as RGB cameras, depth sensors, vibration sensors, and microphones are used to collect this multi-modal data. Many studies have explored ways to use...

Bibliographic Details
Main Authors: Reina Ishikawa, Ryo Hachiuma, Hideo Saito
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9416486/