Learning to Fuse Multiscale Features for Visual Place Recognition

Efficient and robust visual place recognition is of great importance to autonomous mobile robots. Recent work has shown that features learned by convolutional neural networks achieve impressive performance with compact feature sizes, where most such features are pooled or aggregated from a convolutional...


Bibliographic Details
Main Authors: Jun Mao, Xiaoping Hu, Xiaofeng He, Lilian Zhang, Liao Wu, Michael J. Milford
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Visual place recognition; deep learning; mobile robots; localization
Online Access: https://ieeexplore.ieee.org/document/8585013/
id doaj-b6f38de914b84263b22c9b11742cf4b1
record_format Article
spelling doaj-b6f38de914b84263b22c9b11742cf4b1 (2021-03-29T22:07:06Z)
Learning to Fuse Multiscale Features for Visual Place Recognition
IEEE Access, vol. 7, pp. 5723-5735, published 2019-01-01 by IEEE. ISSN: 2169-3536. DOI: 10.1109/ACCESS.2018.2889030. IEEE article number: 8585013.
Authors and affiliations:
Jun Mao (https://orcid.org/0000-0002-2477-0742), Department of Automation, National University of Defense Technology, Changsha, China
Xiaoping Hu, Department of Automation, National University of Defense Technology, Changsha, China
Xiaofeng He, Department of Automation, National University of Defense Technology, Changsha, China
Lilian Zhang, Department of Automation, National University of Defense Technology, Changsha, China
Liao Wu, School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia
Michael J. Milford, School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia
Online access: https://ieeexplore.ieee.org/document/8585013/
Subjects: Visual place recognition; deep learning; mobile robots; localization
collection DOAJ
language English
format Article
sources DOAJ
author Jun Mao
Xiaoping Hu
Xiaofeng He
Lilian Zhang
Liao Wu
Michael J. Milford
title Learning to Fuse Multiscale Features for Visual Place Recognition
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Efficient and robust visual place recognition is of great importance to autonomous mobile robots. Recent work has shown that features learned by convolutional neural networks achieve impressive performance with compact feature sizes, where most such features are pooled or aggregated from a convolutional feature map. However, convolutional filters only capture the appearance within their receptive fields and provide no mechanism for combining multiscale appearance for place recognition. In this paper, we propose a novel method to build a multiscale feature pyramid and present two approaches that use the pyramid to augment place recognition capability. The first approach fuses the pyramid to obtain a new feature map that is aware of both local and semi-global appearance, and the second approach learns an attention model from the feature pyramid to weight the spatial grids of the original feature map. Both approaches combine the multiscale features in the pyramid to suppress confusing local features while tackling the problem in two different ways. Extensive experiments have been conducted on benchmark datasets with varying degrees of appearance and viewpoint variation. The results show that the proposed approaches outperform networks without the multiscale feature fusion and multiscale attention components. Analyses of the performance obtained with different feature pyramids are also provided.
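The description above outlines two architectural ideas: fusing a multiscale feature pyramid into a new feature map, and learning an attention model from the pyramid to weight the spatial grids of the original map. The sketch below is a minimal, hypothetical PyTorch illustration of both ideas; the pooling scales, the 1x1-convolution fusion, the sigmoid attention scoring, and the names build_pyramid, PyramidFusion, and PyramidAttention are assumptions made for illustration only and do not reproduce the authors' published architecture.

    # Minimal illustrative sketch (not the authors' code) of the two ideas in the
    # abstract: (1) fuse a multiscale pyramid into a new feature map, and
    # (2) learn a spatial attention map from the pyramid. All design choices
    # below (scales, 1x1 convs, sigmoid scoring) are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def build_pyramid(feat, scales=(1, 2, 4)):
        """Pool a conv feature map (N, C, H, W) at several grid sizes and
        upsample each level back to (H, W), giving a multiscale pyramid."""
        n, c, h, w = feat.shape
        levels = []
        for s in scales:
            pooled = F.adaptive_avg_pool2d(feat, output_size=(max(h // s, 1), max(w // s, 1)))
            levels.append(F.interpolate(pooled, size=(h, w), mode="bilinear", align_corners=False))
        # Coarse levels summarize semi-global appearance; the finest level keeps local detail.
        return levels

    class PyramidFusion(nn.Module):
        """Approach 1 (sketch): concatenate the pyramid levels and fuse them with a
        1x1 convolution into a map aware of both local and semi-global appearance."""
        def __init__(self, channels, num_levels=3):
            super().__init__()
            self.fuse = nn.Conv2d(channels * num_levels, channels, kernel_size=1)

        def forward(self, feat):
            pyramid = build_pyramid(feat)
            return self.fuse(torch.cat(pyramid, dim=1))

    class PyramidAttention(nn.Module):
        """Approach 2 (sketch): predict a per-location weight from the pyramid and
        use it to reweight the spatial grids of the original feature map."""
        def __init__(self, channels, num_levels=3):
            super().__init__()
            self.score = nn.Conv2d(channels * num_levels, 1, kernel_size=1)

        def forward(self, feat):
            pyramid = build_pyramid(feat)
            attn = torch.sigmoid(self.score(torch.cat(pyramid, dim=1)))  # (N, 1, H, W)
            return feat * attn  # confusing local features receive low weights

    # Example usage on a dummy conv feature map:
    # feat = torch.randn(1, 512, 13, 13)
    # fused = PyramidFusion(512)(feat)          # approach 1: fused multiscale map
    # reweighted = PyramidAttention(512)(feat)  # approach 2: attention-weighted map

Either output could then be pooled or aggregated into a compact place descriptor, which is the setting the abstract describes; the choice of descriptor is left open here.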
topic Visual place recognition
deep learning
mobile robots
localization
url https://ieeexplore.ieee.org/document/8585013/