An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment

Scene recognition is an essential part of vision-based robot navigation. The success of deep learning has prompted extensive preliminary studies on scene recognition, all of which use features extracted from networks trained for recognition tasks. In...

Bibliographic Details
Main Authors: Zhenyu Li, Aiguo Zhou, Yong Shen
Format: Article
Language: English
Published: MDPI AG 2020-03-01
Series: Sensors
Subjects: scene recognition; multi-column cnn; image retrieval; end-to-end trainable network
Online Access: https://www.mdpi.com/1424-8220/20/6/1556
id doaj-ded492dc205f4c5da7c95968514dc7f8
record_format Article
spelling doaj-ded492dc205f4c5da7c95968514dc7f8
2020-11-25T02:38:13Z
eng
MDPI AG
Sensors
1424-8220
2020-03-01
20(6), 1556
10.3390/s20061556
s20061556
An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment
Zhenyu Li, School of Mechanical Engineering, Tongji University, Shanghai 201804, China
Aiguo Zhou, School of Mechanical Engineering, Tongji University, Shanghai 201804, China
Yong Shen, School of Automotive Studies, Tongji University, Shanghai 201804, China
https://www.mdpi.com/1424-8220/20/6/1556
scene recognition
multi-column cnn
image retrieval
end-to-end trainable network
collection DOAJ
language English
format Article
sources DOAJ
author Zhenyu Li
Aiguo Zhou
Yong Shen
publisher MDPI AG
series Sensors
issn 1424-8220
publishDate 2020-03-01
description Scene recognition is an essential part of vision-based robot navigation. The success of deep learning has prompted extensive preliminary studies on scene recognition, all of which use features extracted from networks trained for recognition tasks. In this paper, we treat scene recognition as a region-based image retrieval problem and present a novel approach based on an end-to-end trainable multi-column convolutional neural network (MCNN) architecture. The proposed MCNN uses filters with receptive fields of different sizes to obtain multi-level and multi-layer image perception, and consists of three components: a front-end, a middle-end and a back-end. The first seven layers of VGG16 serve as the front-end for two-dimensional feature extraction, Inception-A serves as the middle-end for learning deeper feature representations, and Large-Margin Softmax Loss (L-Softmax) serves as the back-end for enhancing intra-class compactness and inter-class separability. Extensive experiments were conducted to evaluate performance by comparing the proposed network with existing state-of-the-art methods. Experimental results on three popular datasets demonstrate the robustness and accuracy of our approach. To the best of our knowledge, the presented approach has not previously been applied to scene recognition in the literature.
topic scene recognition
multi-column cnn
image retrieval
end-to-end trainable network
url https://www.mdpi.com/1424-8220/20/6/1556
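The description field names Large-Margin Softmax Loss (L-Softmax) as the back-end that tightens classes and pushes them apart. As a rough illustration of that idea only, the sketch below computes the loss for a single feature vector in plain Python. It is not the authors' implementation: the function names, the margin value `m=2`, the toy weights, and the assumption of non-zero vectors are all hypothetical choices made here, since the record does not give them. The target-class logit replaces cos(θ) with the piecewise function ψ(θ) = (-1)^k cos(mθ) - 2k on [kπ/m, (k+1)π/m], which is the standard L-Softmax formulation.

```python
import math


def psi(theta, m):
    """L-Softmax angular function: (-1)^k * cos(m*theta) - 2k
    for theta in [k*pi/m, (k+1)*pi/m], with k in 0..m-1."""
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k


def l_softmax_loss(x, weights, target, m=2):
    """Cross-entropy for one sample where the target logit uses
    psi(theta) in place of cos(theta). Assumes non-zero x and weights.
    The margin m is an illustrative assumption, not a value from the record."""
    x_norm = math.sqrt(sum(v * v for v in x))
    logits = []
    for j, w in enumerate(weights):
        dot = sum(a * b for a, b in zip(w, x))
        if j == target:
            w_norm = math.sqrt(sum(v * v for v in w))
            # Clamp to guard against rounding slightly outside [-1, 1].
            cos_t = max(-1.0, min(1.0, dot / (w_norm * x_norm)))
            logits.append(w_norm * x_norm * psi(math.acos(cos_t), m))
        else:
            logits.append(dot)
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    features = [1.0, 1.0]                 # toy 2-D feature vector
    class_weights = [[1.0, 0.0], [0.0, 1.0]]
    print(l_softmax_loss(features, class_weights, target=0, m=1))
    print(l_softmax_loss(features, class_weights, target=0, m=2))
```

With `m=1` the function reduces to the ordinary softmax cross-entropy. A larger `m` lowers the target logit for the same angle, so the loss stays high until the feature lies much closer to its class direction, which is the mechanism behind the compactness and separability mentioned in the description.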