An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment
Scene recognition is an essential part of vision-based robot navigation. The successful application of deep learning has prompted extensive preliminary studies on scene recognition, all of which use features extracted from networks trained for recognition tasks. In...
Main Authors: | Zhenyu Li, Aiguo Zhou, Yong Shen |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-03-01 |
Series: | Sensors |
Subjects: | scene recognition, multi-column CNN, image retrieval, end-to-end trainable network |
Online Access: | https://www.mdpi.com/1424-8220/20/6/1556 |
id |
doaj-ded492dc205f4c5da7c95968514dc7f8 |
---|---|
record_format |
Article |
spelling |
doaj-ded492dc205f4c5da7c95968514dc7f8; 2020-11-25T02:38:13Z; eng; MDPI AG; Sensors; 1424-8220; 2020-03-01; 20; 6; 1556; 10.3390/s20061556; s20061556; An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment; Zhenyu Li (School of Mechanical Engineering, Tongji University, Shanghai 201804, China); Aiguo Zhou (School of Mechanical Engineering, Tongji University, Shanghai 201804, China); Yong Shen (School of Automotive Studies, Tongji University, Shanghai 201804, China); Scene recognition is an essential part of vision-based robot navigation. The successful application of deep learning has prompted extensive preliminary studies on scene recognition, all of which use features extracted from networks trained for recognition tasks. In this paper, we interpret scene recognition as a region-based image retrieval problem and present a novel approach to scene recognition with an end-to-end trainable multi-column convolutional neural network (MCNN) architecture. The proposed MCNN uses filters with receptive fields of different sizes to achieve multi-level and multi-layer image perception, and consists of three components: front-end, middle-end, and back-end. The first seven layers of VGG16 serve as the front-end for two-dimensional feature extraction, Inception-A serves as the middle-end for learning deeper feature representations, and Large-Margin Softmax Loss (L-Softmax) serves as the back-end for enhancing intra-class compactness and inter-class separability. Extensive experiments were conducted to evaluate performance by comparing the proposed network with existing state-of-the-art methods. Experimental results on three popular datasets demonstrate the robustness and accuracy of our approach. To the best of our knowledge, the presented approach has not previously been applied to scene recognition in the literature.; https://www.mdpi.com/1424-8220/20/6/1556; scene recognition; multi-column CNN; image retrieval; end-to-end trainable network |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Zhenyu Li, Aiguo Zhou, Yong Shen |
author_sort |
Zhenyu Li |
title |
An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment |
title_sort |
end-to-end trainable multi-column cnn for scene recognition in extremely changing environment |
publisher |
MDPI AG |
series |
Sensors |
issn |
1424-8220 |
publishDate |
2020-03-01 |
description |
Scene recognition is an essential part of vision-based robot navigation. The successful application of deep learning has prompted extensive preliminary studies on scene recognition, all of which use features extracted from networks trained for recognition tasks. In this paper, we interpret scene recognition as a region-based image retrieval problem and present a novel approach to scene recognition with an end-to-end trainable multi-column convolutional neural network (MCNN) architecture. The proposed MCNN uses filters with receptive fields of different sizes to achieve multi-level and multi-layer image perception, and consists of three components: front-end, middle-end, and back-end. The first seven layers of VGG16 serve as the front-end for two-dimensional feature extraction, Inception-A serves as the middle-end for learning deeper feature representations, and Large-Margin Softmax Loss (L-Softmax) serves as the back-end for enhancing intra-class compactness and inter-class separability. Extensive experiments were conducted to evaluate performance by comparing the proposed network with existing state-of-the-art methods. Experimental results on three popular datasets demonstrate the robustness and accuracy of our approach. To the best of our knowledge, the presented approach has not previously been applied to scene recognition in the literature. |
topic |
scene recognition; multi-column CNN; image retrieval; end-to-end trainable network |
url |
https://www.mdpi.com/1424-8220/20/6/1556 |
_version_ |
1724792054950658048 |
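
The abstract names Large-Margin Softmax Loss (L-Softmax) as the back-end that enhances intra-class compactness and inter-class separability. The sketch below illustrates that idea in plain Python for a single sample, following the general L-Softmax formulation in which the target-class logit uses a margin function psi(theta) in place of cos(theta). It is not the authors' implementation: this record gives neither the margin value nor any training details, so the margin `m = 2`, the toy feature vector, the toy weights, and the function names `psi` and `l_softmax_loss` are all assumptions made for illustration.

```python
import math


def psi(theta, m):
    """L-Softmax margin function on [0, pi].

    psi(theta) = (-1)^k * cos(m * theta) - 2k  for theta in [k*pi/m, (k+1)*pi/m].
    It decreases monotonically and never exceeds cos(theta).
    """
    # Clamp k so that theta == pi falls into the last interval (k = m - 1).
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def l_softmax_loss(x, weights, label, m=2):
    """Cross-entropy loss for one sample with a margin on the target logit.

    x       : feature vector (list of floats), assumed non-zero
    weights : one weight vector per class (no bias term, an assumption here)
    label   : index of the ground-truth class
    m       : integer margin; m = 1 reduces to the ordinary softmax loss
    """
    logits = []
    for j, w in enumerate(weights):
        if j == label:
            # Angle between the feature and the target-class weight vector.
            cos_t = dot(w, x) / (norm(w) * norm(x))
            cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
            theta = math.acos(cos_t)
            logits.append(norm(w) * norm(x) * psi(theta, m))
        else:
            logits.append(dot(w, x))
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


# Toy two-class example (values are illustrative only).
x = [1.0, 0.5]
weights = [[1.0, 0.0], [0.0, 1.0]]
plain = l_softmax_loss(x, weights, label=0, m=1)
margin = l_softmax_loss(x, weights, label=0, m=2)
print(f"softmax loss (m=1): {plain:.4f}")
print(f"L-Softmax loss (m=2): {margin:.4f}")
```

With `m = 1` the function reduces to the ordinary softmax loss. With a larger margin the same sample incurs a higher loss, so the network has to pull features closer to their class direction to reach the same loss value, which is the sense in which the loss encourages compact classes with wider separation.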