Filling the Gaps in Atrous Convolution: Semantic Segmentation With a Better Context

The main challenge for scene parsing arises when complex scenes with highly diverse objects are encountered. The objects not only differ in scale and appearance but also in semantics. Previous works focus on encoding the multi-scale contextual information (via pooling or atrous convolutions) general...

Bibliographic Details
Main Authors:	Liyuan Liu, Yanwei Pang, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ling Shao
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Image processing, neural networks, semantic segmentation, supervised learning
Online Access:	https://ieeexplore.ieee.org/document/8861330/

id	doaj-300a6aac1bfd42ee9cc2ea01d28e5fea
record_format	Article
spelling	doaj-300a6aac1bfd42ee9cc2ea01d28e5fea | 2021-03-30T02:02:32Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 34019 | 34028 | 10.1109/ACCESS.2019.2946031 | 8861330 | Filling the Gaps in Atrous Convolution: Semantic Segmentation With a Better Context | Liyuan Liu [0] | Yanwei Pang [1] https://orcid.org/0000-0001-6670-3727 | Syed Waqas Zamir [2] | Salman Khan [3] | Fahad Shahbaz Khan [4] | Ling Shao [5] | Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, School of Electrical and Information Engineering, Tianjin University, Tianjin, China | Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, School of Electrical and Information Engineering, Tianjin University, Tianjin, China | Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates | Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates | Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates | Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates | The main challenge for scene parsing arises when complex scenes with highly diverse objects are encountered. The objects not only differ in scale and appearance but also in semantics. Previous works focus on encoding the multi-scale contextual information (via pooling or atrous convolutions) generally on top of compact high-level features (i.e., at a single stage). In this work, we argue that a rich set of cues exists at multiple stages of the network, encapsulating low, mid and high-level scene details. Therefore, an optimal scene parsing model must aggregate multi-scale context at all three levels of the feature hierarchy, a capability that is lacking in state-of-the-art scene parsing models. To address this limitation, we introduce a novel architecture with three new blocks that systematically aggregate low, mid and high tier features. The heart of our approach is a high-level feature aggregation module that augments sparsely connected atrous convolution with dense local and layer-wise connections to avoid gridding artifacts. In addition, we employ a novel feature pyramid augmentation and semantic refinement unit to generate low- and mid-level features that are mixed with high-level features at the decoder. We extensively evaluate our proposed approach on the large-scale Cityscapes and ADE20K benchmarks. Our approach surpasses many of the latest models on both datasets, achieving mean intersection-over-union (mIoU) scores of 80.5% and 44.0% on Cityscapes and ADE20K, respectively. | https://ieeexplore.ieee.org/document/8861330/ | Image processing | neural networks | semantic segmentation | supervised learning
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Liyuan Liu, Yanwei Pang, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ling Shao
author_facet	Liyuan Liu, Yanwei Pang, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ling Shao
author_sort	Liyuan Liu
title	Filling the Gaps in Atrous Convolution: Semantic Segmentation With a Better Context
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	The main challenge for scene parsing arises when complex scenes with highly diverse objects are encountered. The objects not only differ in scale and appearance but also in semantics. Previous works focus on encoding the multi-scale contextual information (via pooling or atrous convolutions) generally on top of compact high-level features (i.e., at a single stage). In this work, we argue that a rich set of cues exists at multiple stages of the network, encapsulating low, mid and high-level scene details. Therefore, an optimal scene parsing model must aggregate multi-scale context at all three levels of the feature hierarchy, a capability that is lacking in state-of-the-art scene parsing models. To address this limitation, we introduce a novel architecture with three new blocks that systematically aggregate low, mid and high tier features. The heart of our approach is a high-level feature aggregation module that augments sparsely connected atrous convolution with dense local and layer-wise connections to avoid gridding artifacts. In addition, we employ a novel feature pyramid augmentation and semantic refinement unit to generate low- and mid-level features that are mixed with high-level features at the decoder. We extensively evaluate our proposed approach on the large-scale Cityscapes and ADE20K benchmarks. Our approach surpasses many of the latest models on both datasets, achieving mean intersection-over-union (mIoU) scores of 80.5% and 44.0% on Cityscapes and ADE20K, respectively.
topic	Image processing, neural networks, semantic segmentation, supervised learning
url	https://ieeexplore.ieee.org/document/8861330/
