Filling the Gaps in Atrous Convolution: Semantic Segmentation With a Better Context

The main challenge for scene parsing arises when complex scenes with highly diverse objects are encountered. The objects not only differ in scale and appearance but also in semantics. Previous works focus on encoding the multi-scale contextual information (via pooling or atrous convolutions) general...

Bibliographic Details
Main Authors: Liyuan Liu, Yanwei Pang, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ling Shao
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Image processing; neural networks; semantic segmentation; supervised learning
Online Access: https://ieeexplore.ieee.org/document/8861330/
id doaj-300a6aac1bfd42ee9cc2ea01d28e5fea
record_format Article
spelling doaj-300a6aac1bfd42ee9cc2ea01d28e5fea
2021-03-30T02:02:32Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
vol. 8, pp. 34019-34028
10.1109/ACCESS.2019.2946031
8861330
Filling the Gaps in Atrous Convolution: Semantic Segmentation With a Better Context
Liyuan Liu (Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, School of Electrical and Information Engineering, Tianjin University, Tianjin, China)
Yanwei Pang (https://orcid.org/0000-0001-6670-3727; Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, School of Electrical and Information Engineering, Tianjin University, Tianjin, China)
Syed Waqas Zamir (Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates)
Salman Khan (Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates)
Fahad Shahbaz Khan (Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates)
Ling Shao (Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates)
https://ieeexplore.ieee.org/document/8861330/
Image processing
neural networks
semantic segmentation
supervised learning
collection DOAJ
language English
format Article
sources DOAJ
author Liyuan Liu
Yanwei Pang
Syed Waqas Zamir
Salman Khan
Fahad Shahbaz Khan
Ling Shao
title Filling the Gaps in Atrous Convolution: Semantic Segmentation With a Better Context
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description The main challenge for scene parsing arises when complex scenes with highly diverse objects are encountered. The objects differ not only in scale and appearance but also in semantics. Previous works focus on encoding multi-scale contextual information (via pooling or atrous convolutions), generally on top of compact high-level features (i.e., at a single stage). In this work, we argue that a rich set of cues exists at multiple stages of the network, encapsulating low-, mid- and high-level scene details. Therefore, an optimal scene parsing model must aggregate multi-scale context at all three levels of the feature hierarchy, a capability that state-of-the-art scene parsing models lack. To address this limitation, we introduce a novel architecture with three new blocks that systematically aggregate low-, mid- and high-tier features. The heart of our approach is a high-level feature aggregation module that augments sparsely connected atrous convolution with dense local and layer-wise connections to avoid gridding artifacts. In addition, we employ a novel feature pyramid augmentation and semantic refinement unit to generate low- and mid-level features that are mixed with high-level features at the decoder. We extensively evaluate our proposed approach on the large-scale Cityscapes and ADE20K benchmarks. Our approach surpasses many recent models on both datasets, achieving mean intersection-over-union (mIoU) scores of 80.5% and 44.0% on Cityscapes and ADE20K, respectively.
topic Image processing
neural networks
semantic segmentation
supervised learning
url https://ieeexplore.ieee.org/document/8861330/