Semantic Segmentation of Urban Buildings from VHR Remote Sensing Imagery Using a Deep Convolutional Neural Network

Urban building segmentation is a prevalent research topic in very high resolution (VHR) remote sensing; however, the varied appearance of buildings and the complex backgrounds in VHR imagery make accurate semantic segmentation of urban buildings challenging in practical applications. Following the basic architecture of U-Net, an end-to-end deep convolutional neural network (denoted DeepResUnet) was proposed that performs pixel-level urban building segmentation from VHR imagery and generates accurate segmentation results. The method comprises two sub-networks: a cascaded down-sampling network that extracts building feature maps from the VHR image, and an up-sampling network that reconstructs those feature maps back to the spatial size of the input image. Deep residual learning was adopted to ease training and to alleviate the degradation problem that often occurs when training deep models. DeepResUnet was tested on aerial images with a spatial resolution of 0.075 m and compared, under identical conditions, with six state-of-the-art networks: FCN-8s, SegNet, DeconvNet, U-Net, ResUNet and DeepUNet. Extensive experiments showed that DeepResUnet outperformed the other six networks in both visual and quantitative evaluation, particularly in labeling irregularly shaped and small buildings with higher accuracy and completeness. Compared with U-Net, the F1 score, Kappa coefficient and overall accuracy of DeepResUnet improved by 3.52%, 4.67% and 1.72%, respectively. Moreover, DeepResUnet requires far fewer parameters than U-Net, a significant improvement among U-Net variants. Its inference time, however, is slightly longer than that of U-Net and remains subject to further improvement.
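The core design described above, a U-Net-style encoder-decoder whose blocks use residual (shortcut) connections, can be illustrated with a minimal PyTorch sketch. This is not the authors' DeepResUnet: the layer counts, channel widths and the 1x1 shortcut projection below are illustrative assumptions; only the general down-sampling/up-sampling structure built from residual blocks follows the description in the abstract.

# Minimal sketch: residual blocks inside a small encoder-decoder (U-Net style).
# Assumptions: 2 encoder levels, base width 32, ConvTranspose2d up-sampling.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity shortcut (deep residual learning)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the shortcut matches the output channel width.
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))


class TinyResUNet(nn.Module):
    """Cascaded down-sampling path, symmetric up-sampling path, skip connections."""
    def __init__(self, in_ch=3, n_classes=2, base=32):
        super().__init__()
        self.enc1 = ResidualBlock(in_ch, base)
        self.enc2 = ResidualBlock(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = ResidualBlock(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = ResidualBlock(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = ResidualBlock(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, 1)  # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)                    # full-resolution features
        e2 = self.enc2(self.pool(e1))        # 1/2 resolution
        b = self.bottleneck(self.pool(e2))   # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # back to 1/2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # back to full size
        return self.head(d1)                 # same spatial size as the input image


if __name__ == "__main__":
    logits = TinyResUNet()(torch.randn(1, 3, 128, 128))
    print(logits.shape)  # torch.Size([1, 2, 128, 128])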

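The quantitative comparison is reported in terms of F1 score, Kappa coefficient and overall accuracy. The short NumPy sketch below shows how these pixel-level metrics are commonly derived from a binary building/non-building confusion matrix; it does not reproduce the paper's exact evaluation protocol, and the function name and synthetic example data are illustrative.

# Minimal sketch: F1, Cohen's kappa and overall accuracy for binary segmentation.
import numpy as np


def pixel_metrics(pred, truth):
    """pred, truth: integer arrays of 0 (background) / 1 (building)."""
    pred, truth = pred.ravel(), truth.ravel()
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    n = tp + tn + fp + fn

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    oa = (tp + tn) / n                                   # overall accuracy
    # Expected chance agreement from the marginals, then Cohen's kappa.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    return f1, kappa, oa


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.integers(0, 2, size=(256, 256))
    pred = np.where(rng.random((256, 256)) < 0.9, truth, 1 - truth)  # ~90% correct
    print(pixel_metrics(pred, truth))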

Bibliographic Details
Main Authors: Yaning Yi, Zhijie Zhang, Wanchang Zhang, Chuanrong Zhang, Weidong Li, Tian Zhao
Format: Article
Language: English
Published: MDPI AG, 2019-07-01
Series: Remote Sensing
ISSN: 2072-4292
Subjects: semantic segmentation; urban building extraction; deep convolutional neural network; VHR remote sensing imagery; U-Net
Online Access: https://www.mdpi.com/2072-4292/11/15/1774
Record ID: doaj-4d756467fa0d44bb8c35fdf6ecbca338
Citation: Remote Sensing 2019, 11(15), 1774. DOI: 10.3390/rs11151774
Author Affiliations:
Yaning Yi: Key Laboratory of Digital Earth Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, China
Zhijie Zhang: Department of Geography, University of Connecticut, Storrs, CT 06269, USA
Wanchang Zhang: Key Laboratory of Digital Earth Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, China
Chuanrong Zhang: Department of Geography, University of Connecticut, Storrs, CT 06269, USA
Weidong Li: Department of Geography, University of Connecticut, Storrs, CT 06269, USA
Tian Zhao: Department of Computer Science, University of Wisconsin, Milwaukee, WI 53211, USA