Predicting Depth from Single RGB Images with Pyramidal Three-Streamed Networks

Bibliographic Details
Main Authors: Songnan Chen, Mengxia Tang, Jiangming Kan
Format: Article
Language: English
Published: MDPI AG 2019-02-01
Series: Sensors
Subjects:
Online Access: https://www.mdpi.com/1424-8220/19/3/667
Description
Summary: Predicting depth from a monocular image is an ill-posed and inherently ambiguous problem in computer vision. In this paper, we propose a pyramidal three-streamed network (PTSN) that recovers depth information from a single RGB image. PTSN takes pyramid-structured images as its input, which lets the network extract multiresolution features and improves its robustness. The fully connected layer is replaced with fully convolutional layers that use a new upconvolution structure, which reduces the number of network parameters and the computational complexity. We propose a new loss function that combines a scale-invariant term with horizontal and vertical gradient terms; it not only helps predict the depth values but also yields clear local contours. We evaluate PTSN on the NYU Depth v2 dataset, and the experimental results show that our depth predictions are more accurate than those of competing methods.
ISSN: 1424-8220
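
Note on the loss described in the summary: the record does not give the exact formula, so the following is only a minimal sketch of how a scale-invariant term can be combined with horizontal and vertical gradient terms. It assumes the widely used log-depth formulation (mean squared log difference minus a weighted squared mean) and simple finite differences; the function name, the `lam` weight, the `eps` clamp, and the equal weighting of the terms are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def scale_invariant_gradient_loss(pred, target, lam=0.5, eps=1e-6):
    """Hypothetical sketch: scale-invariant log-depth error plus gradient terms.

    pred, target: 2-D arrays of positive depth values with the same shape.
    lam: weight of the scale-invariance correction (assumed, not from the source).
    """
    # Per-pixel difference in log space; clamp to avoid log(0).
    d = np.log(np.maximum(pred, eps)) - np.log(np.maximum(target, eps))
    n = d.size

    # Scale-invariant term: mean squared log error minus lam * (mean error)^2.
    scale_inv = np.sum(d ** 2) / n - lam * np.sum(d) ** 2 / n ** 2

    # Finite differences of the log error along columns (horizontal)
    # and rows (vertical); these penalize mismatched local contours.
    grad_h = d[:, 1:] - d[:, :-1]
    grad_v = d[1:, :] - d[:-1, :]
    grad = np.mean(grad_h ** 2) + np.mean(grad_v ** 2)

    return scale_inv + grad
```

With `lam=1.0`, a prediction that differs from the ground truth only by a global scale factor incurs (numerically) zero loss under this sketch, which is the property the term "scale-invariant" refers to.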