A Deep Joint Network for Monocular Depth Estimation Based on Pseudo-Depth Supervision

Bibliographic Details
Published in: Mathematics
Main Authors: Jiahai Tan, Ming Gao, Tao Duan, Xiaomei Gao
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Subjects: monocular depth estimation, pseudo-depth net, transformer, encoder–decoder
Online Access: https://www.mdpi.com/2227-7390/11/22/4645
_version_ 1850135344196878336
author Jiahai Tan
Ming Gao
Tao Duan
Xiaomei Gao
author_sort Jiahai Tan
collection DOAJ
container_title Mathematics
description Depth estimation from a single image is an important task. Although deep learning methods hold great promise in this area, they still face several challenges, including limited modeling of nonlocal dependencies, the lack of effective models for jointly optimizing loss functions, and difficulty in accurately estimating object edges. To further improve prediction accuracy, this research proposes a new network structure and training method for single-image depth estimation. A pseudo-depth network is first deployed to generate a single-image depth prior; by constructing connecting paths between multi-scale local features using the proposed up-mapping and jumping modules, the network can integrate representations and recover fine details. A deep network is also designed to capture and convey global context, using the Transformer Conv module and Unet Depth net to extract and refine global features. The two networks jointly provide meaningful coarse and fine features for predicting high-quality depth images from single RGB images. In addition, multiple joint losses are used to enhance training. A series of experiments is carried out to demonstrate the efficacy of the method. In experiments on the NYU Depth V2 and KITTI depth estimation benchmarks, the proposed method outperforms the advanced method DPT by 10% and 3.3%, respectively, in root mean square error (RMSE(log)) and by 1.7% and 1.6%, respectively, in squared relative difference (SRD).
format Article
id doaj-art-4fde71e10faf413faffc90cd4c1f3537
institution Directory of Open Access Journals
issn 2227-7390
language English
publishDate 2023-11-01
publisher MDPI AG
record_format Article
spelling doaj-art-4fde71e10faf413faffc90cd4c1f3537; 2025-08-19T23:51:19Z; eng; MDPI AG; Mathematics; 2227-7390; 2023-11-01; vol. 11, no. 22, art. 4645; doi:10.3390/math11224645
A Deep Joint Network for Monocular Depth Estimation Based on Pseudo-Depth Supervision
Jiahai Tan [0], Ming Gao [1], Tao Duan [2], Xiaomei Gao [3]
[0] School of Optoelectronic Engineering, Xi’an Technological University, Xi’an 710021, China
[1] School of Optoelectronic Engineering, Xi’an Technological University, Xi’an 710021, China
[2] State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
[3] Xi’an Mapping and Printing of China National Administration of Coal Geology, Xi’an 710199, China
https://www.mdpi.com/2227-7390/11/22/4645
monocular depth estimation; pseudo-depth net; transformer; encoder–decoder
title A Deep Joint Network for Monocular Depth Estimation Based on Pseudo-Depth Supervision
title_sort deep joint network for monocular depth estimation based on pseudo depth supervision
topic monocular depth estimation
pseudo-depth net
transformer
encoder–decoder
url https://www.mdpi.com/2227-7390/11/22/4645
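The description above reports results as RMSE(log) and squared relative difference (SRD) but this record does not give their formulas. The following is a minimal sketch of the definitions commonly used in the monocular depth estimation literature, not the authors' evaluation code. The function names, the use of the natural logarithm, and the helper `relative_improvement` are assumptions made for illustration; the record also does not say whether the reported percentages are relative or absolute differences.

```python
import math
from typing import Sequence


def rmse_log(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean square error between log-depths.

    Assumes natural logarithms and strictly positive depth values.
    """
    squared = [(math.log(p) - math.log(t)) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared) / len(squared))


def squared_relative_difference(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean of (pred - target)^2 / target over all pixels."""
    terms = [(p - t) ** 2 / t for p, t in zip(pred, target)]
    return sum(terms) / len(terms)


def relative_improvement(baseline: float, ours: float) -> float:
    """Hypothetical helper: fractional reduction of an error metric versus a baseline."""
    return (baseline - ours) / baseline


if __name__ == "__main__":
    # Toy depth values in meters, made up for illustration only.
    ground_truth = [1.0, 2.0, 4.0]
    estimate = [1.0, 2.0, 2.0]
    print("RMSE(log):", rmse_log(estimate, ground_truth))
    print("SRD:", squared_relative_difference(estimate, ground_truth))
```

Both metrics are errors, so lower values are better. In practice they are computed only over pixels that have valid ground-truth depth.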