A Deep Joint Network for Monocular Depth Estimation Based on Pseudo-Depth Supervision
Depth estimation from a single image is a significant task. Although deep learning methods hold great promise in this area, they still face a number of challenges, including the limited modeling of nonlocal dependencies, lack of effective loss function joint optimization models, and difficulty in ac...
| Published in: | Mathematics |
|---|---|
| Main Authors: | Jiahai Tan, Ming Gao, Tao Duan, Xiaomei Gao |
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2023-11-01 |
| Subjects: | monocular depth estimation, pseudo-depth net, transformer, encoder–decoder |
| Online Access: | https://www.mdpi.com/2227-7390/11/22/4645 |
| _version_ | 1850135344196878336 |
|---|---|
| author | Jiahai Tan, Ming Gao, Tao Duan, Xiaomei Gao |
| author_sort | Jiahai Tan |
| collection | DOAJ |
| container_title | Mathematics |
| description | Depth estimation from a single image is a significant task. Although deep learning methods hold great promise in this area, they still face several challenges, including limited modeling of nonlocal dependencies, a lack of effective models for jointly optimizing loss functions, and difficulty in accurately estimating object edges. To further increase prediction accuracy, this research proposes a new network structure and training method for single-image depth estimation. A pseudo-depth network is first deployed to generate a single-image depth prior; by constructing connecting paths between multi-scale local features with the proposed up-mapping and jumping modules, the network can integrate representations and recover fine details. A deep network is also designed to capture and convey global context, using the Transformer Conv module and Unet Depth net to extract and refine global features. The two networks jointly provide meaningful coarse and fine features for predicting high-quality depth images from single RGB images. In addition, multiple joint losses are used to improve training. A series of experiments confirms the efficacy of the method. According to experimental results on the NYU Depth V2 and KITTI depth estimation benchmarks, the proposed method outperforms the advanced method DPT by 10% and 3.3%, respectively, in root mean square error (RMSE(log)) and by 1.7% and 1.6%, respectively, in squared relative difference (SRD). |
| format | Article |
| id | doaj-art-4fde71e10faf413faffc90cd4c1f3537 |
| institution | Directory of Open Access Journals |
| issn | 2227-7390 |
| language | English |
| publishDate | 2023-11-01 |
| publisher | MDPI AG |
| record_format | Article |
| spelling | doaj-art-4fde71e10faf413faffc90cd4c1f3537; 2025-08-19T23:51:19Z; eng; MDPI AG; Mathematics; ISSN 2227-7390; 2023-11-01; vol. 11, no. 22, article 4645; DOI 10.3390/math11224645; A Deep Joint Network for Monocular Depth Estimation Based on Pseudo-Depth Supervision; Jiahai Tan (School of Optoelectronic Engineering, Xi’an Technological University, Xi’an 710021, China); Ming Gao (School of Optoelectronic Engineering, Xi’an Technological University, Xi’an 710021, China); Tao Duan (State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China); Xiaomei Gao (Xi’an Mapping and Printing of China National Administration of Coal Geology, Xi’an 710199, China); https://www.mdpi.com/2227-7390/11/22/4645; monocular depth estimation; pseudo-depth net; transformer; encoder–decoder |
| title | A Deep Joint Network for Monocular Depth Estimation Based on Pseudo-Depth Supervision |
| topic | monocular depth estimation, pseudo-depth net, transformer, encoder–decoder |
| url | https://www.mdpi.com/2227-7390/11/22/4645 |

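The abstract reports results in RMSE(log) and squared relative difference (SRD) but this record does not define them. As a minimal sketch, assuming the definitions commonly used for depth estimation benchmarks (the paper's exact formulas may differ), the two error metrics could be computed as follows. The function names and the flat-list input format are illustrative choices, not taken from the paper.

```python
import math


def rmse_log(pred, gt):
    """Root mean square error between log-depths.

    Assumption: pixels with non-positive ground truth are treated as
    invalid and skipped; predictions must be strictly positive.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    squared = [(math.log(p) - math.log(g)) ** 2 for p, g in pairs]
    return math.sqrt(sum(squared) / len(squared))


def squared_relative_difference(pred, gt):
    """Mean of (pred - gt)^2 / gt over valid ground-truth pixels.

    Assumption: this is the usual 'squared relative' error; the same
    validity rule as above is applied.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)
```

Both quantities are errors, so lower values are better. For instance, predicting a depth of 2.0 everywhere against a ground truth of 1.0 gives an RMSE(log) of ln 2 (about 0.693) and an SRD of 1.0 under these definitions.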