TP-GAN: Simple Adversarial Network With Additional Player for Dense Depth Image Estimation


Bibliographic Details
Main Authors: Hendra, A. (Author), Kanazawa, Y. (Author)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
Subjects:
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 03113nam a2200421Ia 4500
001 10.1109-ACCESS.2023.3272292
008 230529s2023 CNT 000 0 und d
020 |a 2169-3536 (ISSN) 
245 1 0 |a TP-GAN: Simple Adversarial Network With Additional Player for Dense Depth Image Estimation 
260 0 |b Institute of Electrical and Electronics Engineers Inc.  |c 2023 
300 |a 16 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1109/ACCESS.2023.3272292 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159708152&doi=10.1109%2fACCESS.2023.3272292&partnerID=40&md5=3f20ebc066fe4c819e2db9681a8fa03d 
520 3 |a We present a simple yet robust monocular depth estimation technique that synthesizes a depth map from a single RGB input image, exploiting the advantages of generative adversarial networks (GANs). We employ an additional sub-model, termed the refiner, to extract local depth features, which are then combined with the global scene information from the generator to improve performance over the standard GAN architecture. The generator is the first player and learns to synthesize depth images. The second player, the discriminator, classifies the generated depth. Meanwhile, the third player, the refiner, enhances the final reconstructed depth. Complementing the GAN model, we apply a conditional generative adversarial network (cGAN) to guide the generator in mapping the input image to its corresponding depth representation. We further incorporate the structural similarity index (SSIM) as the loss function for the generator and refiner during GAN training. Through extensive experimental validation on the publicly available indoor NYU Depth v2 and outdoor KITTI datasets, we confirm the performance of our strategy. Results on the NYU Depth v2 dataset show that our approach achieves the best threshold accuracy (δ < 1.25²) of 96.0%, and the second-best accuracy across all thresholds on the KITTI dataset. Our method compares favorably with numerous existing monocular depth estimation strategies and considerably improves depth estimation accuracy despite its simple network architecture. © 2013 IEEE. 
650 0 4 |a conditional GAN 
650 0 4 |a Conditional generative adversarial network 
650 0 4 |a Convolutional neural network 
650 0 4 |a Depth estimation 
650 0 4 |a Depth Estimation 
650 0 4 |a Generative adversarial network 
650 0 4 |a generative adversarial network (GAN) 
650 0 4 |a Generative adversarial networks 
650 0 4 |a Generator 
650 0 4 |a Image enhancement 
650 0 4 |a Image reconstruction 
650 0 4 |a Images reconstruction 
650 0 4 |a Job analysis 
650 0 4 |a Network architecture 
650 0 4 |a Neural networks 
650 0 4 |a Performance 
650 0 4 |a single image 
650 0 4 |a Single images 
650 0 4 |a Task analysis 
650 0 4 |a third player GAN 
650 0 4 |a Third player generative adversarial network 
700 1 0 |a Hendra, A.  |e author 
700 1 0 |a Kanazawa, Y.  |e author 
773 |t IEEE Access
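The abstract above notes that SSIM is used as the loss for the generator and refiner. As an illustration only (the paper's actual implementation is not reproduced in this record, and the standard SSIM formulation is windowed rather than global), a minimal global-statistics SSIM loss in NumPy might look like:

```python
import numpy as np

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified global (non-windowed) SSIM loss: 1 - SSIM(pred, target).
    # Assumes depth maps are normalized to [0, 1]; c1, c2 follow the common
    # defaults (0.01*L)^2 and (0.03*L)^2 with dynamic range L = 1.
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    )
    return 1.0 - ssim

# A perfectly reconstructed depth map yields (near-)zero loss,
# so minimizing this term drives the generator/refiner output
# toward the ground-truth depth structure.
depth = np.random.rand(64, 64)
print(ssim_loss(depth, depth))  # ~0.0
```

In a three-player setup like the one described, this term would be added to the adversarial losses of the generator and refiner; the exact weighting is a detail of the paper, not shown here.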