Visual Perception with Synthetic Data
id: ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-13245
record_format: oai_dc
collection: NDLTD
language: en
format: Others
sources: NDLTD
description:
In recent years, learning-based methods have become the dominant approach to solving computer vision tasks. A major reason for this development is their automatic adaptation to the particularities of the task at hand: they learn a model of the problem from (training) data. This approach assumes that the training data closely resemble the data encountered during testing. Successfully applying a learning-based algorithm in a wide range of real-world scenarios thus requires collecting a large set of training data that models the complex phenomena encountered in the real world and covers rare but critical edge cases. For many tasks in computer vision, however, the human effort required severely limits the scale and diversity of datasets. A promising way to reduce this effort is data synthesis, which can automate considerable parts of the collection and annotation process. Employing synthetic data, however, poses unique challenges: first, synthesis is only as useful as the ability of methods to capitalize on virtually infinite amounts of data and arbitrarily precise annotations. Second, synthetic data must be sufficiently realistic to be useful in real-world scenarios, yet modeling real-world phenomena within the synthesis can be even more laborious than collecting and annotating real datasets in the first place.
In this dissertation, we address these challenges in two ways. First, we propose to adapt data-driven methods to take advantage of the unique features of synthetic data. Specifically, we develop a method that reconstructs the surface of objects from a single view under uncalibrated illumination. The method estimates the illumination conditions of the input image and synthesizes suitable training data at test time, enabling reconstructions of unprecedented detail. Furthermore, we develop a memory-efficient approach to reconstructing complete 3D shapes from a single view. This way, we leverage the high precision available through 3D CAD models and obtain more accurate and detailed reconstructions than previous approaches.
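To make the test-time synthesis idea concrete, here is a minimal, hypothetical Python sketch. It assumes a Lambertian shading model with a single directional light and substitutes a per-pixel nearest-neighbour lookup for the learned, patch-based regressor of the thesis; all names are illustrative, not the actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_directional_light(intensities, coarse_normals):
    """Least-squares fit of a directional light l from I ~ max(0, n . l).
    Assumes a rough per-pixel normal guess is available (a simplification;
    the thesis estimates illumination without such a guess)."""
    light, *_ = np.linalg.lstsq(coarse_normals, intensities, rcond=None)
    return light

def synthesize_training_data(light, n=10_000):
    """The test-time synthesis step: render n (shading, normal) pairs
    under the estimated illumination of the current input image."""
    normals = rng.normal(size=(n, 3))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    normals[:, 2] = np.abs(normals[:, 2])          # keep camera-facing hemisphere
    shading = np.clip(normals @ light, 0.0, None)  # Lambertian reflectance
    return shading, normals

def reconstruct_normals(intensities, light):
    """Predict a surface normal per pixel via 1-NN lookup in the freshly
    synthesized training set."""
    shading, normals = synthesize_training_data(light)
    nearest = np.abs(intensities[:, None] - shading[None, :]).argmin(axis=1)
    return normals[nearest]
```

The crucial property is that the synthesized training set is matched to the illumination of the specific test image, so the predictor never needs to generalize across lighting conditions.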
Second, we propose to tap into computer games for creating ground truth for a variety of visual perception tasks. Open-world computer games mimic the real world and pair large diversity with high realism. Since the source code of commercial games is not available, we devise a technique to intercept the rendering pipeline during gameplay and use the rendering resources to identify objects in rendered images. As only limited semantic information is available at the level of interception, manually associating resources with semantic classes is still necessary; we therefore develop a method that speeds up annotation dramatically by recognizing shared resources and automatically propagating annotations across the dataset. Leveraging the geometric information available through the rendering process, we further collect ground truth for optical flow, visual odometry, and 3D scene layout. The synthesis of data from computer games reduces the human annotation effort significantly and allows creating synthetic datasets that model the real world at unprecedented scale. Ground truth for multiple visual perception tasks enables deeper analysis of current methods and the development of novel approaches that reason about multiple tasks holistically. For both the adaptation of data-driven methods and the datasets derived from computer games, we demonstrate significant performance improvements through quantitative and qualitative evaluations.
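The annotation propagation described above admits a compact illustration. The following hedged Python sketch fingerprints each draw call by hashing the mesh, texture, and shader it binds, so a single manual label attached to a fingerprint covers every frame in which the same resource combination reappears; the MD5 hashing and the data layout are assumptions for illustration, and the graphics-API interception itself is out of scope here.

```python
import hashlib

def resource_key(mesh: bytes, texture: bytes, shader: bytes) -> str:
    """Fingerprint a draw call by the rendering resources it binds."""
    h = hashlib.md5()
    for blob in (mesh, texture, shader):
        h.update(hashlib.md5(blob).digest())
    return h.hexdigest()

class AnnotationPropagator:
    """Associate resource fingerprints with semantic classes, so that one
    manual annotation propagates across the entire dataset."""

    def __init__(self):
        self.class_of: dict[str, str] = {}

    def annotate(self, key: str, semantic_class: str) -> None:
        self.class_of[key] = semantic_class  # one manual label per key

    def unlabeled(self, frames) -> set:
        # Fingerprints that still require a manual annotation.
        return {k for frame in frames for k in frame} - self.class_of.keys()

    def propagate(self, frames):
        # Label every draw call whose fingerprint has been annotated.
        return [{k: self.class_of.get(k) for k in frame} for frame in frames]
```

Because games reuse assets heavily across scenes, the number of distinct fingerprints grows far more slowly than the number of labeled pixels, which is what makes the dramatic reduction in annotation effort possible.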
author: Richter, Stephan Randolf
spellingShingle: Richter, Stephan Randolf Visual Perception with Synthetic Data
author_facet: Richter, Stephan Randolf
author_sort: Richter, Stephan Randolf
title: Visual Perception with Synthetic Data
title_short: Visual Perception with Synthetic Data
title_full: Visual Perception with Synthetic Data
title_fullStr: Visual Perception with Synthetic Data
title_full_unstemmed: Visual Perception with Synthetic Data
title_sort: visual perception with synthetic data
publishDate: 2020
url: https://tuprints.ulb.tu-darmstadt.de/13245/13/Stephan_R_Richter_-_Visual_Perception_With_Synthetic_Data_-_2020.pdf
citation: Richter, Stephan Randolf <http://tuprints.ulb.tu-darmstadt.de/view/person/Richter=3AStephan_Randolf=3A=3A.html> (2020): Visual Perception with Synthetic Data. Darmstadt, Technische Universität, DOI: 10.25534/tuprints-00013245 <https://doi.org/10.25534/tuprints-00013245>, [Ph.D. Thesis]
work_keys_str_mv: AT richterstephanrandolf visualperceptionwithsyntheticdata
_version_: 1719337279969820672
datestamp: 2020-08-11T05:11:27Z
type: Ph.D. Thesis, NonPeerReviewed, text (info:eu-repo/semantics/doctoralThesis)
rights: CC-BY-NC 4.0 International - Creative Commons, Attribution Non-commercial
doi: https://doi.org/10.25534/tuprints-00013245
access: info:eu-repo/semantics/openAccess