Wasserstein Adversarial Domain Adaptation



Bibliographic Details
Main Authors: Lyu, Yu-Ying, 呂昱穎
Other Authors: Chien, Jen-Tzung
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/2zxass
Summary: Master's === National Chiao Tung University === Department of Electrical Engineering === 107 === Deep learning has achieved great success in many real-world applications, in fields ranging from computer vision to natural language processing. However, desirable performance with deep learning typically requires a large amount of labeled data for supervised training. Transfer learning, an approach that reduces the cost of retraining a model, aims to utilize knowledge from a source domain to improve learning performance in a target domain. Transfer learning covers several settings; we focus on domain adaptation, where the target domain provides only a few labeled samples or entirely unlabeled data. The goal of domain adaptation is to minimize the domain shift between the observed data of the two domains, and an extractor of shared features is trained to fulfill this goal. Recently, the generative adversarial network (GAN) has been successfully developed to extract shared features for domain adaptation in which the classes of the source and target domains are identical.

This thesis presents a new Wasserstein adversarial domain adaptation that tackles two crucial issues. The first issue is the difficulty of adversarial training due to mode collapse and vanishing gradients. One important cause of this difficulty is optimization over an asymmetric metric based on the Kullback-Leibler divergence. We therefore introduce a symmetric metric based on the Wasserstein distance, whose optimal-transport geometry mitigates the training difficulty in adversarial domain adaptation. Second, we deal with a new and challenging issue in domain adaptation where the classes of the target domain form a subset of the classes of the source domain. This scenario is practical in many systems in which knowledge is transferred from a large-scale domain to a small-scale domain. Traditional domain adaptation methods cannot reduce the redundancy in the source domain while transferring only the relevant knowledge to the target domain. We therefore present a partial transfer scheme that finds the highly correlated parts of the source and target domains and excludes the weakly correlated parts, preventing the negative transfer that would degrade adaptation performance. In addition to partial transfer, the second issue is also handled by class-dependent feature matching: each domain is represented as a mixture distribution whose components correspond to the individual classes, and the relevance between source and target components is evaluated to attend to different components during adaptation. Again, the Wasserstein distance is adopted as the geometric evaluation for partial transfer.

The proposed Wasserstein adversarial domain adaptation (WADA) model is therefore constructed with four networks: a feature extractor, a component-relevance network, a task classifier, and a domain discriminator. The feature extractor is implemented as an autoencoder with a Wasserstein constraint on the latent representation of the shared features. The relevance network decides the relevance, or priority, for transferring the components of the source domain to those of the target domain. The task classifier is trained to maximize accuracy on labeled training data, while the shared features are learned adversarially to minimize the domain discriminator's accuracy in classifying source versus target domains.
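The core of the first contribution is replacing a KL-based adversarial objective with a Wasserstein critic. Below is a minimal PyTorch sketch of such a critic objective, using the gradient-penalty form of the Lipschitz constraint; the network sizes, the penalty weight, and all names are illustrative assumptions rather than the thesis implementation.

```python
import torch
import torch.nn as nn

class DomainCritic(nn.Module):
    """Scalar-valued critic that scores latent features from either domain."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, h):
        return self.net(h)

def wasserstein_critic_loss(critic, h_src, h_tgt, gp_weight=10.0):
    """Loss whose minimization maximizes E[f(h_src)] - E[f(h_tgt)],
    a dual estimate of the W1 distance between the two feature
    distributions. The gradient penalty keeps the critic approximately
    1-Lipschitz (assumes h_src and h_tgt share the same batch size)."""
    w_est = critic(h_src).mean() - critic(h_tgt).mean()

    # Gradient penalty evaluated on random interpolates between domains.
    eps = torch.rand(h_src.size(0), 1, device=h_src.device)
    h_mix = (eps * h_src + (1.0 - eps) * h_tgt).requires_grad_(True)
    grad, = torch.autograd.grad(critic(h_mix).sum(), h_mix, create_graph=True)
    gp = ((grad.norm(2, dim=1) - 1.0) ** 2).mean()

    return -w_est + gp_weight * gp
```

Because this estimate is symmetric in the geometry of optimal transport, the feature extractor can be trained to shrink it directly, avoiding the vanishing gradients that plague KL-based discriminators when the two feature distributions barely overlap.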
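For the partial-transfer side, the summary describes a relevance network that weights source components by their correlation with the target domain. The thesis learns these weights with a dedicated network; the sketch below instead uses a common heuristic from the partial-transfer literature (class-averaged target predictions), purely for illustration.

```python
import torch

def class_relevance_weights(tgt_logits):
    """Average class posteriors on target data as source-class weights.
    Source classes absent from the target domain should collect low
    average probability and hence a small transfer weight (a heuristic
    stand-in for the learned relevance network in the thesis)."""
    probs = torch.softmax(tgt_logits, dim=1)   # (batch, num_src_classes)
    w = probs.mean(dim=0)                      # (num_src_classes,)
    return w / w.max()                         # normalize so max weight is 1

def weighted_component_distance(dist_per_class, weights):
    """Aggregate per-class feature distances with relevance weights.
    dist_per_class[c] is any distance estimate (e.g., a per-class
    Wasserstein critic value) between the class-c source component and
    the matched target features; down-weighting weakly relevant classes
    curbs negative transfer."""
    return (weights * dist_per_class).sum() / weights.sum()
```

Under this heuristic, a source-only class accumulates near-zero posterior mass on target batches and therefore contributes little to the matching loss, which is exactly the exclusion of weakly correlated components described above.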
In the experiments, we evaluate the proposed WADA model on the MNIST, MNIST-M, USPS and Office-Caltech datasets and benchmark it against related works.
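To connect the four-network description with a concrete training pipeline, here is a hedged sketch of one adversarial update step. It reuses the illustrative helpers from the sketches above, omits the autoencoder reconstruction term and the relevance weighting for brevity, and its trade-off weight and module names are assumptions, not the thesis configuration.

```python
import torch
import torch.nn.functional as F

# Assumes wasserstein_critic_loss from the earlier sketch is in scope,
# and that opt_fc holds extractor+classifier parameters while opt_d
# holds critic parameters.
def wada_train_step(extractor, classifier, critic, opt_fc, opt_d,
                    x_src, y_src, x_tgt, trade_off=0.1):
    # 1) Critic update: maximize the Wasserstein estimate between
    #    frozen source and target features.
    with torch.no_grad():
        h_src, h_tgt = extractor(x_src), extractor(x_tgt)
    d_loss = wasserstein_critic_loss(critic, h_src, h_tgt)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Extractor/classifier update: maximize task accuracy on the
    #    labeled source batch while shrinking the estimated distance
    #    between source and target feature distributions.
    h_src, h_tgt = extractor(x_src), extractor(x_tgt)
    cls_loss = F.cross_entropy(classifier(h_src), y_src)
    w_est = critic(h_src).mean() - critic(h_tgt).mean()
    loss = cls_loss + trade_off * w_est   # trade-off weight is illustrative
    opt_fc.zero_grad()
    loss.backward()
    opt_fc.step()
    return cls_loss.item(), w_est.item()
```

This alternation mirrors the summary's division of labor: the classifier is trained for task accuracy, while the shared features are pushed to make source and target indistinguishable under the critic's Wasserstein estimate.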