Disentangled Representation Learning in Real-World Image Datasets via Image Segmentation Prior

We propose a novel method that can learn easy-to-interpret latent representations in real-world image datasets using a VAE-based model by splitting an image into several disjoint regions. Our method performs object-wise disentanglement by exploiting image segmentation and alpha compositing. With rem...

Full description

Bibliographic Details
Main Authors: Nao Nakagawa, Ren Togo, Takahiro Ogawa, Miki Haseyama
Format: Article
Language:English
Published: IEEE 2021-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9502079/
Description
Summary:We propose a novel method that can learn easy-to-interpret latent representations in real-world image datasets using a VAE-based model by splitting an image into several disjoint regions. Our method performs object-wise disentanglement by exploiting image segmentation and alpha compositing. With remarkable results obtained by unsupervised disentanglement methods for toy datasets, recent studies have tackled challenging disentanglement for real-world image datasets. However, these methods involve deviations from the standard VAE architecture, which has favorable disentanglement properties. Thus, for disentanglement in images of real-world image datasets with preservation of the VAE backbone, we designed an encoder and a decoder that embed an image into disjoint sets of latent variables corresponding to objects. The encoder includes a pre-trained image segmentation network, which allows our model to focus only on representation learning while adopting image segmentation as an inductive bias. Evaluations using real-world image datasets, CelebA and Stanford Cars, showed that our method achieves improved disentanglement and transferability.
ISSN:2169-3536