Query is GAN: Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network

Scene retrieval from input descriptions has been one of the most important applications with the increasing number of videos on the Web. However, this is still a challenging task since semantic gaps between features of texts and videos exist. In this paper, we try to solve this problem by utilizing...


Bibliographic Details
Main Authors:	Rintaro Yanagi, Ren Togo, Takahiro Ogawa, Miki Haseyama
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Scene retrieval; deep learning; generative adversarial network; text-to-image translation
Online Access:	https://ieeexplore.ieee.org/document/8868179/

id	doaj-88aff6ad59304aaea8247fbed16f20c2
record_format	Article
spelling	doaj-88aff6ad59304aaea8247fbed16f20c2 | 2021-03-29T23:04:46Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | 7 | 153183 | 153193 | 10.1109/ACCESS.2019.2947409 | 8868179 | Query is GAN: Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network | Rintaro Yanagi [0] https://orcid.org/0000-0003-0110-7208 | Ren Togo [1] https://orcid.org/0000-0002-4474-3995 | Takahiro Ogawa [2] https://orcid.org/0000-0001-5332-8112 | Miki Haseyama [3] | Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan | Faculty of Information Science and Technology Division of Media and Network Technologies, Hokkaido University, Sapporo, Japan | Faculty of Information Science and Technology Division of Media and Network Technologies, Hokkaido University, Sapporo, Japan | Faculty of Information Science and Technology Division of Media and Network Technologies, Hokkaido University, Sapporo, Japan | Scene retrieval from input descriptions has been one of the most important applications with the increasing number of videos on the Web. However, this is still a challenging task since semantic gaps between features of texts and videos exist. In this paper, we try to solve this problem by utilizing a text-to-image Generative Adversarial Network (GAN), which has become one of the most attractive research topics in recent years. The text-to-image GAN is a deep learning model that can generate images from their corresponding descriptions. We propose a new retrieval framework, “Query is GAN”, based on the text-to-image GAN that drastically improves scene retrieval performance by simple procedures. Our novel idea makes use of images generated by the text-to-image GAN as queries for the scene retrieval task. In addition, unlike many studies on text-to-image GANs that mainly focused on the generation of high-quality images, we reveal that the generated images have reasonable visual features suitable for the queries even though they are not visually pleasant. We show the effectiveness of the proposed framework through experimental evaluation in which scene retrieval is performed from real video datasets. | https://ieeexplore.ieee.org/document/8868179/ | Scene retrieval | deep learning | generative adversarial network | text-to-image translation
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Rintaro Yanagi, Ren Togo, Takahiro Ogawa, Miki Haseyama
spellingShingle	Rintaro Yanagi | Ren Togo | Takahiro Ogawa | Miki Haseyama | Query is GAN: Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network | IEEE Access | Scene retrieval | deep learning | generative adversarial network | text-to-image translation
author_facet	Rintaro Yanagi, Ren Togo, Takahiro Ogawa, Miki Haseyama
author_sort	Rintaro Yanagi
title	Query is GAN: Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network
title_short	Query is GAN: Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network
title_full	Query is GAN: Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network
title_fullStr	Query is GAN: Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network
title_full_unstemmed	Query is GAN: Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network
title_sort	query is gan: scene retrieval with attentional text-to-image generative adversarial network
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	Scene retrieval from input descriptions has been one of the most important applications with the increasing number of videos on the Web. However, this is still a challenging task since semantic gaps between features of texts and videos exist. In this paper, we try to solve this problem by utilizing a text-to-image Generative Adversarial Network (GAN), which has become one of the most attractive research topics in recent years. The text-to-image GAN is a deep learning model that can generate images from their corresponding descriptions. We propose a new retrieval framework, “Query is GAN”, based on the text-to-image GAN that drastically improves scene retrieval performance by simple procedures. Our novel idea makes use of images generated by the text-to-image GAN as queries for the scene retrieval task. In addition, unlike many studies on text-to-image GANs that mainly focused on the generation of high-quality images, we reveal that the generated images have reasonable visual features suitable for the queries even though they are not visually pleasant. We show the effectiveness of the proposed framework through experimental evaluation in which scene retrieval is performed from real video datasets.
topic	Scene retrieval; deep learning; generative adversarial network; text-to-image translation
url	https://ieeexplore.ieee.org/document/8868179/
work_keys_str_mv	AT rintaroyanagi queryisgansceneretrievalwithattentionaltexttoimagegenerativeadversarialnetwork AT rentogo queryisgansceneretrievalwithattentionaltexttoimagegenerativeadversarialnetwork AT takahiroogawa queryisgansceneretrievalwithattentionaltexttoimagegenerativeadversarialnetwork AT mikihaseyama queryisgansceneretrievalwithattentionaltexttoimagegenerativeadversarialnetwork
_version_	1724190114884616192


Similar Items