A Benchmark Dataset and Learning High-Level Semantic Embeddings of Multimedia for Cross-Media Retrieval

The selection of semantic concepts for modal construction and data collection remains an open research issue. Choosing good multimedia concepts with small semantic gaps to ease the work of cross-media system developers is highly demanding, yet very little work has been done in this area. This paper contributes a new, real-world web image dataset for cross-media retrieval called FB5K. The proposed FB5K dataset has the following attributes: 1) 5130 images crawled from Facebook; 2) images categorized according to users' feelings; 3) images that are independent of text and language, with feelings used for search instead. Furthermore, we propose a novel approach that uses Optical Character Recognition and explicitly incorporates high-level semantic information. We comprehensively evaluate the performance of four different subspace-learning methods and three modified versions of the Correspondence Autoencoder, alongside numerous text features and similarity measurements, comparing Wikipedia, Flickr30k, and FB5K. To examine the characteristics of FB5K, we propose a semantic-based cross-media retrieval method. To accomplish cross-media retrieval, we introduce a new similarity measurement in the embedded space, which significantly improves system performance compared with the conventional Euclidean distance. Our experimental results demonstrate the efficiency of the proposed retrieval method on three different datasets in simplifying and improving general image retrieval.
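The abstract does not specify the exact form of the proposed similarity measurement, only that it outperforms Euclidean distance in the learned embedding space. The sketch below is a minimal illustration of that baseline comparison, assuming a shared cross-media embedding space and using cosine similarity as a stand-in alternative; the embedding dimensionality, gallery size, and choice of cosine are illustrative assumptions, not the authors' method.

```python
import numpy as np

# Hypothetical learned embeddings: rows are items projected into a shared
# cross-media space (e.g., text and image features mapped by a subspace-
# learning method or a correspondence autoencoder). Values are random here,
# purely for illustration.
rng = np.random.default_rng(0)
text_embeddings = rng.normal(size=(1000, 128))   # gallery: embedded text items
query_image = rng.normal(size=(128,))            # probe: one embedded image

def euclidean_rank(query, gallery):
    """Rank gallery items by ascending Euclidean distance to the query."""
    dists = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(dists)

def cosine_rank(query, gallery):
    """Rank gallery items by descending cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

print("Top-5 by Euclidean distance:", euclidean_rank(query_image, text_embeddings)[:5])
print("Top-5 by cosine similarity:", cosine_rank(query_image, text_embeddings)[:5])
```

In practice, the two rankings differ whenever embedding norms vary across items, which is one reason retrieval in learned embedding spaces often benefits from a measure other than raw Euclidean distance.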

Bibliographic Details
Main Authors: Sadaqat Ur Rehman, Shanshan Tu, Yongfeng Huang, Obaid Ur Rehman
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Cross-media retrieval; FB5K dataset; high-level semantic embeddings
Online Access:https://ieeexplore.ieee.org/document/8516912/
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2018.2878868
Published in: IEEE Access, vol. 6, pp. 67176-67188, 2018 (article no. 8516912)
Author Affiliations:
Sadaqat Ur Rehman (ORCID: 0000-0002-4449-1708), Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
Shanshan Tu, Faculty of Information Technology, Beijing University of Technology, Beijing, China
Yongfeng Huang, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
Obaid Ur Rehman (ORCID: 0000-0003-4577-6059), Sarhad University of Science and Information Technology, Peshawar, Pakistan