Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering

Visual question answering (VQA) is a multi-modal task involving natural language processing (NLP) and computer vision (CV), which requires models to understand both visual information and textual information simultaneously to predict the correct answer for the input visual image and textual quest...

Bibliographic Details
Main Authors: Zihan Guo, Dezhi Han
Format: Article
Language: English
Published: MDPI AG 2020-11-01
Series: Sensors
Subjects:
Online Access: https://www.mdpi.com/1424-8220/20/23/6758

Similar Items