Multimodal Encoder-Decoder Attention Networks for Visual Question Answering

Visual Question Answering (VQA) is a multimodal task involving Computer Vision (CV) and Natural Language Processing (NLP); its goal is to establish a high-efficiency VQA model. Learning a fine-grained and simultaneous understanding of both the visual content of images and the textual content of ques...

Bibliographic Details
Main Authors: Chongqing Chen, Dezhi Han, Jun Wang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9003229/