A Multi-Modal Attentive Framework That Can Interpret Text (MMAT)

Deep learning algorithms have demonstrated exceptional performance on various computer vision and natural language processing tasks. However, for machines to learn information signals, they must understand and have enough reasoning power to respond to general questions based on the linguistic featur...

Bibliographic Details
Published in:	IEEE Access
Main Authors:	Vijay Kumari, Sarthak Gupta, Yashvardhan Sharma, Lavika Goel
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Subjects:	Visual question answering system (VQA); text visual question answering system (Text-VQA); optical character recognition (OCR); attention mechanism; natural language processing (NLP)
Online Access:	https://ieeexplore.ieee.org/document/11072709/
