Joint Multimodal Embedding and Backtracking Search in Vision-and-Language Navigation
With recent advances in computer vision and natural language processing, interest has grown in multimodal intelligent tasks that require concurrently understanding diverse forms of input data, such as images and text. Vision-and-language navig...
| Main Authors: | Jisu Hwang, Incheol Kim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2021-02-01 |
| Series: | Sensors |
| Online Access: | https://www.mdpi.com/1424-8220/21/3/1012 |
Similar Items

- Vision–Language–Knowledge Co-Embedding for Visual Commonsense Reasoning
  by: JaeYun Lee, et al.
  Published: (2021-04-01)
- Delivering task instructions in multimodal synchronous online language teaching
  by: Müge Satar, et al.
  Published: (2020-09-01)
- Affect Analysis in Arabic Text: Further Pre-Training Language Models for Sentiment and Emotion
  by: Alothaim, A., et al.
  Published: (2023)
- The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets
  by: Nora Al-Twairesh
  Published: (2021-02-01)
- Vision/INS Integrated Navigation System for Poor Vision Navigation Environments
  by: Youngsun Kim, et al.
  Published: (2016-10-01)