Deep Memory Fusion Model for Long Video Question Answering

Long video question answering involves rich multimodal semantic and inferential information. At present, video question answering models based on recurrent neural networks struggle to fully retain important memory information, to ignore irrelevant redundant information and to ac...

Bibliographic Details
Published in:	Journal of Harbin University of Science and Technology
Main Authors:	SUN Guanglu, WU Meng, QIU Jing, LIANG Lili
Format:	Article
Language:	Chinese
Published:	Harbin University of Science and Technology Publications 2021-02-01
Subjects:	video question answering; long video understanding; memory network; attention mechanism; multimodal fusion
Online Access:	https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=1911
