Deep Memory Fusion Model for Long Video Question Answering
Long video question answering contains rich multimodal semantic information and inference information. At present, it is difficult for video question answering models based on recurrent neural networks to fully retain important memory information, to ignore irrelevant redundant information and to ac...
| 出版年: | Journal of Harbin University of Science and Technology |
|---|---|
| 主要な著者: | , , , |
| フォーマット: | 論文 |
| 言語: | 中国語 |
| 出版事項: |
Harbin University of Science and Technology Publications
2021-02-01
|
| 主題: | |
| オンライン・アクセス: | https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=1911 |
