Dual modality prompt learning for visual question-grounded answering in robotic surgery

Abstract With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image. This limitation restricts the inte...

Detailed Description

Bibliographic Details
Journal: Visual Computing for Industry, Biomedicine, and Art
Main Authors: Yue Zhang, Wanshu Fan, Peixi Peng, Xin Yang, Dongsheng Zhou, Xiaopeng Wei
Format: Article
Language: English
Published: SpringerOpen, 2024-04-01
Subjects:
Online Access: https://doi.org/10.1186/s42492-024-00160-z