Spatial Encoding and Multi-layer Joint Encoding Enhanced Transformer for Image Captioning

Image captioning is an active research topic in computer vision. It is a cross-media data analysis task that combines computer vision and natural language processing: the content of an image is analyzed and used to generate captions that are both semantically...

Bibliographic Details
Published in: Jisuanji kexue
Main Authors: FANG Zhong-jun, ZHANG Jing, LI Dong-dong
Format: Article
Language: Chinese
Published: Editorial office of Computer Science 2022-10-01
Online Access: https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-10-151.pdf