Image Caption with Object Detection and Weighted Feature Fusion

Bibliographic Details
Main Authors: Tseng, Wan-Ju, 曾婉茹
Other Authors: Wu, Bing-Fei
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/2ja7m6
Description
Summary: Master's thesis === National Chiao Tung University === Institute of Electrical and Control Engineering === 105 === Automatically describing the content of images connects computer vision and natural language processing. This thesis combines object detection with image captioning to obtain better feature representations. The proposed feature fusion uses a simple weighting technique that relies only on bounding box attributes to sum local and global features. In addition, object coordinates are used to predict relations that build a bridge to human interactions. The model is based on a novel combination of Convolutional Neural Networks over regions of interest and Recurrent Neural Networks over sentences, with an objective embedding inserted in a middle network layer to reduce internal covariate shift and infer compressed features. The system is evaluated on the MS COCO dataset, which comprises 123,287 images and 616,435 descriptions. Experiments show improvements in BLEU-4, METEOR, ROUGE-L, and CIDEr scores while sustaining real-time performance at 26 FPS.
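
The summary mentions a weighting technique that uses only bounding box attributes to sum local and global features, but gives no formula. The following is a minimal illustrative sketch, not the thesis's method: it assumes that each detected region's weight is its bounding box area as a fraction of the image area, and that the fused feature is the global feature plus the weighted sum of the local features. The names `box_area_weights` and `fuse_features` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def box_area_weights(boxes: Sequence[Box], image_w: float, image_h: float) -> List[float]:
    """Assumed weighting rule: each box's share of the image area, capped at 1."""
    image_area = image_w * image_h
    weights = []
    for x0, y0, x1, y1 in boxes:
        # Degenerate boxes (zero or negative extent) get weight 0.
        area = max(0.0, x1 - x0) * max(0.0, y1 - y0)
        weights.append(min(1.0, area / image_area))
    return weights


def fuse_features(
    global_feat: Sequence[float],
    local_feats: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return global_feat + sum_i weights[i] * local_feats[i], element-wise."""
    fused = list(global_feat)
    for feat, w in zip(local_feats, weights):
        if len(feat) != len(fused):
            raise ValueError("local and global features must have the same length")
        for i, value in enumerate(feat):
            fused[i] += w * value
    return fused


# Usage: two detected regions in a 100x100 image, with 2-dimensional features.
boxes = [(0.0, 0.0, 50.0, 50.0), (0.0, 0.0, 100.0, 100.0)]
weights = box_area_weights(boxes, image_w=100.0, image_h=100.0)
fused = fuse_features([1.0, 2.0], [[4.0, 0.0], [1.0, 1.0]], weights)
```

In this sketch larger regions contribute more to the fused vector. Other bounding box attributes, such as detection confidence or position, could be substituted for area without changing the structure of the sum.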