On Improvement of CNN’s Scale Invariance

Bibliographic Details
Main Author: Sie Syuan-Yu (謝璿羽)
Other Authors: Peng Wen-Hsiao
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/07464557609653008859
Description
Summary: Master's thesis === National Chiao Tung University === Institute of Multimedia Engineering === 103 === The deep-learning-based convolutional neural network (CNN) has recently been widely applied to various image recognition tasks, owing to its superior ability to extract higher-level features, such as objects or object parts, from an image. Its performance, however, was found to be susceptible to image transformations, including translation, scaling, and rotation. To improve its scale invariance, this thesis takes a three-pronged approach, addressing the structure of the CNN, the training process, and the testing process. Specifically, inspired by the design of SIFT, we introduce filters of different sizes into the CNN pipeline, hoping to capture meaningful features that may vary in size. In the training process, we augment the training data with images of different scales, so that the weight parameters of the CNN can adapt to variable-size features. During the testing stage, we pass multiple replicas of the query image, transformed by cropping and/or scaling, through the CNN and pool their outputs for a more accurate prediction. Extensive experiments analyze the benefit of each of these enhancements, and of their combination, in terms of recognition accuracy.
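The testing-stage idea in the abstract — passing cropped/rescaled replicas of the query image through the network and pooling their outputs — can be sketched in a few lines. The record itself contains no code, so the following is only an illustrative NumPy sketch under assumed names (`center_crop`, `multi_crop_predict`, and a stand-in `dummy_model` are all hypothetical, not from the thesis); a real pipeline would use a trained CNN and proper interpolation for resizing.

```python
import numpy as np

def center_crop(img, size):
    """Crop a square region of the given size from the image center."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def multi_crop_predict(model, img, crop_sizes, input_size):
    """Average class probabilities over several center crops,
    each subsampled to the network's input size (nearest neighbour;
    a real pipeline would interpolate)."""
    probs = []
    for size in crop_sizes:
        crop = center_crop(img, size)
        idx = np.arange(input_size) * size // input_size
        resized = crop[np.ix_(idx, idx)]
        probs.append(model(resized))
    return np.mean(probs, axis=0)  # pooled prediction

def dummy_model(x):
    """Stand-in for a trained CNN: a two-class softmax over mean intensity."""
    z = np.array([x.mean(), -x.mean()])
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
img = rng.random((64, 64))
p = multi_crop_predict(dummy_model, img, crop_sizes=[48, 56, 64], input_size=32)
print(p)  # two class probabilities summing to 1
```

Averaging softmax outputs across scales is one common pooling choice; max-pooling over the replicas is another, and which the thesis evaluates is not stated in this record.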