Learning Representative Feature Expression Algorithm and Architecture for Image and Action Recognition

Master's === National Taiwan University === Graduate Institute of Electronics Engineering === 103 === In the past decade, computer vision has made great progress and has had a significant impact on our daily lives. Various intelligent devices have been developed based on big-data analysis and machine learning algorithms. Take Google Glass for example: this wearable device can...


Bibliographic Details
Main Authors: Kuo-Wei Tseng, 曾國維
Other Authors: Liang-Gee Chen
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/75257012226183461576
id ndltd-TW-103NTU05428112
record_format oai_dc
spelling ndltd-TW-103NTU054281122016-11-19T04:09:56Z http://ndltd.ncl.edu.tw/handle/75257012226183461576 Learning Representative Feature Expression Algorithm and Architecture for Image and Action Recognition 學習影像和動作辨識之代表性特徵值之演算法與架構設計 Kuo-Wei Tseng 曾國維 Master's National Taiwan University Graduate Institute of Electronics Engineering 103 In the past decade, computer vision has made great progress and has had a significant impact on our daily lives. Various intelligent devices have been developed based on big-data analysis and machine learning algorithms. Take Google Glass for example: this wearable device can capture pictures of the people around you and analyze the images to recognize who they are. In some parking lots, vehicle license plate recognition systems are used for automatic check-in, so no parking token is needed to get through the gate. These applications show the possibility of achieving a future lifestyle by combining computer vision and machine learning. Among visual tasks, action recognition is a top-priority problem to be solved. In the near future, intelligent robots will be built that can interact with human beings and do the most dangerous jobs for us. To do so, machines must learn the meanings of images and actions, just as we do. Visual recognition tasks on videos are much more complex than those on images: a video sequence contains not only intensity and spatial information but also temporal features, which capture the transformation between frames. Machine vision in the video domain, which lets robots learn about our world, is therefore a vital issue, and action recognition is a central part of it. Several algorithms for video tasks have been proposed in recent years, but their training procedures are too complex. In this thesis, we first introduce some applications and the commonly used recognition pipeline in the field of computer vision.
A general visual recognition pipeline consists of three parts: (i) image/video pre-processing, (ii) feature extraction, and (iii) classification. In our approach, we focus on the pre-processing and feature extraction parts, using a simple algorithm to achieve high performance. K-means clustering is widely used for codebook generation in the Bag of Visual Words (BoVW) [6][7] method and is known for its computational speed. The idea in [8] is to use K-means clustering not to learn a codebook from high-level features but to learn representative patches from raw pixel values. In contrast to constructing hierarchical, deep architectures to learn complex features, this method needs only tens of minutes to train and achieves good performance on the CIFAR-10 dataset. In our approach, we extend the method from the image domain to the video domain, where K-means clusters representative volumes of frames instead of patches. However, the dimensionality of a volume is much larger than that of a patch, and the training set of a video dataset is usually smaller than that of an image dataset, so it is not large enough to train a good K-means model. We therefore propose a method that learns volumes from different datasets to solve this problem. To sum up, we design an action recognition system based on K-means clustering that can learn and extract features across different datasets. Furthermore, we propose a hardware architecture for this algorithm, which can be used for both image and action recognition with slight parameter changes. Liang-Gee Chen 陳良基 2015 學位論文 ; thesis 55 en_US
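The abstract's K-means feature-learning scheme (in the spirit of [8]) can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: the function names and parameters are assumptions, patches are sampled and contrast-normalized, a codebook is learned with plain Lloyd's K-means, and features are produced with "triangle" soft assignment, one common encoding choice for this kind of pipeline. The video extension described in the thesis would flatten spatiotemporal t×h×w volumes instead of h×w patches, leaving the rest unchanged.

```python
import numpy as np

def extract_patches(images, patch_size=6, n_patches=1000, rng=None):
    # images: (N, H, W) grayscale array; sample random patches,
    # flatten them, and normalize each patch (zero mean, unit variance)
    rng = np.random.default_rng(rng)
    N, H, W = images.shape
    out = np.empty((n_patches, patch_size * patch_size))
    for i in range(n_patches):
        n = rng.integers(N)
        y = rng.integers(H - patch_size + 1)
        x = rng.integers(W - patch_size + 1)
        p = images[n, y:y + patch_size, x:x + patch_size].astype(np.float64).ravel()
        out[i] = (p - p.mean()) / (p.std() + 1e-8)
    return out

def kmeans(X, k, n_iter=20, rng=None):
    # plain Lloyd's algorithm: the centroids become the learned codebook
    rng = np.random.default_rng(rng)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids

def encode(patches, centroids):
    # "triangle" soft assignment: activation = max(0, mean_dist - dist),
    # so each patch activates only the closer-than-average centroids
    d = np.sqrt(((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(-1))
    return np.maximum(0.0, d.mean(1, keepdims=True) - d)
```

Per-image features would then be obtained by pooling the encodings of all patches from one image before classification, which is the step the BoVW-style pipeline above feeds into.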
collection NDLTD
language en_US
format Others
sources NDLTD
author2 Liang-Gee Chen
author_facet Liang-Gee Chen
Kuo-Wei Tseng
曾國維
author Kuo-Wei Tseng
曾國維
spellingShingle Kuo-Wei Tseng
曾國維
Learning Representative Feature Expression Algorithm and Architecture for Image and Action Recognition
author_sort Kuo-Wei Tseng
title Learning Representative Feature Expression Algorithm and Architecture for Image and Action Recognition
title_short Learning Representative Feature Expression Algorithm and Architecture for Image and Action Recognition
title_full Learning Representative Feature Expression Algorithm and Architecture for Image and Action Recognition
title_fullStr Learning Representative Feature Expression Algorithm and Architecture for Image and Action Recognition
title_full_unstemmed Learning Representative Feature Expression Algorithm and Architecture for Image and Action Recognition
title_sort learning representative feature expression algorithm and architecture for image and action recognition
publishDate 2015
url http://ndltd.ncl.edu.tw/handle/75257012226183461576
work_keys_str_mv AT kuoweitseng learningrepresentativefeatureexpressionalgorithmandarchitectureforimageandactionrecognition
AT céngguówéi learningrepresentativefeatureexpressionalgorithmandarchitectureforimageandactionrecognition
AT kuoweitseng xuéxíyǐngxiànghédòngzuòbiànshízhīdàibiǎoxìngtèzhēngzhízhīyǎnsuànfǎyǔjiàgòushèjì
AT céngguówéi xuéxíyǐngxiànghédòngzuòbiànshízhīdàibiǎoxìngtèzhēngzhízhīyǎnsuànfǎyǔjiàgòushèjì
_version_ 1718395074757263360