Applying K-Nearest Neighbor and Convolutional Neural Network pooling on Mandarin Vowel Recognition

Master's === National Chung Hsing University === Institute of Statistics === 107 === Over the past 10 years, with the rise of artificial neural networks, machine learning has made rapid advances in speech and image recognition. Chinese pronunciation can be divided into two parts: the vowel and the consonant. Take <ㄈㄚˇ> for example: <ㄈ> is the conson...

Bibliographic Details
Main Authors: You-Cheng Lin, 林祐丞
Other Authors: 李宗寶
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5337011%22.&searchmode=basic
id ndltd-TW-107NCHU5337011
record_format oai_dc
spelling ndltd-TW-107NCHU5337011 2019-11-30T06:09:39Z http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5337011%22.&searchmode=basic Applying K-Nearest Neighbor and Convolutional Neural Network pooling on Mandarin Vowel Recognition 最近鄰居法與卷積神經網路池化對中文母音辨識之探討 You-Cheng Lin 林祐丞 Master's National Chung Hsing University Institute of Statistics 107 李宗寶 2019 thesis 21 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chung Hsing University === Institute of Statistics === 107 === Over the past 10 years, with the rise of artificial neural networks, machine learning has made rapid advances in speech and image recognition. Chinese pronunciation can be divided into two parts: the vowel and the consonant. Take <ㄈㄚˇ> for example: <ㄈ> is the consonant and <ㄚˇ> is the vowel. There are 160 types of vowels, 36 types of consonants, and 5 tones, which together compose 1,391 Chinese pronunciations. In this thesis, the k-nearest neighbor (KNN) method and a convolutional neural network (CNN) are used to identify the vowel. The data were recorded from 20 speakers. After sampling, endpoint detection, and frame cutting, the parameter matrix of each pronunciation has dimension 53x39. The study then applies the KNN method and CNN models under different hyperparameters, with no pooling, max pooling, and average pooling respectively, and explores the effect of each combination on recognition accuracy. The author attributes the CNN's much higher recognition rate to its nonlinear fitting, in contrast to KNN, which the thesis characterizes as linear. The best-performing CNN configuration uses 45 frames, a 5x5 kernel size, kernel counts of (512, 1024, 2048), and average pooling, reaching a vowel recognition rate of 0.9647.
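The difference between the max pooling and average pooling mentioned in the description can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function name pool2d, the non-overlapping 2x2 window, and the random stand-in for a 53x39 parameter matrix are all assumptions made for illustration only.

```python
import numpy as np


def pool2d(x, size, mode):
    """Non-overlapping 2-D pooling over size x size windows (hypothetical helper).

    Rows and columns that do not fill a complete window are dropped.
    """
    h, w = x.shape
    hh, ww = h // size, w // size
    # Split the cropped matrix into (hh, size, ww, size) blocks so that
    # blocks[i, a, j, b] == x[size * i + a, size * j + b].
    blocks = x[: hh * size, : ww * size].reshape(hh, size, ww, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'avg'")


# Stand-in for one 53x39 parameter matrix (frames x features); random data,
# not real speech features.
rng = np.random.default_rng(0)
features = rng.normal(size=(53, 39))

max_pooled = pool2d(features, 2, "max")
avg_pooled = pool2d(features, 2, "avg")
print(max_pooled.shape, avg_pooled.shape)
```

With a 2x2 window, both variants reduce the 53x39 matrix to 26x19. Max pooling keeps the largest value in each window, while average pooling keeps the window mean, so every max-pooled entry is at least as large as the corresponding average-pooled one.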
author2 李宗寶
author_facet 李宗寶
You-Cheng Lin
林祐丞
author You-Cheng Lin
林祐丞
spellingShingle You-Cheng Lin
林祐丞
Applying K-Nearest Neighbor and Convolutional Neural Network pooling on Mandarin Vowel Recognition
author_sort You-Cheng Lin
title Applying K-Nearest Neighbor and Convolutional Neural Network pooling on Mandarin Vowel Recognition
title_short Applying K-Nearest Neighbor and Convolutional Neural Network pooling on Mandarin Vowel Recognition
title_full Applying K-Nearest Neighbor and Convolutional Neural Network pooling on Mandarin Vowel Recognition
title_fullStr Applying K-Nearest Neighbor and Convolutional Neural Network pooling on Mandarin Vowel Recognition
title_full_unstemmed Applying K-Nearest Neighbor and Convolutional Neural Network pooling on Mandarin Vowel Recognition
title_sort applying k-nearest neighbor and convolutional neural network pooling on mandarin vowel recognition
publishDate 2019
url http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5337011%22.&searchmode=basic
work_keys_str_mv AT youchenglin applyingknearestneighborandconvolutionalneuralnetworkpoolingonmandarinvowelrecognition
AT línyòuchéng applyingknearestneighborandconvolutionalneuralnetworkpoolingonmandarinvowelrecognition
AT youchenglin zuìjìnlínjūfǎyǔjuǎnjīshénjīngwǎnglùchíhuàduìzhōngwénmǔyīnbiànshízhītàntǎo
AT línyòuchéng zuìjìnlínjūfǎyǔjuǎnjīshénjīngwǎnglùchíhuàduìzhōngwénmǔyīnbiànshízhītàntǎo
_version_ 1719300442630914048