Digital Hearing Aid System with Audio-Vision Fused VAD

Bibliographic Details
Main Authors: Zheng, Chen-Han, 鄭承翰
Other Authors: Jou, Shyh-Jye
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/05747512179300497535
Description
Summary: Master === National Chiao Tung University === Institute of Electronics === 105 === Hearing-impaired users usually need to use hearing aid devices in a variety of noisy environments. Speech recognition degrades in the presence of noise, especially speech-like noise. The voice activity detector (VAD) is a key component of noise reduction, but the accuracy of a VAD driven only by audio features decreases when the user is in a low-SNR or speech-like-noise environment. Our previous work utilized lip features to assist the VAD and achieved good results. Unfortunately, the previous facial-feature-extraction algorithm was encrypted and had many limitations that restricted its accuracy, so a more robust and flexible alternative algorithm is needed for practical application. This thesis feeds both lip and audio features into a pre-trained Support Vector Machine (SVM) model. The SVM classifier outputs are then smoothed by a Keeper stage to produce the final VAD decision. Although the computational complexity of image processing is generally high, the Audio-Visual Fused VAD (AV-VAD) demonstrates its capability when the environment is low-SNR or contains speech-like noise. In addition, we have integrated the AV-VAD with probe-signal-based feedback cancellation and pitch-based noise reduction. The previous hearing aid (HA) system could not achieve real-time operation in Matlab. To overcome this problem, we re-implemented the HA system on a PC in C++ using Visual Studio, aiming to run in real time and return the processed sound to the user simultaneously.
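
As a rough illustration of the decision flow summarized above (fused audio and lip features scored by a pre-trained SVM, then smoothed by the Keeper stage), the C++ sketch below shows one possible per-frame implementation. The linear kernel, feature layout, toy weights, and hold-based Keeper rule are assumptions for illustration only; the thesis's actual feature extraction, SVM model, and Keeper logic are not specified in this summary.

// Hedged sketch of the AV-VAD decision flow: per-frame audio and lip
// features are concatenated, scored by a pre-trained SVM, and the raw
// decisions are smoothed by a "Keeper" (hangover-style) stage.
// All parameters below are illustrative assumptions, not thesis values.
#include <numeric>
#include <vector>

struct LinearSvm {
    std::vector<double> w;  // trained weights (linear kernel assumed)
    double b = 0.0;         // bias term
    // Returns +1 for speech, -1 for non-speech on one fused feature frame.
    int classify(const std::vector<double>& x) const {
        double score = std::inner_product(x.begin(), x.end(), w.begin(), b);
        return score >= 0.0 ? 1 : -1;
    }
};

// Hypothetical Keeper: holds a speech decision for `hold` frames after the
// last positive SVM output, suppressing isolated dropouts in the VAD flag.
class Keeper {
    int hold_;
    int count_ = 0;
public:
    explicit Keeper(int hold) : hold_(hold) {}
    bool smooth(int raw) {
        if (raw > 0) count_ = hold_;    // refresh hold window on speech
        else if (count_ > 0) --count_;  // decay during detected silence
        return count_ > 0;              // final VAD flag for this frame
    }
};

// Per-frame fused decision: concatenate audio and lip features,
// classify with the trained SVM, then smooth with the Keeper.
bool avVadFrame(const std::vector<double>& audioFeat,
                const std::vector<double>& lipFeat,
                const LinearSvm& svm, Keeper& keeper) {
    std::vector<double> fused(audioFeat);
    fused.insert(fused.end(), lipFeat.begin(), lipFeat.end());
    return keeper.smooth(svm.classify(fused));
}

int main() {
    LinearSvm svm{{0.5, -0.2, 0.8}, -0.1};  // toy model for illustration
    Keeper keeper(3);                        // hold speech for 3 frames
    std::vector<double> audio{0.4, 0.1};     // placeholder audio features
    std::vector<double> lips{0.9};           // placeholder lip feature
    bool isSpeech = avVadFrame(audio, lips, svm, keeper);
    return isSpeech ? 0 : 1;
}

In a real-time system such as the C++ HA implementation described above, avVadFrame would be called once per analysis frame, and its output would gate the noise-reduction and feedback-cancellation stages.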