Experimental Evaluation of Deep Learning Methods for an Intelligent Pathological Voice Detection System Using the Saarbruecken Voice Database

This work is focused on deep learning methods, such as feedforward neural network (FNN) and convolutional neural network (CNN), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOSs) para...

Bibliographic Details
Main Author:	Ji-Yeoun Lee
Format:	Article
Language:	English
Published:	MDPI AG 2021-08-01
Series:	Applied Sciences
Subjects:	pathological voice detection; feedforward neural network; convolution neural network; deep learning; higher-order statistics
Online Access:	https://www.mdpi.com/2076-3417/11/15/7149

id	doaj-1d63c7aaa161435e933ff6c650c08947
record_format	Article
spelling	doaj-1d63c7aaa161435e933ff6c650c08947; 2021-08-06T15:19:54Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2021-08-01; 11; 7149; 7149; 10.3390/app11157149; Experimental Evaluation of Deep Learning Methods for an Intelligent Pathological Voice Detection System Using the Saarbruecken Voice Database; Ji-Yeoun Lee (Department of Biomedical Engineering, Jungwon University, 85 Munmu-ro, Goesan-eup, Goesan-gun 28024, Chungbuk-do, Korea)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Ji-Yeoun Lee
title	Experimental Evaluation of Deep Learning Methods for an Intelligent Pathological Voice Detection System Using the Saarbruecken Voice Database
publisher	MDPI AG
series	Applied Sciences
issn	2076-3417
publishDate	2021-08-01
description	This work is focused on deep learning methods, such as feedforward neural network (FNN) and convolutional neural network (CNN), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOSs) parameters. In total, 518 voice data samples were obtained from the publicly available Saarbruecken voice database (SVD), comprising recordings of 259 healthy and 259 pathological women and men, respectively, and using /a/, /i/, and /u/ vowels at normal pitch. Significant differences were observed between the normal and the pathological voice signals for normalized skewness (p = 0.000) and kurtosis (p = 0.000), except for normalized kurtosis (p = 0.051) that was estimated in the /u/ samples in women. These parameters are useful and meaningful for classifying pathological voice signals. The highest accuracy, 82.69%, was achieved by the CNN classifier with the LPCCs parameter in the /u/ vowel in men. The second-best performance, 80.77%, was obtained with a combination of the FNN classifier, MFCCs, and HOSs for the /i/ vowel samples in women. There was merit in combining the acoustic measures with HOS parameters for better characterization in terms of accuracy. The combination of various parameters and deep learning methods was also useful for distinguishing normal from pathological voices.
topic	pathological voice detection; feedforward neural network; convolution neural network; deep learning; higher-order statistics
url	https://www.mdpi.com/2076-3417/11/15/7149

Similar Items