EMD-based method to improve the efficiency of speech/pause segmentation

Background. Speech/pause segmentation, that is, accurate detection of the beginning and end boundaries of voiced speech, unvoiced speech, and pauses, is one of the most important tasks in speech applications. This is especially important both when analyzing distribution speed, acceleration, and...

Bibliographic Details
Main Authors:	A.K. Alimuradov, A.Yu. Tychkov, P.P. Churakov, A.V. Ageykin, A.V. Kuz'min, M.A. Mitrokhin, I.A. Chernov
Format:	Article
Language:	English
Published:	Penza State University Publishing House 2021-09-01
Series:	Известия высших учебных заведений. Поволжский регион:Технические науки
Subjects:	speech signal processing; speech segmentation; voiced and unvoiced speech; empirical mode decomposition

id	doaj-95a48ccef92440cd808e0f15aae01e10
record_format	Article
spelling	doaj-95a48ccef92440cd808e0f15aae01e10 | 2021-09-22T11:12:42Z | eng | Penza State University Publishing House | Известия высших учебных заведений. Поволжский регион:Технические науки | 2072-3059 | 2021-09-01 | 2 | 10.21685/2072-3059-2021-2-3 | EMD-based method to improve the efficiency of speech/pause segmentation | A.K. Alimuradov, A.Yu. Tychkov, P.P. Churakov, A.V. Ageykin, A.V. Kuz'min, M.A. Mitrokhin, I.A. Chernov | Penza State University (all seven authors)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	A.K. Alimuradov, A.Yu. Tychkov, P.P. Churakov, A.V. Ageykin, A.V. Kuz'min, M.A. Mitrokhin, I.A. Chernov
author_sort	A.K. Alimuradov
title	EMD-based method to improve the efficiency of speech/pause segmentation
publisher	Penza State University Publishing House
series	Известия высших учебных заведений. Поволжский регион:Технические науки
issn	2072-3059
publishDate	2021-09-01
description	Background. Speech/pause segmentation, that is, accurate detection of the beginning and end boundaries of voiced speech, unvoiced speech, and pauses, is one of the most important tasks in speech applications. This is especially important both when analyzing distribution speed, acceleration, and entropy of voiced and unvoiced speech sections and pauses, and when analyzing the average duration of pauses. The aim of the work is to improve the efficiency of speech/pause segmentation using the method of empirical mode decomposition. Materials and methods. A unique technology for adaptive decomposition of non-stationary signals, namely the improved complete ensemble empirical mode decomposition with adaptive noise, was used in the work. The method was implemented in the MATLAB (MathWorks) mathematical modeling environment. Results. A decomposition-based method has been developed for the preprocessing stage: from the original speech signals it forms a set of new signals for study that contain the most reliable information about the beginning and end boundaries of voiced speech, unvoiced speech, and pauses. The influence of the decomposition method and of the duration of the studied signal fragments on the efficiency of speech/pause segmentation was assessed. Segmentation methods based on the analysis of zero-crossing rate, short-term energy, and one-dimensional Mahalanobis distance were used. Conclusions. The research results show that the proposed method increases the efficiency of segmentation of voiced and unvoiced speech sections: by 13.96% for the method based on the analysis of zero-crossing rate; by 8.24% for the method based on the analysis of short-term energy; by 5.72% for the method based on the combined analysis of zero-crossing rate and short-term energy; by 17.85% for the method based on the analysis of one-dimensional Mahalanobis distance.
topic	speech signal processing; speech segmentation; voiced and unvoiced speech; empirical mode decomposition
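
The description field above names zero-crossing rate, short-term energy, and one-dimensional Mahalanobis distance as the features used for speech/pause segmentation. The following is a minimal Python sketch of those three quantities and of a simple frame-labeling rule. It is not the authors' MATLAB implementation and does not include the empirical mode decomposition preprocessing. The function names, frame length, hop size, and thresholds are hypothetical values chosen only for illustration.

```python
import math


def frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (an incomplete tail is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def mahalanobis_1d(value, mean, std):
    """One-dimensional Mahalanobis distance: |value - mean| / std."""
    return abs(value - mean) / std


def label_frames(signal, frame_len=160, hop=80, energy_thr=1e-3, zcr_thr=0.25):
    """Label each frame 'pause', 'voiced', or 'unvoiced'.

    Assumed rule (illustrative only): low energy means pause; otherwise a low
    zero-crossing rate means voiced speech and a high one means unvoiced speech.
    """
    labels = []
    for frame in frames(signal, frame_len, hop):
        if short_term_energy(frame) < energy_thr:
            labels.append("pause")
        elif zero_crossing_rate(frame) < zcr_thr:
            labels.append("voiced")
        else:
            labels.append("unvoiced")
    return labels


# Synthetic test signals at an assumed 8 kHz sampling rate.
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]  # low ZCR
hiss = [0.1 if n % 2 == 0 else -0.1 for n in range(800)]                   # high ZCR

for name, sig in (("silence", silence), ("tone", tone), ("hiss", hiss)):
    print(name, sorted(set(label_frames(sig))))
```

A one-dimensional Mahalanobis distance could be applied in the same loop, for example by measuring how far a frame's energy lies from the mean energy of known pause frames in units of their standard deviation. How the article itself combines these features is not stated in this record.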

Similar Items