EMD-based method to improve the efficiency of speech/pause segmentation

Bibliographic Details
Main Authors: A.K. Alimuradov, A.Yu. Tychkov, P.P. Churakov, A.V. Ageykin, A.V. Kuz'min, M.A. Mitrokhin, I.A. Chernov
Format: Article
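Language: English
Published: Penza State University Publishing House, 2021-09-01
Series: Известия высших учебных заведений. Поволжский регион: Технические науки (Proceedings of Higher Educational Institutions. Volga Region: Technical Sciences)
Subjects: speech signal processing; speech segmentation; voiced and unvoiced speech; empirical mode decomposition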
ISSN: 2072-3059
DOI: 10.21685/2072-3059-2021-2-3
Affiliation: Penza State University (all authors)

Abstract

Background. Speech/pause segmentation, that is, accurate detection of the start and end boundaries of voiced speech, unvoiced speech, and pauses, is one of the most important tasks in speech applications. It is especially important when analyzing the speed, acceleration, and entropy of the distribution of voiced and unvoiced speech sections and pauses, and when analyzing the average duration of pauses. The aim of the work is to improve the efficiency of speech/pause segmentation using empirical mode decomposition.

Materials and methods. The work uses an adaptive decomposition technique for non-stationary signals: improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN). The method was implemented in the MATLAB (MathWorks) mathematical modeling environment.

Results. A decomposition-based method was developed for the preprocessing stage: from the original speech signal it forms a set of new signals for analysis that carry the most reliable information about the start and end boundaries of voiced speech, unvoiced speech, and pauses. The influence of the decomposition method and of the duration of the analyzed signal fragments on segmentation efficiency was studied using segmentation methods based on zero-crossing rate, short-term energy, and one-dimensional Mahalanobis distance.

Conclusions. The proposed method increases the efficiency of segmenting voiced and unvoiced speech sections by 13.96% for the zero-crossing-rate method, by 8.24% for the short-term-energy method, by 5.72% for the combined zero-crossing-rate and short-term-energy method, and by 17.85% for the one-dimensional Mahalanobis distance method.
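Two of the baseline segmentation features named in the abstract, zero-crossing rate and short-term energy, can be computed frame by frame as in the minimal Python sketch below. It is not the authors' MATLAB implementation. The frame length, the hop size, and the decision rule in the hypothetical `label_frames` helper (low energy means pause; otherwise a high zero-crossing rate means unvoiced and a low one means voiced) are assumptions, and the thresholds would have to be tuned on real recordings.

```python
import math


def frames(signal, frame_len, hop):
    """Split a signal into frames of frame_len samples, hop samples apart."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def label_frames(signal, frame_len, hop, energy_thr, zcr_thr):
    """Hypothetical decision rule combining short-term energy and zero-crossing rate."""
    labels = []
    for frame in frames(signal, frame_len, hop):
        if short_term_energy(frame) < energy_thr:
            labels.append("pause")      # too little energy: treat as pause
        elif zero_crossing_rate(frame) > zcr_thr:
            labels.append("unvoiced")   # noise-like: many sign changes
        else:
            labels.append("voiced")     # energetic with few sign changes
    return labels


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate, Hz
    pause = [0.0] * 1600
    voiced = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
    unvoiced = [0.2 * math.sin(2 * math.pi * 3000 * n / fs + 0.3) for n in range(1600)]
    # 20 ms frames without overlap; thresholds chosen for this toy signal only
    print(label_frames(pause + voiced + unvoiced, 160, 160, 1e-3, 0.3))
```

The demo uses a synthetic signal (silence, a 200 Hz tone standing in for voiced speech, a 3 kHz tone standing in for unvoiced speech), so the thresholds separate it cleanly; real speech is far less tidy.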
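The abstract does not spell out how the one-dimensional Mahalanobis distance method works, so the sketch below follows one common formulation as an assumption. The mean μ and standard deviation σ of the background noise are estimated from an initial segment assumed to contain no speech. Each sample is flagged as speech when |x − μ| / σ exceeds a threshold (3, following the three-sigma rule), and frames are then labeled by majority vote. The helper `speech_mask`, the 100 ms noise segment, and the 20 ms frame length are illustrative choices, not values from the paper.

```python
import math
import random
import statistics


def speech_mask(signal, fs, noise_ms=100, frame_ms=20, threshold=3.0):
    """Label each frame True (speech) or False (pause).

    Assumption: the first `noise_ms` milliseconds contain background noise only.
    """
    noise = signal[:int(fs * noise_ms / 1000)]
    mu = statistics.fmean(noise)
    sigma = statistics.pstdev(noise)
    if sigma == 0:
        raise ValueError("noise segment has zero variance")
    # Sample-level decision: one-dimensional Mahalanobis distance |x - mu| / sigma.
    flags = [abs(x - mu) / sigma > threshold for x in signal]
    # Frame-level decision: majority vote over the sample flags in each frame.
    frame_len = int(fs * frame_ms / 1000)
    return [sum(flags[i:i + frame_len]) > frame_len // 2
            for i in range(0, len(flags) - frame_len + 1, frame_len)]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate, Hz
    rng = random.Random(0)
    sig = [rng.gauss(0.0, 0.01) for _ in range(4800)]
    for n in range(1600, 3200):  # 200 ms of a 200 Hz tone as stand-in speech
        sig[n] += 0.5 * math.sin(2 * math.pi * 200 * n / fs)
    print(speech_mask(sig, fs))
```

Working per sample and voting per frame keeps the distance truly one-dimensional while still producing frame-level boundaries; a frame-level feature such as short-term energy could be substituted for the raw samples under the same scheme.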
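ICEEMDAN, the decomposition used in the paper, builds on the basic empirical mode decomposition (EMD) sifting procedure, averaging local means over noise-added copies of the signal. The pure-Python sketch below shows only that basic step: extracting the first intrinsic mode function (IMF) with plain EMD. It is not ICEEMDAN. Using the signal's end samples as envelope knots, capping sifting at 10 iterations, and using a normalized variant of the standard-deviation stopping criterion are simplifying assumptions.

```python
import math


def local_extrema(x):
    """Indices of strict interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def natural_cubic_spline(xs, ys, query):
    """Evaluate the natural cubic spline through (xs, ys) at sorted query points."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives; natural ends keep m[0] = m[-1] = 0
    if n > 2:
        a = [h[i - 1] for i in range(1, n - 1)]                  # sub-diagonal
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]  # diagonal
        c = [h[i] for i in range(1, n - 1)]                      # super-diagonal
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n - 1)]
        for i in range(1, len(b)):  # Thomas algorithm: forward elimination
            w = a[i] / b[i - 1]
            b[i] -= w * c[i - 1]
            d[i] -= w * d[i - 1]
        sol = [0.0] * len(b)
        sol[-1] = d[-1] / b[-1]
        for i in range(len(b) - 2, -1, -1):  # back substitution
            sol[i] = (d[i] - c[i] * sol[i + 1]) / b[i]
        m[1:-1] = sol
    out, k = [], 0
    for t in query:
        while k < n - 2 and t > xs[k + 1]:
            k += 1
        hk, right, left = h[k], xs[k + 1] - t, t - xs[k]
        out.append(m[k] * right ** 3 / (6 * hk) + m[k + 1] * left ** 3 / (6 * hk)
                   + (ys[k] / hk - m[k] * hk / 6) * right
                   + (ys[k + 1] / hk - m[k + 1] * hk / 6) * left)
    return out


def sift_first_imf(x, max_iter=10, tol=0.01):
    """Extract the first IMF with plain EMD sifting (not ICEEMDAN)."""
    h = [float(v) for v in x]
    grid = list(range(len(h)))
    last = len(h) - 1
    for _ in range(max_iter):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build meaningful envelopes
        # Simplification: the end samples serve as knots of both envelopes.
        up = [0] + maxima + [last]
        lo = [0] + minima + [last]
        upper = natural_cubic_spline(up, [h[i] for i in up], grid)
        lower = natural_cubic_spline(lo, [h[i] for i in lo], grid)
        new_h = [v - (u + lv) / 2 for v, u, lv in zip(h, upper, lower)]
        # Normalized variant of the standard-deviation stopping criterion.
        change = sum((p - q) ** 2 for p, q in zip(h, new_h))
        sd = change / (sum(p * p for p in h) or 1e-12)
        h = new_h
        if sd < tol:
            break
    return h


if __name__ == "__main__":
    n_samples = 400
    fast = [math.sin(2 * math.pi * n / 20) for n in range(n_samples)]
    slow = [math.sin(2 * math.pi * n / 400) for n in range(n_samples)]
    imf = sift_first_imf([f + s for f, s in zip(fast, slow)])
    # Away from the ends, the first IMF should track the fast component.
    print(max(abs(imf[n] - fast[n]) for n in range(120, 280)))
```

In the demo the first IMF recovers the fast oscillation and the residue carries the slow trend. Near the signal ends the crude boundary treatment distorts the envelopes, which is one reason practical EMD variants use more careful end handling.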