EMD-based method to improve the efficiency of speech/pause segmentation
Background. Speech/pause segmentation, that is, the accurate detection of the beginning and end boundaries of voiced speech, unvoiced speech, and pauses, is one of the most important tasks in speech applications. This is especially important both when analyzing distribution speed, acceleration, and...
Main Authors: | A.K. Alimuradov, A.Yu. Tychkov, P.P. Churakov, A.V. Ageykin, A.V. Kuz'min, M.A. Mitrokhin, I.A. Chernov |
---|---|
Format: | Article |
Language: | English |
Published: | Penza State University Publishing House, 2021-09-01 |
Series: | Известия высших учебных заведений. Поволжский регион:Технические науки |
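Subjects: | speech signal processing; speech segmentation; voiced and unvoiced speech; empirical mode decomposition |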
id |
doaj-95a48ccef92440cd808e0f15aae01e10 |
---|---|
record_format |
Article |
spelling |
doaj-95a48ccef92440cd808e0f15aae01e10; 2021-09-22T11:12:42Z; eng; Penza State University Publishing House; Известия высших учебных заведений. Поволжский регион:Технические науки; 2072-3059; 2021-09-01; 2; 10.21685/2072-3059-2021-2-3; EMD-based method to improve the efficiency of speech/pause segmentation; A.K. Alimuradov, A.Yu. Tychkov, P.P. Churakov, A.V. Ageykin, A.V. Kuz'min, M.A. Mitrokhin, I.A. Chernov (all Penza State University); speech signal processing; speech segmentation; voiced and unvoiced speech; empirical mode decomposition |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
A.K. Alimuradov, A.Yu. Tychkov, P.P. Churakov, A.V. Ageykin, A.V. Kuz'min, M.A. Mitrokhin, I.A. Chernov |
author_sort |
A.K. Alimuradov |
title |
EMD-based method to improve the efficiency of speech/pause segmentation |
publisher |
Penza State University Publishing House |
series |
Известия высших учебных заведений. Поволжский регион:Технические науки |
issn |
2072-3059 |
publishDate |
2021-09-01 |
description |
Background. Speech/pause segmentation, that is, the accurate detection of the
beginning and end boundaries of voiced speech, unvoiced speech, and pauses, is one of
the most important tasks in speech applications. It is especially important when analyzing
distribution speed, acceleration, and entropy of voiced and unvoiced speech sections
and pauses, and when analyzing the average duration of pauses. The aim of the work is to
improve the efficiency of speech/pause segmentation using the method of empirical mode
decomposition. Materials and methods. The work uses a unique technique for the adaptive
decomposition of non-stationary signals, namely the improved complete ensemble empirical
mode decomposition with adaptive noise. The method was implemented in software in the
MATLAB (MathWorks) mathematical modeling environment.
Results. A decomposition-based method has been developed for the preprocessing stage.
From the original speech signals it forms a set of new signals for analysis that contain the
most reliable information about the beginning and end boundaries of voiced and
unvoiced speech and of pauses. The influence of the decomposition method and of the
duration of the analyzed signal fragments on the efficiency of speech/pause segmentation
was assessed using segmentation methods based on the analysis of zero-crossing
rate, short-term energy, and one-dimensional Mahalanobis distance. Conclusions.
The research results show that the proposed method increases the efficiency of
segmentation of voiced and unvoiced speech sections: by 13.96% for
the method based on the analysis of zero-crossing rate; by 8.24% for the method based on
the analysis of short-term energy; by 5.72% for the method based on the combined analysis
of zero-crossing rate and short-term energy; and by 17.85% for the method based on the
analysis of one-dimensional Mahalanobis distance. |
topic |
speech signal processing; speech segmentation; voiced and unvoiced speech; empirical mode decomposition |
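
The abstract names three frame-level features used for speech/pause segmentation: zero-crossing rate, short-term energy, and one-dimensional Mahalanobis distance. The following is a minimal Python sketch of how such features might be computed and used to label frames. It is not the authors' MATLAB implementation and it omits the empirical mode decomposition preprocessing entirely. The function names, the 20 ms frame length, the threshold of 3.0, and the assumption that the first few frames contain only background are all hypothetical choices made for illustration.

```python
import math


def frames(signal, frame_len):
    """Split a signal into consecutive non-overlapping frames (any tail is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def mahalanobis_1d(value, mean, std):
    """One-dimensional Mahalanobis distance: |x - mean| / std."""
    return abs(value - mean) / std


def segment(signal, frame_len, noise_frames=5, threshold=3.0):
    """Label each frame 'speech' or 'pause' by the distance of its energy
    from a background model.

    Assumption: the first `noise_frames` frames contain no speech, so their
    energies give the background mean and standard deviation.
    """
    energies = [short_term_energy(f) for f in frames(signal, frame_len)]
    ref = energies[:noise_frames]
    mean = sum(ref) / len(ref)
    var = sum((e - mean) ** 2 for e in ref) / len(ref)
    std = math.sqrt(var) or 1e-12  # guard against a perfectly constant background
    return ["speech" if mahalanobis_1d(e, mean, std) > threshold else "pause"
            for e in energies]


# Synthetic test signal: 0.5 s silence, 0.5 s of a 200 Hz tone, 0.5 s silence.
fs = 8000
frame_len = 160  # 20 ms at 8 kHz
silence = [0.0] * 4000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(4000)]
signal = silence + tone + silence

labels = segment(signal, frame_len)
# Boundary times in seconds: wherever the label changes between adjacent frames.
boundaries = [i * frame_len / fs
              for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
print(boundaries)
print(zero_crossing_rate(tone[:frame_len]))
```

In this sketch only the energy feature drives the decision; the zero-crossing rate is computed separately and could be thresholded the same way or combined with energy, as the combined method in the abstract suggests. Real recordings would need a noise-robust background estimate rather than the idealized silence used here.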