A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate these components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters, along with the extracted fundamental frequencies, are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on underdetermined blind source separation. We compare our model with both an underdetermined blind source separation method and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
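The abstract's core step, estimating each talker's vocal-tract-related filter (log-spectral envelope) from a single mixture by maximum likelihood, can be illustrated with a minimal sketch. The sketch below assumes per-speaker codebooks of mean log-spectral vectors and a simple log-max mixture approximation with an isotropic Gaussian observation model; these modeling choices, and every name in the code, are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch (not the paper's exact model): each speaker's
# vocal-tract-related filter is represented by a codebook of mean
# log-spectral envelope vectors, and the mixture's log spectrum is
# approximated by the element-wise maximum of the two envelopes
# (a common "log-max" assumption).

def ml_envelope_pair(y_log, codebook_a, codebook_b, sigma2=1.0):
    """Pick the codebook pair (i, j) maximizing the likelihood of the
    observed mixture log spectrum y_log under an isotropic Gaussian whose
    mean is the element-wise max of the two candidate envelopes.

    y_log      : (D,)   observed mixture log-spectral vector for one frame
    codebook_a : (Na, D) candidate envelopes for speaker A
    codebook_b : (Nb, D) candidate envelopes for speaker B
    Returns the ML envelope estimates (mu_a_hat, mu_b_hat).
    """
    best, best_pair = -np.inf, (0, 0)
    for i, mu_a in enumerate(codebook_a):
        for j, mu_b in enumerate(codebook_b):
            mu_mix = np.maximum(mu_a, mu_b)            # log-max mixture mean
            loglik = -0.5 * np.sum((y_log - mu_mix) ** 2) / sigma2
            if loglik > best:
                best, best_pair = loglik, (i, j)
    i, j = best_pair
    return codebook_a[i], codebook_b[j]

# Toy usage: 8-bin "spectra", random codebooks, one noisy mixture frame.
rng = np.random.default_rng(0)
cb_a = rng.normal(size=(16, 8))
cb_b = rng.normal(size=(16, 8))
frame = np.maximum(cb_a[3], cb_b[7]) + 0.1 * rng.normal(size=8)
env_a, env_b = ml_envelope_pair(frame, cb_a, cb_b)
```

In the paper's pipeline, envelope estimates of this kind are then combined with the extracted fundamental frequencies to reconstruct the individual speech signals.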


Bibliographic Details
Main Authors: Dansereau, Richard M.; Radfar, Mohammad H.; Sayadiyan, Abolghasem
Format: Article
Language: English
Published: SpringerOpen, 2007-01-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
ISSN: 1687-4714, 1687-4722
Online Access: http://asmp.eurasipjournals.com/content/2007/084186