A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation
We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, ...
Main Authors: | Dansereau Richard M, Radfar Mohammad H, Sayadiyan Abolghasem |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2007-01-01 |
Series: | EURASIP Journal on Audio, Speech, and Music Processing |
Online Access: | http://asmp.eurasipjournals.com/content/2007/084186 |
id |
doaj-551d202f879647b6bbacb99a2ad0a47b |
---|---|
record_format |
Article |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Dansereau Richard M Radfar Mohammad H Sayadiyan Abolghasem |
title |
A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation |
publisher |
SpringerOpen |
series |
EURASIP Journal on Audio, Speech, and Music Processing |
issn |
1687-4714 1687-4722 |
publishDate |
2007-01-01 |
description |
We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression. |
url |
http://asmp.eurasipjournals.com/content/2007/084186 |