Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression

In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in...

Bibliographic Details
Main Authors:	Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Non-parallel voice conversion; WaveNet vocoder; collapsed speech segment detection; linear predictive coding distribution constraint
Online Access:	https://ieeexplore.ieee.org/document/9050502/

id	doaj-25e27621ae974715a7711beb5a1a26f4
record_format	Article
spelling	doaj-25e27621ae974715a7711beb5a1a26f4; 2021-03-30T03:07:39Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 62094-62106; DOI 10.1109/ACCESS.2020.2984007; document 9050502; Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression; Yi-Chiao Wu (https://orcid.org/0000-0003-4390-1354; Graduate School of Informatics, Nagoya University, Nagoya, Japan); Patrick Lumban Tobing (https://orcid.org/0000-0003-2792-8418; Graduate School of Informatics, Nagoya University, Nagoya, Japan); Kazuhiro Kobayashi (Information Technology Center, Nagoya University, Nagoya, Japan); Tomoki Hayashi (Graduate School of Information Science, Nagoya University, Nagoya, Japan); Tomoki Toda (Information Technology Center, Nagoya University, Nagoya, Japan); https://ieeexplore.ieee.org/document/9050502/; Non-parallel voice conversion; WaveNet vocoder; collapsed speech segment detection; linear predictive coding distribution constraint
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Yi-Chiao Wu Patrick Lumban Tobing Kazuhiro Kobayashi Tomoki Hayashi Tomoki Toda
author_sort	Yi-Chiao Wu
title	Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works. However, when combining the WN vocoder with a VC system, the distorted acoustic features, acoustic and temporal mismatches, and exposure bias usually lead to significant speech quality degradation, making WN generate some very noisy speech segments called collapsed speech. To tackle the problem, we take conventional-vocoder-generated speech as the reference speech to derive a linear predictive coding distribution constraint (LPCDC) to avoid the collapsed speech problem. Furthermore, to mitigate the negative effects introduced by the LPCDC, we propose a collapsed speech segment detector (CSSD) to ensure that the LPCDC is only applied to the problematic segments to limit the loss of quality to short periods. Objective and subjective evaluations are conducted, and the experimental results confirm the effectiveness of the proposed method, which further improves the speech quality of our previous non-parallel VC system submitted to Voice Conversion Challenge 2018.
topic	Non-parallel voice conversion; WaveNet vocoder; collapsed speech segment detection; linear predictive coding distribution constraint
url	https://ieeexplore.ieee.org/document/9050502/

Similar Items