Diphthong synthesis using the three-dimensional dynamic digital waveguide mesh

The human voice is a complex and nuanced instrument, and despite many years of research, no system is yet capable of producing natural-sounding synthetic speech. This affects intelligibility for some groups of listeners, in applications such as automated announcements and screen readers. Furthermore...

Bibliographic Details
Main Author: Gully, Amelia J.
Other Authors: Murphy, Damian T.
Published: University of York 2017
Subjects: 621.38
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.745720
id ndltd-bl.uk-oai-ethos.bl.uk-745720
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-745720; 2019-03-05T15:57:38Z; Diphthong synthesis using the three-dimensional dynamic digital waveguide mesh; Gully, Amelia J.; Murphy, Damian T.; 2017; 621.38; University of York; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.745720; http://etheses.whiterose.ac.uk/20043/; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 621.38
spellingShingle 621.38
Gully, Amelia J.
Diphthong synthesis using the three-dimensional dynamic digital waveguide mesh
description The human voice is a complex and nuanced instrument, and despite many years of research, no system is yet capable of producing natural-sounding synthetic speech. This affects intelligibility for some groups of listeners, in applications such as automated announcements and screen readers. Furthermore, those who require a computer to speak - due to surgery or a degenerative disease - are limited to unnatural-sounding voices that lack expressive control and may not match the user's gender, age or accent. It is evident that natural, personalised and controllable synthetic speech systems are required. A three-dimensional digital waveguide model of the vocal tract, based on magnetic resonance imaging data, is proposed here in order to address these issues. The model uses a heterogeneous digital waveguide mesh method to represent the vocal tract airway and surrounding tissues, facilitating dynamic movement and hence speech output. The accuracy of the method is validated by comparison with audio recordings of natural speech, and perceptual tests are performed which confirm that the proposed model sounds significantly more natural than simpler digital waveguide mesh vocal tract models. Control of such a model is also considered, and a proof-of-concept study is presented using a deep neural network to control the parameters of a two-dimensional vocal tract model, resulting in intelligible speech output and paving the way for extension of the control system to the proposed three-dimensional vocal tract model. Future improvements to the system are also discussed in detail. This project considers both the naturalness and control issues associated with synthetic speech and therefore represents a significant step towards improved synthetic speech for use across society.
author2 Murphy, Damian T.
author_facet Murphy, Damian T.
Gully, Amelia J.
author Gully, Amelia J.
author_sort Gully, Amelia J.
title Diphthong synthesis using the three-dimensional dynamic digital waveguide mesh
title_sort diphthong synthesis using the three-dimensional dynamic digital waveguide mesh
publisher University of York
publishDate 2017
url https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.745720