Vowel synthesis using feed-forward neural networks

This thesis is an investigation into the ability of artificial neural networks to learn to map from a symbolic representation of CVC triphones to a continuous representation of vowel formant tracks, and the influence of a number of factors on that ability. This mapping is interesting because, apart...

Bibliographic Details
Main Author: Conway, Stephen Malcolm
Published: University of Edinburgh 1994
Subjects: 006.3
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.643394
id ndltd-bl.uk-oai-ethos.bl.uk-643394
record_format oai_dc
datestamp 2017-04-20T03:18:50Z
identifier http://hdl.handle.net/1842/19643
type Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 006.3
description This thesis investigates the ability of artificial neural networks to learn a mapping from a symbolic representation of CVC triphones to a continuous representation of vowel formant tracks, and the influence of a number of factors on that ability. The mapping is interesting because, apart from being a necessary part of any text-to-speech system and having no accepted definitive solution, it runs from a discrete symbolic representation to a continuous non-symbolic one. Neural networks provide one method of learning such mappings automatically and prove capable of doing so in this case. The input representation appears to have little effect on performance: a feature-based representation does no better than a 1-of-n coding of the phonemes. The representation of the vowel formant tracks produced as network output has a far greater effect. Simple representations consisting of the initial, central and final frequencies of the formant tracks outperform polynomial and Fourier coefficient representations, which encode more information about the shape of the tracks. The back-propagation and conjugate gradient training algorithms produced networks with similar performance, and the use of cross-validation made no difference to generalisation (although the cross-validation data set was far too small). Interestingly, networks with no hidden layer proved as capable of learning the mapping as those with a hidden layer, indicating that the mapping is not substantially non-linear.
author Conway, Stephen Malcolm
title Vowel synthesis using feed-forward neural networks
publisher University of Edinburgh
publishDate 1994
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.643394
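
The description refers to a 1-of-n coding of the phonemes, to a formant-track representation made of the initial, central and final frequencies, and to networks with no hidden layer. The following is a minimal sketch of what those three ideas could look like, not the thesis's implementation. The phoneme inventories, the formant values, the function names and the choice of linear output units are all assumptions made for illustration; the record does not specify any of them.

```python
# Hypothetical phoneme inventories, chosen only for illustration.
# The record does not list the symbols used in the thesis.
CONSONANTS = ["b", "d", "g", "p", "t", "k"]
VOWELS = ["i", "a", "u"]


def one_of_n(symbol, inventory):
    """Return a 1-of-n vector: 1.0 at the symbol's index, 0.0 elsewhere."""
    vec = [0.0] * len(inventory)
    vec[inventory.index(symbol)] = 1.0
    return vec


def encode_cvc(c1, v, c2):
    """Concatenate the 1-of-n codes of the three phonemes of a CVC triphone."""
    return one_of_n(c1, CONSONANTS) + one_of_n(v, VOWELS) + one_of_n(c2, CONSONANTS)


def three_point(track):
    """Reduce a formant track (frequencies in Hz) to initial, central, final."""
    return [track[0], track[len(track) // 2], track[-1]]


def linear_map(weights, bias, x):
    """Network with no hidden layer, assuming linear outputs: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Input: the triphone /b a t/ as a 15-element vector (6 + 3 + 6).
x = encode_cvc("b", "a", "t")

# Target: a made-up first-formant track reduced to three frequencies.
target = three_point([620.0, 700.0, 730.0, 710.0, 650.0])
```

In this sketch a trained network would supply `weights` and `bias` so that `linear_map(weights, bias, x)` approximates `target`. With a 1-of-n input, each output of such a network is just a bias plus one weight per phoneme position, which is one way to read the finding that the mapping is not substantially non-linear.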