Vowel synthesis using feed-forward neural networks

This thesis is an investigation into the ability of artificial neural networks to learn to map from a symbolic representation of CVC triphones to a continuous representation of vowel formant tracks, and the influence of a number of factors on that ability. This mapping is interesting because, apart...

Bibliographic Details
Main Author:	Conway, Stephen Malcolm
Published:	University of Edinburgh 1994
Subjects:	006.3
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.643394

id	ndltd-bl.uk-oai-ethos.bl.uk-643394
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-643394 | 2017-04-20T03:18:50Z | Vowel synthesis using feed-forward neural networks | Conway, Stephen Malcolm | 1994 | 006.3 | University of Edinburgh | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.643394 | http://hdl.handle.net/1842/19643 | Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	006.3
spellingShingle	006.3 Conway, Stephen Malcolm Vowel synthesis using feed-forward neural networks
description	This thesis is an investigation into the ability of artificial neural networks to learn to map from a symbolic representation of CVC triphones to a continuous representation of vowel formant tracks, and the influence of a number of factors on that ability. This mapping is interesting because, apart from being a necessary part of any text-to-speech system and not having any accepted definitive solution, it is from a discrete symbolic representation to a continuous non-symbolic representation. Neural networks provide one method of automatically learning such mappings and prove to be capable of doing so in this particular case. The input representation used appears to have little effect on the performance of the neural networks. A feature-based representation does no better than a 1-of-n coding of the phonemes. The representation of the vowel formant tracks, produced as output of the neural networks, has a far greater effect on performance. Simple representations consisting of the initial, central and final frequencies of the formant tracks outperform polynomial and Fourier coefficient representations, which encode more information about the shape of the formant tracks. The back-propagation and conjugate gradient neural network training algorithms produced neural networks with similar performance, and the use of cross-validation made no difference in generalisation (although the cross-validation data set was far too small). Interestingly, neural networks with no hidden layer proved to be as capable of learning the mapping as those with a hidden layer, indicating that the mapping is not substantially non-linear.
author	Conway, Stephen Malcolm
author_facet	Conway, Stephen Malcolm
author_sort	Conway, Stephen Malcolm
title	Vowel synthesis using feed-forward neural networks
title_short	Vowel synthesis using feed-forward neural networks
title_full	Vowel synthesis using feed-forward neural networks
title_fullStr	Vowel synthesis using feed-forward neural networks
title_full_unstemmed	Vowel synthesis using feed-forward neural networks
title_sort	vowel synthesis using feed-forward neural networks
publisher	University of Edinburgh
publishDate	1994
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.643394
work_keys_str_mv	AT conwaystephenmalcolm vowelsynthesisusingfeedforwardneuralnetworks
_version_	1718439567525150720
