Diphthong Synthesis using the Three-Dimensional Dynamic Digital Waveguide Mesh

Abstract

The human voice is a complex and nuanced instrument, and despite many years of research, no system is yet capable of producing natural-sounding synthetic speech. This affects intelligibility for some groups of listeners, in applications such as automated announcements and screen readers. Furthermore, those who require a computer to speak - due to surgery or a degenerative disease - are limited to unnatural-sounding voices that lack expressive control and may not match the user's gender, age or accent. It is evident that natural, personalised and controllable synthetic speech systems are required. A three-dimensional digital waveguide model of the vocal tract, based on magnetic resonance imaging data, is proposed here in order to address these issues. The model uses a heterogeneous digital waveguide mesh method to represent the vocal tract airway and surrounding tissues, facilitating dynamic movement and hence speech output. The accuracy of the method is validated by comparison with audio recordings of natural speech, and perceptual tests are performed which confirm that the proposed model sounds significantly more natural than simpler digital waveguide mesh vocal tract models. Control of such a model is also considered, and a proof-of-concept study is presented using a deep neural network to control the parameters of a two-dimensional vocal tract model, resulting in intelligible speech output and paving the way for extension of the control system to the proposed three-dimensional vocal tract model. Future improvements to the system are also discussed in detail. This project considers both the naturalness and control issues associated with synthetic speech and therefore represents a significant step towards improved synthetic speech for use across society.

Metadata

Supervisors:	Murphy, Damian T
Awarding institution:	University of York
Academic Units:	The University of York > School of Physics, Engineering and Technology (York)
Academic unit:	Electronic Engineering
Identification Number/EthosID:	uk.bl.ethos.745720
Depositing User:	Miss Amelia J Gully
Date Deposited:	11 Jun 2018 09:09
Last Modified:	21 Mar 2024 15:08
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:20043

Download

Examined Thesis (PDF)

Filename: thesis_final.pdf

Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License


