Diphthong Synthesis using the Three-Dimensional Dynamic Digital Waveguide Mesh

Gully, Amelia J (2017) Diphthong Synthesis using the Three-Dimensional Dynamic Digital Waveguide Mesh. PhD thesis, University of York.

This is the latest version of this item.

thesis_final.pdf - Examined Thesis (PDF)
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.


The human voice is a complex and nuanced instrument, and despite many years of research, no system is yet capable of producing natural-sounding synthetic speech. This affects intelligibility for some groups of listeners, in applications such as automated announcements and screen readers. Furthermore, those who require a computer to speak - due to surgery or a degenerative disease - are limited to unnatural-sounding voices that lack expressive control and may not match the user's gender, age or accent. It is evident that natural, personalised and controllable synthetic speech systems are required. A three-dimensional digital waveguide model of the vocal tract, based on magnetic resonance imaging data, is proposed here in order to address these issues. The model uses a heterogeneous digital waveguide mesh method to represent the vocal tract airway and surrounding tissues, facilitating dynamic movement and hence speech output. The accuracy of the method is validated by comparison with audio recordings of natural speech, and perceptual tests are performed which confirm that the proposed model sounds significantly more natural than simpler digital waveguide mesh vocal tract models. Control of such a model is also considered, and a proof-of-concept study is presented using a deep neural network to control the parameters of a two-dimensional vocal tract model, resulting in intelligible speech output and paving the way for extension of the control system to the proposed three-dimensional vocal tract model. Future improvements to the system are also discussed in detail. This project considers both the naturalness and control issues associated with synthetic speech and therefore represents a significant step towards improved synthetic speech for use across society.

Item Type: Thesis (PhD)
Academic Units: The University of York > Electronics (York)
Depositing User: Miss Amelia J Gully
Date Deposited: 11 Jun 2018 09:09
Last Modified: 11 Apr 2019 00:18
URI: http://etheses.whiterose.ac.uk/id/eprint/20043
