
Articulatory-Based English Consonant Synthesis in 2-D Digital Waveguide Mesh

Rugchatjaroen, Anocha (2014) Articulatory-Based English Consonant Synthesis in 2-D Digital Waveguide Mesh. PhD thesis, University of York.

Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.



In articulatory speech synthesis, the 3-D shape of the vocal tract for a particular speech sound is typically established, for example by magnetic resonance imaging (MRI), and this shape is used to model the acoustic output of the tract with numerical methods that operate in one, two or three dimensions. The dimensionality strongly affects the overall computational complexity, which in turn bears directly on the quality of the synthesized speech output. The digital waveguide mesh (DWM) is a numerical method commonly used in room acoustic modelling. A smaller space such as the vocal tract, which is about 5 cm wide and 16.5-18 cm long in adults, can also be modelled with a DWM in one, two or three dimensions. A 3-D model requires a very dense mesh and therefore massive computational resources; these requirements are reduced by using a lower dimensionality (two rather than three) and/or a less dense mesh. The computational cost of 2-D digital waveguide modelling is low enough to make it a practical technique for real-time synthesis on an average PC at full (20 kHz) audio bandwidth. This research uses a 2-D mesh, taking advantage of the availability and flexibility of existing boundary modelling and of raised-cosine impedance control, to study its potential for English consonant synthesis. The research is organized under the phonetic ‘manner’ classification of English consonants: semi-vowel, nasal, fricative, plosive and affricate. Their production is studied in terms of acoustic pressure wave propagation. The mesh topology was fixed as a 4-port scattering 2-D rectilinear waveguide mesh for ease of understanding and of mapping to the tract shape. Because consonant production involves vocal tract articulation variations quite unlike those of vowels, this research uses articulatory trajectories from the mngu0 electromagnetic articulograph (EMA, mid-sagittal) data to guide changes in the cross-sectional area of the vocal tract.
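The 4-port scattering rectilinear mesh named above can be sketched compactly. The following is a minimal Python illustration under stated assumptions, not the thesis code: it uses pressure-wave junctions with the standard scattering rule p_J = 2 * sum(Y_i * p_i_in) / sum(Y_i) and outgoing waves p_i_out = p_J - p_i_in, sets each waveguide's admittance to the mean of its two junctions' admittances (an assumption), terminates the walls with a single hypothetical reflection coefficient `reflection`, and sizes the grid with the nominal 2-D relation dx = c * sqrt(2) / fs. The names `mesh_size`, `simulate` and `wave_energy`, and the default c = 350 m/s, are illustrative.

```python
import math

# Port indices of a 4-port rectilinear junction and their grid offsets.
EAST, WEST, NORTH, SOUTH = 0, 1, 2, 3
OPPOSITE = {EAST: WEST, WEST: EAST, NORTH: SOUTH, SOUTH: NORTH}
OFFSET = {EAST: (1, 0), WEST: (-1, 0), NORTH: (0, 1), SOUTH: (0, -1)}


def mesh_size(length_m, width_m, fs, c=350.0):
    """Junction counts (nx, ny) and spacing dx for a 2-D rectilinear mesh.

    Uses the nominal relation dx = c * sqrt(2) / fs: along the mesh axes a wave
    advances 1/sqrt(2) junction spacings per sample."""
    dx = c * math.sqrt(2.0) / fs
    return max(2, round(length_m / dx)), max(2, round(width_m / dx)), dx


def wave_energy(p_in, y_edge):
    """Total stored wave energy: sum of Y * p^2 over every incoming wave."""
    return sum(y * p * p
               for col_p, col_y in zip(p_in, y_edge)
               for cell_p, cell_y in zip(col_p, col_y)
               for p, y in zip(cell_p, cell_y))


def simulate(nx, ny, steps, source, probe,
             admittance=lambda i, j: 1.0, reflection=1.0):
    """Impulse response of the mesh at junction `probe`.

    Returns (pressures, initial_energy, final_energy)."""
    def inside(i, j):
        return 0 <= i < nx and 0 <= j < ny

    y_junc = [[admittance(i, j) for j in range(ny)] for i in range(nx)]

    def edge_y(i, j, d):
        ni, nj = i + OFFSET[d][0], j + OFFSET[d][1]
        if inside(ni, nj):
            # Assumed: waveguide admittance = mean of its two junctions' admittances.
            return 0.5 * (y_junc[i][j] + y_junc[ni][nj])
        return y_junc[i][j]  # wall port uses the junction's own admittance

    y_edge = [[[edge_y(i, j, d) for d in range(4)] for j in range(ny)]
              for i in range(nx)]
    p_in = [[[0.0] * 4 for _ in range(ny)] for _ in range(nx)]

    # Unit impulse: adding 0.5 to every incoming wave gives the source junction pressure 1.
    si, sj = source
    for d in range(4):
        p_in[si][sj][d] += 0.5
    e0 = wave_energy(p_in, y_edge)

    pressures = []
    for _ in range(steps):
        new_in = [[[0.0] * 4 for _ in range(ny)] for _ in range(nx)]
        for i in range(nx):
            for j in range(ny):
                ys, pin = y_edge[i][j], p_in[i][j]
                # Scattering: junction pressure, then outgoing waves p_out = p_J - p_in.
                p_j = 2.0 * sum(y * p for y, p in zip(ys, pin)) / sum(ys)
                if (i, j) == probe:
                    pressures.append(p_j)
                for d in range(4):
                    p_out = p_j - pin[d]
                    ni, nj = i + OFFSET[d][0], j + OFFSET[d][1]
                    if inside(ni, nj):
                        # Propagation: arrives at the neighbour's opposite port next step.
                        new_in[ni][nj][OPPOSITE[d]] = p_out
                    else:
                        # Wall: reflect back into the same port, scaled by `reflection`.
                        new_in[i][j][d] = reflection * p_out
        p_in = new_in
    return pressures, e0, wave_energy(p_in, y_edge)


if __name__ == "__main__":
    nx, ny, dx = mesh_size(0.17, 0.05, 44100)
    out, e0, e1 = simulate(nx, ny, 200, source=(0, ny // 2),
                           probe=(nx - 1, ny // 2), reflection=0.97)
    print(f"{nx}x{ny} junctions, dx = {dx * 100:.2f} cm, energy {e0:.3f} -> {e1:.3f}")
```

With `reflection=1.0` the sketch is lossless, so the initial and final energies should agree; values below 1 give a simple frequency-independent wall loss. Passing an `admittance` function derived from an area function is one way to impose a tract shape on the fixed-width mesh.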
In recent decades, articulatory trajectories have been used to improve the accuracy of speech recognition and synthesis. This research uses three trajectories to control coarticulation in consonant synthesis, to demonstrate that a 2-D digital waveguide mesh (DWM) can simulate formant transitions accurately. The resulting formant transitions are acoustically close to those of natural speech and are based on controlling articulation at four places of articulation. The positions of the lips, tongue tip, tongue body and tongue dorsum are inversely mapped to their corresponding cross-sectional areas, and linear interpolation between them enables all tract movements to be modelled. The results show that tract movements are best modelled as non-linear coarticulation.
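The control idea described here, in which articulator positions become cross-sectional areas that are then interpolated over time, can be sketched as follows. This is a hedged Python illustration, not the thesis implementation: it starts from area functions rather than EMA positions (the inverse mapping itself is not reproduced), assumes the interpolation is applied to the areas, converts each area to the plane-wave characteristic impedance Z = ρc/A using assumed air constants, and spreads that impedance across the mesh width with a raised-cosine profile whose exact form, including the wall impedance `z_edge`, is a hypothetical choice. All function names and the placeholder area values are illustrative, not measured data.

```python
import math

RHO = 1.14  # kg/m^3, assumed density of warm, moist air in the tract
C = 350.0   # m/s, assumed speed of sound in the tract


def interpolate_areas(a_start, a_end, t):
    """Linearly interpolate two area functions (cm^2 per section) at t in [0, 1]."""
    if len(a_start) != len(a_end):
        raise ValueError("area functions must have the same number of sections")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * a + t * b for a, b in zip(a_start, a_end)]


def area_trajectory(a_start, a_end, n_frames):
    """Area function for each of n_frames equally spaced frames, start to end."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    return [interpolate_areas(a_start, a_end, k / (n_frames - 1))
            for k in range(n_frames)]


def impedance_from_area(area_cm2):
    """Characteristic acoustic impedance Z = rho * c / A of a section (A in m^2)."""
    return RHO * C / (area_cm2 * 1e-4)


def raised_cosine_column(z_centre, z_edge, ny):
    """Hypothetical impedance profile across the mesh width: z_centre on the
    centre line, rising to z_edge at both walls along a raised cosine."""
    if ny < 2:
        raise ValueError("need at least two junctions across the width")
    half = (ny - 1) / 2.0
    column = []
    for j in range(ny):
        u = (j - half) / half                    # -1 and +1 at the walls, 0 at the centre
        w = 0.5 * (1.0 + math.cos(math.pi * u))  # 1 at the centre, 0 at the walls
        column.append(z_edge + (z_centre - z_edge) * w)
    return column


if __name__ == "__main__":
    # Placeholder 4-section area functions (cm^2), illustrative only.
    open_tract = [2.0, 3.0, 4.0, 2.5]
    closure = [2.0, 3.0, 4.0, 0.1]
    for areas in area_trajectory(open_tract, closure, 4):
        print([f"{impedance_from_area(a):.2e}" for a in areas])
    # Width profile for the last section of the open tract, hypothetical z_edge.
    print(raised_cosine_column(impedance_from_area(2.5), z_edge=1.0e8, ny=5))
```

Because the abstract reports that tract movements are best modelled as non-linear coarticulation, the linear schedule `k / (n_frames - 1)` in `area_trajectory` is the natural place to substitute a non-linear one.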

Item Type: Thesis (PhD)
Keywords: Articulatory-based speech synthesis, English consonant synthesis, speech synthesis, Digital waveguide mesh
Academic Units: The University of York > Electronics (York)
Identification Number/EthosID: uk.bl.ethos.628592
Depositing User: Miss Anocha Rugchatjaroen
Date Deposited: 23 Oct 2014 14:55
Last Modified: 08 Sep 2016 13:31
URI: http://etheses.whiterose.ac.uk/id/eprint/7109

