Articulatory-Based English Consonant Synthesis in 2-D Digital Waveguide Mesh

Abstract

In articulatory speech synthesis, the 3-D shape of the vocal tract for a particular speech sound is typically established, for example by magnetic resonance imaging (MRI), and this shape is then used to model the acoustic output of the tract with numerical methods that operate in one, two or three dimensions. The dimensionality strongly affects the overall computational complexity, which in turn has a direct bearing on the quality of the synthesized speech output.
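To make the effect of dimensionality on cost concrete, the sketch below counts the junctions a rectilinear mesh needs to cover a tract-sized box. It assumes a speed of sound of 343 m/s, a hypothetical 96 kHz mesh update rate, and the standard result that waves in a D-dimensional rectilinear mesh travel at 1/√D junction spacings per sample. The box is a crude stand-in for a real tract shape, and none of the numbers are taken from the thesis.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def rectilinear_spacing(fs: float, dims: int, c: float = SPEED_OF_SOUND) -> float:
    """Junction spacing (m) of a rectilinear DWM updated at fs Hz.

    In a rectilinear mesh of `dims` dimensions the nominal low-frequency
    wave speed along an axis is 1/sqrt(dims) junction spacings per sample,
    so dx = sqrt(dims) * c / fs.
    """
    return math.sqrt(dims) * c / fs


def junction_count(length: float, width: float, fs: float, dims: int) -> int:
    """Junctions needed to cover a box-shaped tract of the given size (m).

    1-D covers the length only; 2-D adds one width axis; 3-D adds a second
    width axis (a crude square cross-section).
    """
    dx = rectilinear_spacing(fs, dims)
    n_len = math.ceil(length / dx) + 1
    n_wid = math.ceil(width / dx) + 1
    return n_len * n_wid ** (dims - 1)


if __name__ == "__main__":
    fs = 96_000.0  # hypothetical mesh update rate, not a value from the thesis
    for d in (1, 2, 3):
        n = junction_count(0.17, 0.05, fs, d)
        # every junction is updated once per sample
        print(f"{d}-D: {n} junctions, {n * fs:.3g} junction updates/s")
```

For a 17 cm by 5 cm box the count grows from tens of junctions in 1-D to hundreds in 2-D and thousands in 3-D, while the per-junction work stays roughly constant, which is why dropping from three dimensions to two cuts the cost so sharply.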
The digital waveguide mesh (DWM) is a numerical method commonly used in room acoustic modelling. A smaller space such as the vocal tract, which in adults is about 5 cm wide and 16.5-18 cm long, can also be modelled with a DWM in one, two or three dimensions. A 3-D model requires a very dense mesh and therefore massive computational resources; these requirements are reduced by lowering the dimensionality (two rather than three) and/or by using a less dense mesh. The computational cost of 2-D digital waveguide modelling makes it practical for real-time synthesis at full (20 kHz) audio bandwidth on an average PC. This research uses a 2-D mesh, taking advantage of the availability and flexibility of existing boundary modelling and of raised-cosine impedance control, to study the possibility of using it for English consonant synthesis.
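Impedance control in a DWM works through the scattering junction: each port is weighted by the admittance of the waveguide it connects to. The sketch below shows the standard lossless junction equation together with one hypothetical raised-cosine admittance profile across the mesh width. The profile shape, its parameters (`y_peak`, `open_fraction`, `y_min`) and the function names are illustrative assumptions, not the exact mapping used in the thesis.

```python
import math
from typing import List, Sequence, Tuple


def scatter(incoming: Sequence[float],
            admittances: Sequence[float]) -> Tuple[float, List[float]]:
    """Lossless N-port scattering junction (N = 4 in a rectilinear 2-D mesh).

    Junction pressure p_J = 2 * sum(Y_i * p_i_in) / sum(Y_i);
    each outgoing wave is p_i_out = p_J - p_i_in.
    """
    p_j = 2.0 * sum(y * p for y, p in zip(admittances, incoming)) / sum(admittances)
    return p_j, [p_j - p for p in incoming]


def raised_cosine_profile(n_width: int, y_peak: float, open_fraction: float,
                          y_min: float = 1e-3) -> List[float]:
    """Hypothetical admittance profile across the mesh width.

    Admittance peaks at y_peak on the tract centre line and falls as a
    raised cosine to y_min over a central region spanning open_fraction of
    the width; junctions outside that region stay at y_min (near-rigid).
    """
    centre = (n_width - 1) / 2.0
    half_open = open_fraction * n_width / 2.0
    profile = []
    for k in range(n_width):
        d = abs(k - centre)
        if half_open > 0 and d <= half_open:
            w = 0.5 * (1.0 + math.cos(math.pi * d / half_open))
        else:
            w = 0.0
        profile.append(y_min + (y_peak - y_min) * w)
    return profile


# Usage: a unit wave arriving on one port of a uniform junction.
p_j, out = scatter([1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
# p_j == 0.5, out == [-0.5, 0.5, 0.5, 0.5]
```

Narrowing `open_fraction` (a smaller cross-sectional area) shrinks the low-impedance channel down the middle of the mesh, which is the intuition behind using a smooth profile rather than a hard-edged wall.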
The research was organized according to the phonetic ‘manner’ classification of English consonants: semi-vowel, nasal, fricative, plosive and affricate. The production of each class has been studied in terms of acoustic pressure wave propagation. The mesh topology was fixed as a 4-port scattering 2-D rectilinear waveguide mesh for ease of understanding and of mapping to the tract shape.
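A 4-port rectilinear mesh alternates two steps: scatter at every junction, then pass each outgoing wave to the facing port of the neighbouring junction. The minimal sketch below implements that loop in wave-variable form. The class name, the rule that a waveguide's admittance is the mean of the two junctions it joins, and the single frequency-independent wall reflection coefficient `boundary_r` are all assumptions for illustration, not the boundary model used in the thesis.

```python
from typing import List

# Port order at every junction: east (+x), west (-x), north (+y), south (-y).
OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
OPPOSITE = [1, 0, 3, 2]  # index of the neighbour's port that faces back at us


class RectilinearMesh2D:
    """Minimal 4-port rectilinear 2-D waveguide mesh (wave-variable form)."""

    def __init__(self, junction_y: List[List[float]], boundary_r: float = 0.9):
        self.nx, self.ny = len(junction_y), len(junction_y[0])
        self.r = boundary_r  # hypothetical frequency-independent wall reflection
        self.incoming = [[[0.0] * 4 for _ in range(self.ny)] for _ in range(self.nx)]
        self.pressure = [[0.0] * self.ny for _ in range(self.nx)]
        # Port admittance: mean of the two junctions a waveguide joins (an
        # assumed rule); boundary ports keep the junction's own admittance.
        self.port_y = [[[junction_y[x][y]] * 4 for y in range(self.ny)]
                       for x in range(self.nx)]
        for x in range(self.nx):
            for y in range(self.ny):
                for k, (dx, dy) in enumerate(OFFSETS):
                    xn, yn = x + dx, y + dy
                    if self._inside(xn, yn):
                        self.port_y[x][y][k] = 0.5 * (junction_y[x][y] + junction_y[xn][yn])

    def _inside(self, x: int, y: int) -> bool:
        return 0 <= x < self.nx and 0 <= y < self.ny

    def inject(self, x: int, y: int, value: float) -> None:
        """Add an excitation to every incoming wave at junction (x, y)."""
        for k in range(4):
            self.incoming[x][y][k] += value

    def step(self) -> None:
        """Scatter at every junction, then move outgoing waves one junction on."""
        new_in = [[[0.0] * 4 for _ in range(self.ny)] for _ in range(self.nx)]
        for x in range(self.nx):
            for y in range(self.ny):
                ys, inc = self.port_y[x][y], self.incoming[x][y]
                p_j = 2.0 * sum(a * b for a, b in zip(ys, inc)) / sum(ys)
                self.pressure[x][y] = p_j
                for k, (dx, dy) in enumerate(OFFSETS):
                    out = p_j - inc[k]
                    xn, yn = x + dx, y + dy
                    if self._inside(xn, yn):
                        new_in[xn][yn][OPPOSITE[k]] = out
                    else:  # no neighbour: reflect back into the same port
                        new_in[x][y][k] = self.r * out
        self.incoming = new_in


# Usage: impulse at the centre of a uniform 5 x 5 mesh.
mesh = RectilinearMesh2D([[1.0] * 5 for _ in range(5)])
mesh.inject(2, 2, 1.0)
mesh.step()  # centre junction pressure becomes 2.0
mesh.step()  # the four nearest neighbours each reach 0.5
```

Because both ends of each waveguide see the same admittance, the waveguides themselves are reflection-free and all impedance changes happen at the junctions, which keeps the update symmetric and easy to map onto a tract shape.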
Because consonant production involves vocal tract articulation variations that are quite unlike those of vowels, this research uses articulatory trajectories from the mngu0 electromagnetic (mid-sagittal) articulograph (EMA) data to guide the change of the vocal tract cross-sectional area. In recent decades, articulatory trajectories have generally been used to improve the accuracy of speech recognition and synthesis. This research adopts these trajectories to control coarticulation in consonant synthesis, in order to demonstrate that a 2-D DWM is able to simulate formant transitions accurately. The formant transitions in the results are acoustically close to those of natural speech and are based on controlling articulation at four places of articulation. Positions of the lip, tongue tip, tongue body and tongue dorsum are inversely mapped to their corresponding cross-sectional areas, and linear interpolation between them enabled all tract movements to be modelled. The results show that tract movements are best modelled as non-linear coarticulation.
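The mapping described above can be sketched as three small steps: turn each articulator's EMA displacement into an area at its place along the tract (higher articulator, smaller area), linearly interpolate between those control points to get a full area function, and blend area functions over time. Everything numeric here is an assumption: the control-point positions, rest areas, gains, glottal area and the cosine easing offered as one possible non-linear alternative are hypothetical values, not taken from the thesis or from mngu0.

```python
import math
from bisect import bisect_right
from typing import Dict, List

# Hypothetical control points, NOT values from the thesis:
# name -> (distance from glottis in cm, rest area in cm^2,
#          area lost in cm^2 per mm of upward EMA displacement)
CONTROL = {
    "tongue_dorsum": (7.0, 3.0, 0.40),
    "tongue_body": (10.0, 2.5, 0.35),
    "tongue_tip": (14.0, 2.0, 0.30),
    "lip": (17.0, 2.0, 0.25),
}


def articulator_area(name: str, displacement_mm: float) -> float:
    """Inverse mapping: the higher the articulator, the smaller the area (>= 0)."""
    _, rest_area, gain = CONTROL[name]
    return max(0.0, rest_area - gain * displacement_mm)


def area_function(displacements: Dict[str, float], n_sections: int = 18,
                  length_cm: float = 17.0, glottis_area: float = 0.5) -> List[float]:
    """Piecewise-linear area function sampled at n_sections points along the tract."""
    knots = [(0.0, glottis_area)] + sorted(
        (pos, articulator_area(name, displacements.get(name, 0.0)))
        for name, (pos, _, _) in CONTROL.items())
    xs = [x for x, _ in knots]
    areas = []
    for i in range(n_sections):
        x = length_cm * i / (n_sections - 1)
        j = bisect_right(xs, x) - 1
        if j >= len(knots) - 1:  # at or beyond the last control point
            areas.append(knots[-1][1])
            continue
        (x0, a0), (x1, a1) = knots[j], knots[j + 1]
        areas.append(a0 + (x - x0) / (x1 - x0) * (a1 - a0))
    return areas


def transition(a: List[float], b: List[float], t: float,
               shape: str = "linear") -> List[float]:
    """Blend two area functions at normalised time t in [0, 1].

    "linear" is a straight-line blend; "cosine" is one hypothetical
    non-linear alternative (slow start and end, fast middle).
    """
    w = t if shape == "linear" else 0.5 - 0.5 * math.cos(math.pi * t)
    return [x + w * (y - x) for x, y in zip(a, b)]


# Usage: open tract moving toward a bilabial closure (e.g. the hold of /p/).
rest = area_function({})
closed = area_function({"lip": 10.0})
halfway = transition(rest, closed, 0.5, shape="cosine")
```

Feeding each blended area function to the mesh (for example through an admittance profile per section) lets the same closure be driven with different timing shapes, which is how linear and non-linear coarticulation could be compared.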

Metadata

Supervisors:	Howard, David
Keywords:	Articulatory-based speech synthesis, English consonant synthesis, speech synthesis, Digital waveguide mesh
Awarding institution:	University of York
Academic Units:	The University of York > School of Physics, Engineering and Technology (York)
Academic unit:	Electronics
Identification Number/EthosID:	uk.bl.ethos.628592
Depositing User:	Miss Anocha Rugchatjaroen
Date Deposited:	23 Oct 2014 14:55
Last Modified:	21 Mar 2024 14:40
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:7109

Download

AnochaRugchatjaroen_FullThesis_Corrected

Filename: AnochaRugchatjaroen_FullThesis_Corrected.pdf

Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License
