Sehgal, Siddharth (2018) Dysarthric speech analysis and automatic recognition using phase based representations. PhD thesis, University of Sheffield.
Abstract
Dysarthria is a neurological speech impairment which usually results in the loss of motor speech control due to muscular atrophy and poor coordination of articulators. Dysarthric speech is more difficult to model with machine learning algorithms, due to inconsistencies in the acoustic signal and to limited amounts of training data. This study reports a new approach for the analysis and representation of dysarthric speech, and applies it to improve ASR performance.
The Zeros of Z-Transform (ZZT) are investigated for dysarthric vowel segments. The analysis shows evidence of a phase-based acoustic phenomenon that is responsible for the way the distribution of zero patterns relates to speech intelligibility. The study then investigates whether such phase-based artefacts can be systematically exploited to understand their association with intelligibility.
A metric is introduced based on the phase slope deviation (PSD) observed in the unwrapped phase spectrum of dysarthric vowel segments. The metric compares the slopes of dysarthric vowels with those of typical vowels. The PSD shows a strong and nearly linear correspondence with the intelligibility of the speaker, and this correspondence is shown to hold for two separate databases of dysarthric speakers. A systematic procedure for correcting the underlying phase deviations results in a significant improvement in ASR performance for speakers with severe and moderate dysarthria.
In addition, information encoded in the phase component of the Fourier transform of dysarthric speech is exploited in the group delay spectrum. Its properties are found to represent disordered speech more effectively than the magnitude spectrum. Dysarthric ASR performance was significantly improved using phase-based cepstral features in comparison to conventional MFCCs. A combined approach utilising the benefits of PSD corrections and phase-based features was found to surpass all previous results on the UASPEECH database of dysarthric speech.
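For readers unfamiliar with the two phase-based representations named in the abstract, the following is a minimal sketch of their textbook definitions, not the thesis's own implementation. The function names `zeros_of_z_transform` and `group_delay`, and the `n_fft` and `eps` parameters, are hypothetical names chosen for illustration. The ZZT of a frame is taken here as the roots of the polynomial whose coefficients are the frame's samples, and the group delay is computed with the standard identity that avoids explicit phase unwrapping, using the Fourier transforms of x[n] and n·x[n].

```python
import numpy as np


def zeros_of_z_transform(frame):
    """Return the zeros of X(z) = sum_n x[n] z^(-n) for one frame.

    Multiplying X(z) by z^(N-1) gives an ordinary polynomial whose
    coefficients, highest power first, are x[0], x[1], ..., x[N-1],
    so its roots can be found with np.roots.
    """
    frame = np.asarray(frame, dtype=float)
    return np.roots(frame)


def group_delay(frame, n_fft=None, eps=1e-12):
    """Return the group delay (in samples) at each rfft frequency bin.

    Uses tau(w) = (X_R * Y_R + X_I * Y_I) / |X|^2, where X is the DFT
    of x[n] and Y is the DFT of n * x[n]. The small eps (an assumption
    of this sketch) guards against division by zero at spectral nulls.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)
```

In practice a frame would be a short windowed vowel segment; how the thesis windows the signal, derives the PSD from the unwrapped phase, or builds cepstral features from the group delay is not specified in this abstract and is not reproduced here.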
Metadata
Supervisors: | Cunningham, Stuart and Moore, Roger |
---|---|
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Human Communication Sciences (Sheffield); The University of Sheffield > Faculty of Medicine, Dentistry and Health (Sheffield) > Human Communication Sciences (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.758384 |
Depositing User: | Mr Siddharth Sehgal |
Date Deposited: | 12 Nov 2018 09:58 |
Last Modified: | 25 Mar 2021 16:51 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:22083 |
Download
Filename: sid_thesis.pdf
Description: Portable Document Format
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License