Yue, Zhengjun ORCID: https://orcid.org/0000-0002-1101-549X (2022) Continuous Speech Recognition for People with Dysarthria. PhD thesis, University of Sheffield.
Abstract
Dysarthria is a motor speech disorder caused by damage to the nervous system. People with dysarthria often have poorer motor control of their speech articulators, resulting in atypical speech. Consequently, the intelligibility of dysarthric speech is often reduced, which can impair the communication ability of people with dysarthria and may lead to social exclusion. Dysarthria is also often associated with physical disabilities. This group of people therefore has a greater need for automation and voice-enabled interfaces that could improve their daily lives. Automatic speech recognition (ASR) technology is becoming ever more ubiquitous. However, performance on dysarthric speech still lags far behind that of mainstream ASR systems designed for typical speech. The large systematic mismatch between dysarthric and typical speech, the high speaker variability and the scarcity of data make the task challenging. Moreover, research on dysarthric speech recognition has not yet moved from isolated-word tasks to more challenging connected speech scenarios. There is a clear need to improve continuous dysarthric speech recognition.
This thesis is the first to systematically investigate various methods for continuous dysarthric speech recognition. Experimental work conducted on the TORGO dysarthric corpus shows that the deployed approaches and the developed systems effectively improve the recognition performance.
The key findings are as follows. Firstly, applying an out-of-domain language model allows for a more reasonable decoding space for continuous dysarthric speech, leading to a fairer performance evaluation. Secondly, incorporating features extracted from an autoencoder bottleneck feature extractor that is jointly optimised with a speech recogniser is shown to improve recognition performance; employing monophone regularisation as an auxiliary task yields further gains. Thirdly, by incorporating real articulatory information alongside acoustic features, a multi-modal acoustic-articulatory system is demonstrated to achieve encouraging performance, and exploration of feature fusion schemes identifies the one that achieves the best results. In conclusion, this thesis makes promising progress in improving continuous dysarthric speech recognition.
Metadata
| Supervisors: | Christensen, Heidi and Barker, Jon |
|---|---|
| Keywords: | Dysarthric speech, speech recognition, multi-modal acoustic modelling, continuous speech |
| Awarding institution: | University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
| Depositing User: | Dr Zhengjun Yue |
| Date Deposited: | 03 Jan 2023 14:49 |
| Last Modified: | 03 Jan 2024 01:05 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:32041 |
Download
Final eThesis - complete (pdf)
Filename: PhD_Thesis_ZJ__final_.pdf
Licence: This work is licensed under a Creative Commons Attribution NonCommercial NoDerivatives 4.0 International License.