Close, George ORCID: https://orcid.org/0000-0002-9478-5421
(2025)
Perceptually Motivated Speech Enhancement.
PhD thesis, University of Sheffield.
Abstract
Speech Enhancement (SE) is a vital technology for online human communication. Applications of Deep Neural Network (DNN) technologies in concert with traditional signal processing approaches to the task have revolutionised both the research and implementation of SE in recent years. However, the training objectives of these Neural Network Speech Enhancement (NNSE) systems generally do not consider the psychoacoustic processing which occurs in the human auditory system.
As a result, enhanced audio can often contain auditory artefacts which degrade the perceptual quality or intelligibility of the speech. To overcome this, systems which directly incorporate psychoacoustically motivated measures into the training objectives of NNSE systems have been proposed.
A key development in speech audio processing in recent years is the emergence of Self-Supervised Speech Representation (SSSR) models. These are powerful foundational DNN models which can be utilised for a number of more specific speech processing tasks, such as speech recognition, emotion detection and SE. Finally, the evaluation of SE systems has been revolutionised by DNN technology, namely through the creation of systems which are able to directly predict Mean Opinion Score (MOS) ratings of Speech Quality (SQ) or Speech Intelligibility (SI) derived from human listening tests.
This thesis aims to investigate these three areas: psychoacoustic training objectives for NNSE, the incorporation of SSSR features, and the prediction of human-derived labels of speech directly from audio signals. Further, the intersection of these areas and the combined use of their techniques will be investigated.
Metadata
Supervisors: | Goetze, Stefan and Hain, Thomas |
---|---|
Keywords: | machine learning, artificial intelligence, speech enhancement, audio processing, speech processing, speech quality prediction |
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield) |
Depositing User: | Dr George Close |
Date Deposited: | 07 Apr 2025 14:50 |
Last Modified: | 07 Apr 2025 14:50 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:36577 |
Download
Final eThesis - complete (pdf)
Filename: _George__Thesis (5).pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial NoDerivatives 4.0 International License