The Use of Optimal Cue Mapping to Improve the Intelligibility and Quality of Speech in Complex Binaural Sound Mixtures.

Abstract

A person with normal hearing has the ability to follow a particular conversation of interest in a noisy and reverberant environment, whilst simultaneously ignoring the interfering sounds. This task often becomes more challenging for individuals with a hearing impairment. Attending selectively to a sound source is difficult to replicate in machines, including devices such as hearing aids. A correctly set up hearing aid will work well in quiet conditions, but its performance may deteriorate seriously in the presence of competing sounds. To be of help in these more challenging situations the hearing aid should be able to segregate the desired sound source from any other, unwanted sounds.

This thesis explores a novel approach to speech segregation based on optimal cue mapping (OCM). OCM is a signal processing method for segregating a sound source based on spatial and other cues extracted from the binaural mixture of sounds arriving at a listener's ears. The spectral energy fraction of the target speech source in the mixture is estimated frame-by-frame using artificial neural networks (ANNs). The resulting target speech magnitude estimates for the left and right channels are combined with the corresponding original phase spectra to produce the final binaural output signal. The performance improvements delivered by the OCM algorithm are evaluated using the STOI and PESQ metrics for speech intelligibility and quality, respectively. A variety of increasingly challenging binaural mixtures are synthesised involving up to five spatially separate sound sources in both anechoic and reverberant environments. The segregated speech consistently exhibits gains in intelligibility and quality and compares favourably with a leading, somewhat more complex approach. The OCM method allows the selection and integration of multiple cues to be optimised and provides scalable performance benefits to suit the available computational resources. The ability to determine the varying relative importance of each cue in different acoustic conditions is expected to facilitate computationally efficient solutions suitable for use in a hearing aid, allowing the aid to operate effectively in a range of typical acoustic environments. Further developments are proposed to achieve this overall goal.
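The masking and resynthesis step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The ANN estimator is replaced here by an oracle energy fraction computed from known target and interferer spectra. The function names, the Hann window, the single-frame processing and the choice to scale the magnitude by the square root of the energy fraction are all assumptions made for illustration.

```python
import numpy as np


def energy_fraction(target_spec, interferer_spec, eps=1e-12):
    """Oracle stand-in for the ANN output (assumption): the fraction of
    energy in each frequency bin that belongs to the target source."""
    target_energy = np.abs(target_spec) ** 2
    interferer_energy = np.abs(interferer_spec) ** 2
    return target_energy / (target_energy + interferer_energy + eps)


def apply_fraction(mixture_spec, fraction):
    """Scale the mixture magnitude and keep the original mixture phase.
    Using sqrt(fraction) on the magnitude is an assumption of this sketch."""
    magnitude = np.sqrt(fraction) * np.abs(mixture_spec)
    return magnitude * np.exp(1j * np.angle(mixture_spec))


def segregate_frame(mix_left, mix_right, frac_left, frac_right):
    """Process one time frame per ear and return the two output frames."""
    outputs = []
    for frame, fraction in ((mix_left, frac_left), (mix_right, frac_right)):
        window = np.hanning(len(frame))
        spectrum = np.fft.rfft(window * frame)
        estimate = apply_fraction(spectrum, fraction)
        outputs.append(np.fft.irfft(estimate, n=len(frame)))
    return outputs[0], outputs[1]


# Usage with synthetic noise standing in for speech and an interferer.
rng = np.random.default_rng(0)
n = 512
window = np.hanning(n)
target = {ear: rng.standard_normal(n) for ear in ("left", "right")}
interferer = {ear: rng.standard_normal(n) for ear in ("left", "right")}
fractions = {
    ear: energy_fraction(np.fft.rfft(window * target[ear]),
                         np.fft.rfft(window * interferer[ear]))
    for ear in ("left", "right")
}
out_left, out_right = segregate_frame(
    target["left"] + interferer["left"],
    target["right"] + interferer["right"],
    fractions["left"],
    fractions["right"],
)
```

In a complete system the frames would overlap and be recombined by overlap-add, and the fractions would come from a trained network driven by binaural cues rather than from the separate source signals, which are unavailable in practice.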

Metadata

Supervisors:	Tew, Tony
Keywords:	Speech segregation, Neural networks, Ratio mask
Awarding institution:	University of York
Academic Units:	The University of York > School of Physics, Engineering and Technology (York)
Academic unit:	Electronics
Identification Number/EthosID:	uk.bl.ethos.686542
Depositing User:	Mr Jingbo Gao
Date Deposited:	24 May 2016 09:48
Last Modified:	21 Mar 2024 14:48
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:13174

Download

PDF

Filename: thesisFinal170116.pdf

Description: PDF

Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License

