The Use of Optimal Cue Mapping to Improve the Intelligibility and Quality of Speech in Complex Binaural Sound Mixtures.

Gao, Jingbo (2016) The Use of Optimal Cue Mapping to Improve the Intelligibility and Quality of Speech in Complex Binaural Sound Mixtures. PhD thesis, University of York.

Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.


A person with normal hearing has the ability to follow a particular conversation of interest in a noisy and reverberant environment, whilst simultaneously ignoring the interfering sounds. This task often becomes more challenging for individuals with a hearing impairment. Attending selectively to a sound source is difficult to replicate in machines, including devices such as hearing aids. A correctly set up hearing aid will work well in quiet conditions, but its performance may deteriorate seriously in the presence of competing sounds. To be of help in these more challenging situations the hearing aid should be able to segregate the desired sound source from any other, unwanted sounds. This thesis explores a novel approach to speech segregation based on optimal cue mapping (OCM). OCM is a signal processing method for segregating a sound source based on spatial and other cues extracted from the binaural mixture of sounds arriving at a listener's ears. The spectral energy fraction of the target speech source in the mixture is estimated frame-by-frame using artificial neural networks (ANNs). The resulting target speech magnitude estimates for the left and right channels are combined with the corresponding original phase spectra to produce the final binaural output signal. The performance improvements delivered by the OCM algorithm are evaluated using the STOI and PESQ metrics for speech intelligibility and quality, respectively. A variety of increasingly challenging binaural mixtures are synthesised involving up to five spatially separate sound sources in both anechoic and reverberant environments. The segregated speech consistently exhibits gains in intelligibility and quality and compares favourably with a leading, somewhat more complex approach. The OCM method allows the selection and integration of multiple cues to be optimised and provides scalable performance benefits to suit the available computational resources. 
The ability to determine the varying relative importance of each cue in different acoustic conditions is expected to facilitate computationally efficient solutions suitable for use in a hearing aid, allowing the aid to operate effectively in a range of typical acoustic environments. Further developments are proposed to achieve this overall goal.
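
The masking and resynthesis step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and it rests on several assumptions: the energy fraction is taken to be |S|^2 / (|S|^2 + |N|^2) per frequency bin, the fraction is computed from known target and interferer spectra (an oracle standing in for the ANN estimate), and the magnitude gain is taken to be the square root of the energy fraction. The function names `energy_fraction` and `apply_fraction` are hypothetical.

```python
import cmath
import math


def energy_fraction(target_bins, interferer_bins):
    """Oracle per-bin energy fraction |S|^2 / (|S|^2 + |N|^2), in [0, 1].

    Assumption: in the thesis this quantity is estimated by a neural
    network from binaural cues; here it is computed from known sources.
    """
    fractions = []
    for s, n in zip(target_bins, interferer_bins):
        total = abs(s) ** 2 + abs(n) ** 2
        fractions.append(abs(s) ** 2 / total if total > 0.0 else 0.0)
    return fractions


def apply_fraction(mixture_bins, fractions):
    """Scale each mixture magnitude by sqrt(fraction), keep the mixture phase."""
    estimate = []
    for x, f in zip(mixture_bins, fractions):
        magnitude = math.sqrt(f) * abs(x)  # assumed energy-to-magnitude gain
        estimate.append(cmath.rect(magnitude, cmath.phase(x)))
    return estimate


# One time frame, three frequency bins, one ear (illustrative values only).
target = [1 + 0j, 0j, 3 + 4j]
interferer = [0j, 2j, 3 + 4j]
mixture = [s + n for s, n in zip(target, interferer)]

fractions = energy_fraction(target, interferer)
estimate = apply_fraction(mixture, fractions)
```

For a binaural signal, the same two steps would be applied to the left and right channels separately, each with its own fraction estimate and its own original phase spectrum, before transforming each channel back to the time domain.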

Item Type: Thesis (PhD)
Keywords: Speech segregation, Neural networks, Ratio mask
Academic Units: The University of York > Electronics (York)
Identification Number/EthosID: uk.bl.ethos.686542
Depositing User: Mr Jingbo Gao
Date Deposited: 24 May 2016 09:48
Last Modified: 24 Jul 2018 15:21
URI: http://etheses.whiterose.ac.uk/id/eprint/13174