Brown, Guy Jason (1992) Computational auditory scene analysis : a representational approach. PhD thesis, University of Sheffield.
Abstract
This thesis addresses the problem of how a listener groups together acoustic components which have arisen from the same environmental event, a phenomenon known
as auditory scene analysis. A computational model of auditory scene analysis is
presented, which is able to separate speech from a variety of interfering noises.
The model consists of four processing stages. Firstly, the auditory periphery is
simulated by a bank of bandpass filters and a model of inner hair cell function. In
the second stage, physiologically-inspired models of higher auditory organization -
auditory maps - are used to provide a rich representational basis for scene analysis.
Periodicities in the acoustic input are coded by an autocorrelation map and a cross-correlation map. Information about spectral continuity is extracted by a frequency
transition map. The times at which acoustic components start and stop are identified
by an onset map and an offset map.
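
As a rough illustration of how autocorrelation can expose a periodicity, the sketch below finds the dominant period of a single frame of samples. It is not the thesis's autocorrelation map, which operates on the output of the simulated auditory periphery; the function names, the lag range and the 100 Hz test tone are all assumptions made for this example.

```python
import math

def autocorrelation(frame, max_lag):
    """Return the unnormalized autocorrelation of `frame` for lags 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]

def dominant_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation in [min_lag, max_lag]."""
    acf = autocorrelation(frame, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])

# Hypothetical test signal: a 100 Hz sine sampled at 8 kHz, so one period is 80 samples.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(400)]
period = dominant_period(frame, min_lag=20, max_lag=200)
```

Restricting the search to lags at or above `min_lag` avoids the trivial peak at lag zero, and the unnormalized sum favours the first period over its multiples because fewer samples overlap at longer lags.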
In the third stage of processing, information from the periodicity and frequency
transition maps is used to characterize the auditory scene as a collection of symbolic auditory objects. Finally, a search strategy identifies objects that have similar
properties and groups them together. Specifically, objects are likely to form a group
if they have a similar periodicity, onset time or offset time.
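
The grouping rule can be sketched as follows, assuming each object is summarised by one periodicity, one onset time and one offset time. The abstract does not specify the object representation, the similarity thresholds or the search strategy, so the `AuditoryObject` fields, the tolerances and the greedy first-match search here are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class AuditoryObject:
    # Hypothetical fields: the source names these properties but not their representation.
    period_ms: float
    onset_ms: float
    offset_ms: float

def similar(a, b, period_tol=0.5, time_tol=20.0):
    """True if two objects share a similar periodicity, onset time or offset time.

    The tolerance values are assumptions chosen for illustration.
    """
    return (
        abs(a.period_ms - b.period_ms) <= period_tol
        or abs(a.onset_ms - b.onset_ms) <= time_tol
        or abs(a.offset_ms - b.offset_ms) <= time_tol
    )

def group_objects(objects):
    """Greedily place each object in the first group that contains a similar member."""
    groups = []
    for obj in objects:
        for group in groups:
            if any(similar(obj, member) for member in group):
                group.append(obj)
                break
        else:
            # No existing group matched, so this object starts a new one.
            groups.append([obj])
    return groups

objects = [
    AuditoryObject(period_ms=8.0, onset_ms=100.0, offset_ms=400.0),
    AuditoryObject(period_ms=8.1, onset_ms=250.0, offset_ms=600.0),
    AuditoryObject(period_ms=5.0, onset_ms=700.0, offset_ms=900.0),
]
groups = group_objects(objects)
```

In this toy input the first two objects are grouped because their periods nearly match, while the third shares no property with either and forms its own group.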
The model has been evaluated in two ways, using the task of segregating voiced
speech from a number of interfering sounds such as random noise, "cocktail party"
noise and other speech. Firstly, a waveform can be resynthesized for each group
in the auditory scene, so that segregation performance can be assessed by informal
listening tests. The resynthesized speech is highly intelligible and fairly natural.
Secondly, the linear nature of the resynthesis process allows the signal-to-noise ratio
(SNR) to be compared before and after segregation. An improvement in SNR is
obtained after segregation for each type of interfering noise. Additionally, the performance of the model is significantly better than that of a conventional frame-based
autocorrelation segregation strategy.
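
The SNR comparison can be illustrated with a short sketch. Because a linear resynthesis process applied to a mixture gives the same result as applying it to the speech and the noise separately and adding the outputs, the two components can be measured individually before and after segregation. The function name and the sample values below are toy assumptions, not data or results from the thesis.

```python
import math

def snr_db(signal, noise):
    """Return the signal-to-noise ratio in decibels from separate signal and noise samples."""
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10(signal_energy / noise_energy)

# Toy components of the mixture before segregation (equal energy, so 0 dB).
speech_before = [1.0, -1.0, 1.0, -1.0]
noise_before = [1.0, 1.0, -1.0, -1.0]

# Toy components after segregation: speech mostly retained, noise attenuated.
speech_after = [0.9, -0.9, 0.9, -0.9]
noise_after = [0.3, 0.3, -0.3, -0.3]

improvement_db = snr_db(speech_after, noise_after) - snr_db(speech_before, noise_before)
```

A positive `improvement_db` means the segregated output contains proportionally more speech energy than the original mixture.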
Metadata
Keywords: | Biophysics |
---|---|
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.284554 |
Depositing User: | EThOS Import Sheffield |
Date Deposited: | 21 Nov 2012 16:46 |
Last Modified: | 08 Aug 2013 08:50 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:2982 |