Methods for Addressing Data Diversity in Automatic Speech Recognition

Abstract

The performance of speech recognition systems is known to degrade in mismatched conditions, where the acoustic environment and the speaker population significantly differ between the training and target test data. Performance degradation due to the mismatch is widely reported in the literature, particularly for diverse datasets.

This thesis approaches the mismatch problem in diverse datasets with various strategies including data refinement, variability modelling and speech recognition model adaptation. These strategies are realised in six novel contributions.

The first contribution is a data subset selection technique that uses a likelihood ratio, derived from a target test set, to quantify mismatch. The second contribution is a multi-style training method based on data augmentation: the existing training data is augmented using a distribution of variabilities learnt from a target dataset, resulting in a matched set.
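
To make the idea behind the first contribution concrete, the following is a minimal sketch of likelihood-ratio data selection. It is not the thesis's actual formulation. It assumes single diagonal Gaussians as stand-ins for the target and training-pool models, synthetic feature vectors in place of real acoustic features, and hypothetical function names (`fit_diag_gaussian`, `avg_log_likelihood`, `select_by_likelihood_ratio`) chosen only for illustration.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Fit a diagonal Gaussian (per-dimension mean, variance) to a list of frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Small floor keeps the variance strictly positive.
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n + 1e-6 for d in range(dim)]
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        total += sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            for x, m, v in zip(f, mean, var)
        )
    return total / len(frames)


def select_by_likelihood_ratio(train_utts, target_frames, n_select):
    """Rank training utterances by log-likelihood ratio: target model vs. pool model."""
    target_model = fit_diag_gaussian(target_frames)
    pool_model = fit_diag_gaussian([f for utt in train_utts for f in utt])
    scores = [
        avg_log_likelihood(utt, target_model) - avg_log_likelihood(utt, pool_model)
        for utt in train_utts
    ]
    ranked = sorted(range(len(train_utts)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_select], scores


def toy_utt(rng, loc, n_frames=50, dim=3):
    """Synthetic 'utterance': frames drawn from a unit-variance Gaussian (assumption)."""
    return [[rng.gauss(loc, 1.0) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
target_frames = toy_utt(rng, 2.0, n_frames=500)       # stands in for the target test set
train_utts = [toy_utt(rng, 2.0) for _ in range(3)]    # utterances matched to the target
train_utts += [toy_utt(rng, -2.0) for _ in range(3)]  # mismatched utterances
selected, scores = select_by_likelihood_ratio(train_utts, target_frames, n_select=3)
print(sorted(selected))
```

In this toy setting the utterances drawn from the target-like distribution receive the highest ratios and are the ones selected. A practical system would likely use richer models (for example Gaussian mixtures) and real acoustic features.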

The third contribution is a new approach for genre identification in diverse media data with the aim of reducing the mismatch in an adaptation framework.

The fourth contribution is a novel method that performs unsupervised domain discovery using latent Dirichlet allocation. Since the latent domains correlate highly with some subjective meta-data tags, such as genre labels of media data, features derived from the latent domains are successfully applied to the genre and broadcast show identification tasks.

The fifth contribution extends the latent modelling technique to acoustic model adaptation, where latent-domain-specific models are adapted from a base model. As the sixth contribution, an alternative adaptation approach is proposed in which subspace adaptation of deep neural network acoustic models is performed using the proposed latent-domain aware training procedure.

All of the proposed techniques for mismatch reduction are verified using diverse datasets. Using data selection, data augmentation and latent-domain model adaptation methods, the mismatch between training and testing conditions of diverse ASR systems is reduced, resulting in more robust speech recognition systems.

Metadata

Supervisors:	Hain, Thomas
Awarding institution:	University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Engineering (Sheffield)
Identification Number/EthosID:	uk.bl.ethos.713306
Depositing User:	Mr Mortaza Doulaty Bashkand
Date Deposited:	15 May 2017 07:54
Last Modified:	12 Oct 2018 09:39
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:17096

Download

Mortaza Doulaty PhD Thesis

Filename: thesis_Mortaza_Doulaty_before_WhiteRose_submission.pdf

Description: Mortaza Doulaty PhD Thesis

Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License