Conversational Arabic Automatic Speech Recognition

Al-Shareef, Sarah (2015) Conversational Arabic Automatic Speech Recognition. PhD thesis, University of Sheffield.

Text (PhD Thesis)
SAlshareef_PhDThesis2015.pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Abstract

Colloquial Arabic (CA) is the set of spoken variants of modern Arabic that exist as regional dialects and are generally considered to be the mother tongues of those regions. CA has limited textual resources because it exists only as a spoken language, without a standardised written form. Normally the Modern Standard Arabic (MSA) writing convention is employed, which has limitations in representing CA phonetically. Without phonetic dictionaries, the pronunciation of CA words is ambiguous and can only be obtained from word and/or sentence context. Moreover, CA inherits the complex word structure of MSA, in which words can be created by attaching affixes to a word. In automatic speech recognition (ASR), commonly used approaches to modelling acoustic, pronunciation and word variability are language independent. However, one can observe significant differences in performance between English and CA, with the latter yielding error rates up to three times higher. This thesis investigates the main causes of the under-performance of CA ASR systems. The work focuses on two directions: first, the impact on language modelling of limited lexical coverage and insufficient training data for written CA is investigated; second, better models for the acoustics and pronunciations are obtained by learning to transfer between written and spoken forms. Several original contributions result from each direction. Using data-driven classes from decomposed text is shown to reduce the out-of-vocabulary rate. A novel colloquialisation system to import additional data is introduced; automatic diacritisation to restore the missing short vowels was found to yield good performance; and a new acoustic set for describing CA was defined. The proposed methods improved ASR performance, in terms of word error rate, on a CA conversational telephone speech ASR task.

Item Type: Thesis (PhD)
Keywords: colloquial Arabic, automatic speech recognition, acoustic modelling, pronunciation modelling, language modelling, dialectal Arabic speech, conversational Arabic, human language technology
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)
Identification Number/EthosID: uk.bl.ethos.668288
Depositing User: Ms. Sarah Al-Shareef
Date Deposited: 19 Oct 2015 15:52
Last Modified: 03 Oct 2016 13:06
URI: http://etheses.whiterose.ac.uk/id/eprint/10145