Lopez Saenz, Jose Antonio ORCID: https://orcid.org/0000-0002-8779-5947 (2023) A Model for the Assessor Bias in Second Language Pronunciation Assessment. PhD thesis, University of Sheffield.
Abstract
In pronunciation assessment (PA) of second language (L2) speech, similarity to a native accent is known to be desirable but not crucial: certain variations in pronunciation do not interfere with communication. It is up to the listener to decide whether a pronunciation differs from that of a so-called canonical reference. This subjectivity in pronunciation assessment can be referred to as the assessor bias.
Computer-assisted pronunciation assessment is also subject to the effects of assessor bias. Disagreement between assessors causes inconsistencies in the data used to build models for the assessment task. A model of the bias itself, however, would help build a general reference for a proficient L2 speaker as well as an impartial PA.
This thesis proposes a model for the assessor bias to be included as part of a model for a pronunciation assessor. The assessor model consists of an ideal assessor-independent scoring function for PA, modified by an additive term specific to the assessor. The latter term is referred to as the bias. The research for the model resulted in four original contributions. All contributions were tested on L2 speech data from young learners of English in the Netherlands. Each recording was annotated for mispronunciation at the phoneme level by three trained phoneticians. These overlapping annotations made the data the best fit for a consistent model of inter-assessor disagreement.
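The additive structure described above can be illustrated with a minimal sketch. It assumes that the assessor-independent score and the bias are combined in logit space and passed through a sigmoid to give a probability of correctness; the abstract does not state this detail, and the function names, assessor names, and numbers below are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def assessor_score(shared_logit: float, bias_logit: float) -> float:
    """Probability that one assessor marks a phoneme as correctly pronounced.

    shared_logit: assessor-independent score (hypothetical value).
    bias_logit:   additive term specific to the assessor (hypothetical value).
    Combining the two in logit space is an assumption of this sketch.
    """
    return sigmoid(shared_logit + bias_logit)


# Hypothetical example: one phoneme, three assessors with different biases.
shared = 0.8
biases = {"assessor_a": -0.5, "assessor_b": 0.0, "assessor_c": 0.7}
for name, bias in biases.items():
    print(name, round(assessor_score(shared, bias), 3))
```

With a bias of zero the assessor reproduces the assessor-independent score, so in this sketch the bias term alone carries the disagreement between assessors.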
The first contribution is a novel approach for detecting mispronunciations without the need for a precise phoneme alignment, which outperformed a baseline of pronunciation correctness scores based on phoneme alignments. The second contribution is a study of the effect of speaker metadata on learning a pronunciation reference. Models trained on different assessors were shown to be sensitive to different speaker information. The third contribution is the proposal and implementation of the assessor model. Two deep networks combine a bidirectional long short-term memory module with self-attention and a feed-forward classifier to estimate the probabilities of phonemes being pronounced correctly. Both networks were trained jointly to estimate the observed pronunciation labels. Only one network was modelled on the assessor’s identity. The fourth contribution consists of methods for increasing the specialisation of the bias network by reducing its cosine similarity and co-dependence with respect to the assessor-independent network. Using cosine similarity and a contrastive log-ratio upper bound for mutual information, it was possible to reduce both the correlation and the dependency between the two networks. The bias network managed to increase its dependence on assessor identity and speaker factors. The mutual information between the assessor and the bias output was useful to illustrate disagreement, as well as which assessors and phonemes were the most prone to the bias.
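As an illustration of the cosine-similarity idea in the fourth contribution, the following minimal sketch computes a penalty that grows when the outputs of the two networks point in similar directions. The exact loss used in the thesis is not given in the abstract, so the penalty form, the `weight` parameter, and the example vectors are assumptions; the contrastive log-ratio upper bound on mutual information is not shown here.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_penalty(shared_out: Sequence[float],
                       bias_out: Sequence[float],
                       weight: float = 0.1) -> float:
    """Hypothetical regulariser: penalise alignment between the two network outputs.

    Adding this term to a training loss would push the bias output away from
    the assessor-independent output; the absolute value treats positive and
    negative alignment alike (an assumption of this sketch).
    """
    return weight * abs(cosine_similarity(shared_out, bias_out))


# Hypothetical outputs of the two networks for a short phoneme sequence.
shared_out = [1.2, -0.4, 0.9]
bias_out = [0.3, 0.8, -0.1]
print(similarity_penalty(shared_out, bias_out))
```

Orthogonal outputs give a penalty of zero, while parallel outputs give the full `weight`, which is the behaviour one would want from a term meant to make the bias network specialise.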
Metadata
Supervisors: | Hain, Thomas |
---|---|
Keywords: | pronunciation assessment, L2 learning, GOP, phoneme recognition, speaker representation, perception bias, assessment bias |
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Engineering (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.879590 |
Depositing User: | Mr Jose Antonio Lopez Saenz |
Date Deposited: | 09 May 2023 09:54 |
Last Modified: | 01 Jun 2023 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:32637 |
Download
Final eThesis - complete (pdf)
Filename: thesis_april2023.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License