Human Translation Quality Estimation: Feature-based and Deep Learning-based

Abstract

This thesis studies the technical and linguistic aspects of human translation quality
estimation (HTQE) for trainee translations from English to Chinese. HTQE is cast as
a supervised machine learning task: conventional feature-based learning and deep
learning are used to predict fine-grained translation quality scores through
regression, without reference translations.
I investigated how human translations (HTs) can be effectively represented at
both the document level and the sentence level for quality estimation, using
feature-based and deep learning-based methods. Specifically, an extensive framework
of translation quality features has been designed at both levels, and a novel
stacked neural model with a cross-lingual attention mechanism, leveraging the
strengths of convolutional neural networks and recurrent neural networks, has
been proposed.
From the feature-based perspective, a supervised classification method is
proposed to identify terminology for quality evaluation purposes, using
language-independent statistics as features. I investigated the correlation of
normalised term occurrences with human-annotated quality scores. Descriptive and
exploratory statistical analyses are carried out on trainee and machine translation
datasets through pairwise correlation and principal component analysis to study the
contribution of individual features and feature groups and the distribution of
translation errors. They show that HT errors mainly cause content inadequacy,
whereas machine translation (MT) errors are more about language misuse.
Fine-grained document-level and sentence-level HTQE models are trained using the
state-of-the-art XGBoost algorithm with grid search parameter optimisation. Multiple
models built with different feature selection strategies are compared to QuEst, a
strong baseline for machine translation quality estimation. On HT and MT data, the
optimal models outperform the baseline and the other models in predicting the
majority of quality scores, as measured by agreement with human judgements. From
the deep learning-based perspective, a stacked neural model specifically for
sentence-level HTQE is presented. The neural architecture has achieved good
correlations with human judgements for HTs. For the prediction of MT post-editing
effort, it has achieved performance comparable to a strong baseline in predicting
HTER scores of German-English and English-German machine translations (MTs) on the
WMT17 test data. The model has also produced good results for predicting keystrokes.
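
For readers unfamiliar with HTER, the following is a minimal sketch of how such a score can be approximated: the word-level edit distance between an MT output and its human post-edited version, divided by the length of the post-edited version. It is not the thesis's implementation. The function names are hypothetical, whitespace tokenisation is an assumption, and the block shifts that full TER also counts as edits are deliberately omitted.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approx_hter(mt_output, post_edited):
    """Approximate HTER: word edits divided by post-edited length.

    Assumption: tokens are separated by whitespace. Shift operations,
    which full TER includes, are not modelled here.
    """
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edited reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    # One missing word against a six-word post-edit.
    print(approx_hter("the cat sat on mat", "the cat sat on the mat"))
```

A score of 0 means the MT output needed no post-editing; higher values mean more editing effort relative to the length of the post-edited sentence.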

I conclude that this work has created a framework for document-level and
sentence-level HTQE and has possibly started a new direction for human translation
quality assessment in Translation Studies. The results on HT data show promising
performance of the proposed HTQE methods in predicting fine-grained translation
quality from multiple aspects, sheddin

Metadata

Supervisors:	Sharoff, Serge and Babych, Bogdan
Keywords:	human translation quality estimation, machine translation quality estimation, machine learning, feature engineering, deep learning, neural networks, document-level quality, sentence-level quality, trainee translations
Awarding institution:	University of Leeds
Academic Units:	The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures and Societies (Leeds)
Identification Number/EthosID:	uk.bl.ethos.800464
Depositing User:	Yu Yuan
Date Deposited:	19 Mar 2020 12:04
Last Modified:	11 May 2023 09:53
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:26290

Download

Final eThesis - complete (pdf)

Filename: Yuan_Y_SLC_PhD_2018.pdf

Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License

