Yuan, Yu (2018) Human Translation Quality Estimation: Feature-based and Deep Learning-based. PhD thesis, University of Leeds.
Abstract
This thesis studies the technical and linguistic aspects of human translation quality estimation (HTQE) for trainee translations from English into Chinese. To this end, the task is cast as supervised machine learning, using both conventional feature-based learning and deep learning to predict fine-grained translation quality scores through regression, with no reference translations. I investigated how human translations (HTs) can be effectively represented at both the document level and the sentence level for quality estimation, exploiting feature-based and deep learning-based methods. Specifically, an extensive framework of translation quality features has been designed at both the sentence and document levels, and a novel stacked neural model with a cross-lingual attention mechanism, leveraging the strengths of convolutional and recurrent neural networks, has been proposed. From the feature-based perspective, a supervised classification method is proposed to identify terminology for quality evaluation purposes, using language-independent statistics as features, and the correlation of normalised term occurrences with human-annotated quality scores is investigated. Descriptive and exploratory statistics, through pairwise correlation and principal component analysis, were carried out on trainee and machine translation datasets to study the contribution of individual and grouped features and the distribution of translation errors; these show that HT errors mainly cause content inadequacy, whereas machine translation (MT) errors are more often instances of language misuse. Fine-grained document-level and sentence-level HTQE models are trained using the state-of-the-art XGBoost algorithm, with hyperparameters optimised by grid search. Multiple models built with different feature-selection strategies are compared against QuEst, a strong baseline for machine translation quality estimation.
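The grid-search hyperparameter optimisation mentioned above can be sketched in pure Python. This is a minimal illustration, not the thesis code: the parameter names follow XGBoost conventions, but the grid values, the `evaluate` helper, and its toy scoring formula are all assumptions made for the example; a real setup would train an XGBoost regressor on the quality-feature vectors and score it by cross-validated error against the human-annotated scores.

```python
from itertools import product

# Hypothetical XGBoost-style hyperparameter grid (values are illustrative).
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 200],
}

def evaluate(params):
    """Stand-in for cross-validated model scoring; a real implementation
    would fit a regressor and return, e.g., mean absolute error against
    human-annotated quality scores (lower is better)."""
    # Toy score: prefers shallower trees, lower learning rates, more trees.
    return params["max_depth"] * params["learning_rate"] / params["n_estimators"]

def grid_search(grid, score_fn):
    """Exhaustively try every parameter combination; keep the lowest score."""
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, _ = grid_search(param_grid, evaluate)
```

Exhaustive search is feasible here because the grid is small (12 combinations); with larger grids, randomised search over the same loop structure is a common substitute.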
On both HT and MT data, the optimal models outperform the baseline and competing models in predicting the majority of quality scores, judged by agreement with human assessments. From the deep learning-based perspective, a stacked neural model designed specifically for sentence-level HTQE is presented. The neural architecture achieves good correlations with human judgements for HTs. For the prediction of MT post-editing effort, it achieves performance comparable to a strong baseline when predicting HTER scores of German-English and English-German MTs on the WMT17 test data, and it also produces good results for predicting keystrokes. I conclude that this work has created a framework for document-level and sentence-level HTQE and has arguably opened a new direction for human translation quality assessment in Translation Studies. The results on HT data show the promising performance of the proposed HTQE methods in predicting fine-grained translation quality from multiple aspects.
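Agreement with human judgements of the kind reported above is commonly quantified with Pearson's r between predicted and annotated scores. A dependency-free sketch follows; the two score lists are invented purely for illustration and do not come from the thesis data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between predicted and human-annotated scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical predicted vs. human quality scores (0-100 scale).
predicted = [72.0, 65.5, 80.0, 58.0, 90.5]
human = [70.0, 68.0, 78.0, 60.0, 88.0]
r = pearson_r(predicted, human)
```

An r close to 1 indicates that the model ranks and scales translations much as the human annotators do; in practice Spearman's rank correlation is often reported alongside it when only the ordering matters.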
Metadata
Supervisors: Sharoff, Serge and Babych, Bogdan
Keywords: human translation quality estimation; machine translation quality estimation; machine learning; feature engineering; deep learning; neural networks; document-level quality; sentence-level quality; trainee translations
Awarding institution: University of Leeds
Academic Units: The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures and Societies (Leeds)
Identification Number/EthosID: uk.bl.ethos.800464
Depositing User: Yu Yuan
Date Deposited: 19 Mar 2020 12:04
Last Modified: 11 May 2023 09:53
Download
Final eThesis - complete (pdf)
Filename: Yuan_Y_SLC_PhD_2018.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License