
Human Translation Quality Estimation: Feature-based and Deep Learning-based

Yuan, Yu (2018) Human Translation Quality Estimation: Feature-based and Deep Learning-based. PhD thesis, University of Leeds.

Yuan_Y_SLC_PhD_2018.pdf - Final eThesis - complete (pdf)
Restricted until 1 April 2023.



This thesis studies the technical and linguistic aspects of human translation quality estimation (HTQE) for trainee translations from English to Chinese. To this end, HTQE is cast as a supervised machine learning task, using both conventional feature-based learning and deep learning to predict fine-grained translation quality scores through regression without reference translations. I investigated how human translations (HTs) can be effectively represented at both the document level and the sentence level for quality estimation, exploiting feature-based and deep learning-based methods. Specifically, an extensive framework of translation quality features has been designed at both the sentence and document level, and a novel stacked neural model with a cross-lingual attention mechanism, leveraging the strengths of convolutional neural networks and recurrent neural networks, has also been proposed.

From the feature-based perspective, a supervised classification method is proposed to identify terminology for quality evaluation purposes, using language-independent statistics as features. I investigated the correlation of normalised term occurrences with human-annotated quality scores. Descriptive and exploratory statistics are carried out on trainee and machine translation datasets through pairwise correlation and principal component analysis to study the contribution of individual and group features and the distribution of translation errors. These analyses show that HT errors mainly cause content inadequacy, whereas machine translation (MT) errors are more about language misuse. Fine-grained document-level and sentence-level HTQE models are trained using the state-of-the-art XGBoost algorithm with grid search parameter optimisation. Multiple models built with different feature selection strategies are compared to QuEst, a strong baseline for machine translation quality estimation. On HT and MT data, the optimal models outperform the baseline and other models in predicting the majority of quality scores, as measured by agreement with human judgements.

From the deep learning-based perspective, a stacked neural model specifically for sentence-level HTQE is presented. The neural architecture has achieved good correlations with human judgements for HTs. For the prediction of MT post-editing effort, it has achieved performance comparable to a strong baseline in predicting HTER scores for German-English and English-German machine translations on the WMT17 test data. The model has also produced good results for predicting keystrokes.

I conclude that this work has created a framework for document-level and sentence-level HTQE and has possibly started a new direction for human translation quality assessment in Translation Studies. The results on HT data show promising performance of the proposed HTQE methods in predicting fine-grained translation quality from multiple aspects, sheddin

Item Type: Thesis (PhD)
Keywords: human translation quality estimation; machine translation quality estimation; machine learning; feature engineering; deep learning; neural networks; document-level quality; sentence-level quality; trainee translations
Academic Units: The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages, Cultures and Societies (Leeds)
Depositing User: Yu Yuan
Date Deposited: 19 Mar 2020 12:04
Last Modified: 19 Mar 2020 12:04
URI: http://etheses.whiterose.ac.uk/id/eprint/26290
