Vincent, Sebastian ORCID: https://orcid.org/0000-0001-8975-165X (2023) Context-Based Personalisation in Neural Machine Translation of Dialogue. PhD thesis, University of Sheffield.
Abstract
Neural machine translation (NMT) has revolutionised automatic translation and has been instrumental in reducing costs and improving productivity within the translation industry. However, contemporary NMT systems are still primarily designed to translate isolated sentences, disregarding crucial contextual information in the process. This lack of context awareness frequently leads to assumptions about the most likely interpretation of the source text, potentially propagating harmful biases learned from the training data, such as assuming that the average participant in a conversation is male. In the dialogue domain, where the meaning of an utterance may vary depending on what was said before, the environment, the individuals involved, their relationship, and more, translations produced by context-agnostic systems often fall short in capturing the nuances of specific characters or situations.
This thesis expands the understanding of contextual NMT and explores its potential applications, with a focus on personalisation. Our methods challenge the prevailing context-agnostic strategy in machine translation and seek to address the aforementioned issues. Our research suggests that by integrating existing information into the translation process we can enhance the quality of translation hypotheses. Additionally, we demonstrate that one type of information can be effectively leveraged to enable manipulation of another. Our experiments involve adapting machine translation systems to individual speakers and productions, focusing on combinations of their individual characteristics rather than relying on discrete labels. We also explore personalisation of language models based on context information expressed in this way: to personalise a model for a particular character, we use a combination of their traits. These personalised language models are then used in an evaluation scenario where the context specificity of machine translation hypotheses is expressed as the pointwise mutual information between the proposed text and its original context. Finally, our best personalised NMT system is thoroughly evaluated in a professional multi-modal setting of translating subtitles for TV series for two language pairs: English-to-German and English-to-French. Throughout the thesis, we report on experiments with various types of context in a setting of translation between English and a range of European languages. Our chosen domain is dialogue extracted from TV series and films, due to the availability of context-rich datasets, as well as the potential practical application of this research to the work of the industrial partner to this PhD, ZOO Digital.
Our research tackles five primary challenges: (i) direct incorporation of extra-textual information into neural machine translation systems, (ii) zero-shot and few-shot control of this information, (iii) reference-free evaluation and analysis of contextual NMT, (iv) personalisation of language models (LMs) and NMT systems using rich sets of speaker and film metadata annotations, and (v) human evaluation of machine translation in a professional post-editing setting. By addressing these challenges, this thesis aims to enhance machine translation in dialogue by ensuring translations are better suited to the specific characters, addressees, and contextual factors involved. The research contributes to the advancement of NMT systems that can effectively account for the personalised nature of dialogue.
Metadata
Supervisors: | Scarton, Carolina |
---|---|
Keywords: | context, neural machine translation, natural language processing, personalisation, personalization, metadata, subtitles, dialogue, scripted dialogue, tv series, context-aware machine translation, extra-textual information |
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
Depositing User: | Mr Sebastian Vincent |
Date Deposited: | 12 Jan 2024 15:25 |
Last Modified: | 12 Jan 2024 15:25 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:34022 |
Download
Final eThesis - complete (pdf)
Filename: PhD_Thesis_ST_Vincent.pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial NoDerivatives 4.0 International License