White Rose University Consortium logo
University of Leeds logo University of Sheffield logo York University logo

Coherence in Machine Translation

Sim Smith, Karin M (2018) Coherence in Machine Translation. PhD thesis, University of Sheffield.

[img]
Preview
Text
thesis_coherence_in_smt_final.pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Download (1071Kb) | Preview

Abstract

Coherence ensures individual sentences work together to form a meaningful document. When properly translated, a coherent document in one language should result in a coherent document in another language. In Machine Translation, however, due to reasons of modeling and computational complexity, sentences are pieced together from words or phrases based on short context windows and with no access to extra-sentential context. In this thesis I propose ways to automatically assess the coherence of machine translation output. The work is structured around three dimensions: entity-based coherence, coherence as evidenced via syntactic patterns, and coherence as evidenced via discourse relations. For the first time, I evaluate existing monolingual coherence models on this new task, identifying issues and challenges that are specific to the machine translation setting. In order to address these issues, I adapted a state-of-the-art syntax model, which also resulted in improved performance for the monolingual task. The results clearly indicate how much more difficult the new task is than the task of detecting shuffled texts. I proposed a new coherence model, exploring the crosslingual transfer of discourse relations in machine translation. This model is novel in that it measures the correctness of the discourse relation by comparison to the source text rather than to a reference translation. I identified patterns of incoherence common across different language pairs, and created a corpus of machine translated output annotated with coherence errors for evaluation purposes. I then examined lexical coherence in a multilingual context, as a preliminary study for crosslingual transfer. Finally, I determine how the new and adapted models correlate with human judgements of translation quality and suggest that improvements in general evaluation within machine translation would benefit from having a coherence component that evaluated the translation output with respect to the source text.

Item Type: Thesis (PhD)
Keywords: Machine Translation, Coherence, Discourse, reference-free evalation
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)
Identification Number/EthosID: uk.bl.ethos.739893
Depositing User: Ms Karin M Sim Smith
Date Deposited: 08 May 2018 09:31
Last Modified: 12 Oct 2018 09:54
URI: http://etheses.whiterose.ac.uk/id/eprint/20083

You do not need to contact us to get a copy of this thesis. Please use the 'Download' link(s) above to get a copy.
You can contact us about this thesis. If you need to make a general enquiry, please see the Contact us page.

Actions (repository staff only: login required)