
Discourse Cohesion in Chinese-English Statistical Machine Translation

Steele, David (2019) Discourse Cohesion in Chinese-English Statistical Machine Translation. PhD thesis, University of Sheffield.

david_steele_phd_thesis_23-09-19.pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Download (1431Kb)

Abstract

In discourse, cohesion is a required component of meaningful and well-organised text. It establishes the relationships between different elements in the text using devices such as pronouns, determiners, and conjunctions. In translation, a well-translated document displays the correct cohesion and uses the cohesive devices that are pertinent to the language. However, not all languages have the same cohesive devices or use them in the same way. In statistical machine translation this is a particular barrier to generating smooth translations, especially when sentences in parallel corpora are treated in isolation and no extra meaning or cohesive context is provided beyond the sentential level. In this thesis, focussing on Chinese and English as the language pair, we examine discourse cohesion in statistical machine translation, looking at ways that systems can leverage discourse cues and signals to produce smoother translations. We also provide a statistical model that improves translation output by adding tokens within the text that can be used to leverage extra information. A significant part of this research involved visualising many of the results and system outputs, so an overview of two important pieces of visualisation software that we developed is also included.

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)
Identification Number/EthosID: uk.bl.ethos.792041
Depositing User: Mr David Steele
Date Deposited: 02 Dec 2019 09:13
Last Modified: 23 Dec 2019 11:05
URI: http://etheses.whiterose.ac.uk/id/eprint/25051

