Steele, David (2019) Discourse Cohesion in Chinese-English Statistical Machine Translation. PhD thesis, University of Sheffield.
Abstract
In discourse, cohesion is a required component of meaningful and well-organised text.
It establishes the relationship between different elements in the text using a number of
devices such as pronouns, determiners, and conjunctions.
In translation, a well-translated document will display the correct cohesion and use of
cohesive devices that are pertinent to the language. However, not all languages have the
same cohesive devices or use them in the same way. In statistical machine translation,
this is a particular barrier to generating smooth translations, especially when sentences in
parallel corpora are being treated in isolation and no extra meaning or cohesive context is
provided beyond the sentential level.
In this thesis, focussing on Chinese and English as the language pair, we examine
discourse cohesion in statistical machine translation, looking at ways that systems can leverage discourse cues and signals in order to produce smoother translations. We also provide a statistical model that improves translation output by inserting additional tokens into the text that can be used to leverage extra information.
A significant part of this research involved visualising many of the results and system outputs, and so an overview of two important pieces of visualisation software that we
developed is also included.
Metadata
Supervisors: | Specia, Lucia |
---|---|
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.792041 |
Depositing User: | Mr David Steele |
Date Deposited: | 02 Dec 2019 09:13 |
Last Modified: | 23 Dec 2019 11:05 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:25051 |
Download
Filename: david_steele_phd_thesis_23-09-19.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License