Exploring Syntactic Representations in Pre-trained Transformers to Improve Neural Machine Translation by a Fusion of Neural Network Architectures

Abstract

Neural networks in Machine Translation (MT) engines may not take deep linguistic knowledge into account, which often results in low-quality translations. To improve translation quality, this study examines the feasibility of fusing two data augmentation strategies: explicit incorporation of syntactic knowledge and the pre-trained language model BERT.

The study first uses syntactic probing experiments to investigate what BERT knows about the syntax of source-language sentences before and after MT fine-tuning. It then uses a Quality Estimation (QE) model and the chi-square test to clarify the correlation between the syntax of source-language sentences and the quality of their translations in the target language. The experimental results show that BERT can explicitly predict different types of dependency relations in source-language sentences and that its learning trends differ across relation types, as the probes reveal. The experiments also confirm a correlation between dependency relations in source-language sentences and translation quality in MT scenarios, indicating that these relations can, to some extent, influence translation quality. The study further detects the source-language dependency relations that frequently appear in sentences whose translations are of low quality. Linking these relations to the probing results shows that their prediction scores tend to be higher in the middle layers of BERT than in the top layer.
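
The correlation analysis above can be illustrated with a chi-square test of independence. The sketch below is a minimal illustration, not the thesis's own procedure: it assumes each sentence is reduced to the set of dependency relations it contains plus one QE score (assumed here to be higher for better translations), and that "low quality" means a score below a chosen threshold. The function names, threshold, and counts are hypothetical. It computes the uncorrected Pearson statistic for a 2x2 table and uses the fact that, with one degree of freedom, the p-value equals erfc(sqrt(chi2 / 2)).

```python
import math

def contingency_from_qe(records, relation, threshold):
    """Hypothetical helper: build a 2x2 table from (relations, qe_score) pairs.

    Rows:    0 = sentence contains `relation`, 1 = it does not.
    Columns: 0 = QE score below `threshold` (low quality), 1 = otherwise.
    Assumes a higher QE score means a better translation.
    """
    table = [[0, 0], [0, 0]]
    for relations, qe_score in records:
        row = 0 if relation in relations else 1
        col = 0 if qe_score < threshold else 1
        table[row][col] += 1
    return table

def chi_square_2x2(table):
    """Pearson chi-square test of independence (no Yates correction), df = 1.

    Returns (statistic, p_value). Every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for illustration only, not results from the thesis.
stat, p = chi_square_2x2([[30, 20], [10, 40]])
print(f"chi2={stat:.3f}, p={p:.2e}")
```

For these made-up counts the statistic is 16.667 and the p-value falls well below 0.001, so independence would be rejected. SciPy's `chi2_contingency` applies a continuity correction to 2x2 tables by default, so its statistic would differ from this uncorrected version.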

The study then presents dependency relation prediction experiments to examine what a Graph Attention Network (GAT) learns about syntactic dependencies, and investigates how it learns such knowledge under different combinations of the number of attention heads and the number of model layers. Additionally, the study examines the potential of incorporating GAT-based syntactic predictions in MT scenarios by comparing GAT with fine-tuned BERT on dependency relation prediction. Based on the paired t-test and prediction scores, GAT outperforms MT-B, a version of BERT specifically fine-tuned for MT, and achieves higher prediction scores for the majority of dependency relations. For some dependency relations, it even outperforms UD-B, a version of BERT specifically fine-tuned for syntactic dependencies. However, GAT has difficulty predicting some dependency relations accurately, depending on their quantity and subtype, which can lead to lower prediction scores.
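
To make the roles of "attention heads" and "model layers" concrete, the sketch below implements a multi-head graph attention layer in the style of Velickovic et al. (2018) with NumPy, where each word attends only to itself and its neighbours in the dependency graph. It is a minimal sketch under stated assumptions, not the thesis's GAT: weights are random and untrained, the dependency graph is treated as undirected, the ELU nonlinearity and dropout of the original GAT are omitted, and the classifier that would predict dependency relation labels is not shown. The names `gat_layer` and `gat_stack` are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, adj, W, a):
    """One multi-head graph attention layer (heads concatenated).

    h:   (N, F_in) node features, e.g. one vector per source-language word
    adj: (N, N) 0/1 adjacency from the dependency parse (self-loops added here)
    W:   (K, F_in, F_out) projection for each of the K heads
    a:   (K, 2 * F_out) attention vector for each head
    Returns an (N, K * F_out) array.
    """
    n = h.shape[0]
    mask = (adj + np.eye(n)) > 0              # attend only to neighbours and self
    heads = []
    for k in range(W.shape[0]):
        z = h @ W[k]                          # (N, F_out) projected features
        f_out = z.shape[1]
        # e_ij = LeakyReLU(a^T [z_i || z_j]), split into the z_i and z_j halves
        e = leaky_relu((z @ a[k, :f_out])[:, None] + (z @ a[k, f_out:])[None, :])
        e = np.where(mask, e, -np.inf)        # non-neighbours get zero weight
        e = e - e.max(axis=1, keepdims=True)  # numerically stable softmax
        alpha = np.exp(e)
        alpha /= alpha.sum(axis=1, keepdims=True)
        heads.append(alpha @ z)               # attention-weighted neighbour sum
    return np.concatenate(heads, axis=1)

def gat_stack(h, adj, num_layers, num_heads, hidden, seed=0):
    """Stack `num_layers` GAT layers with `num_heads` heads each (random weights)."""
    rng = np.random.default_rng(seed)
    for _ in range(num_layers):
        W = rng.normal(scale=0.1, size=(num_heads, h.shape[1], hidden))
        a = rng.normal(scale=0.1, size=(num_heads, 2 * hidden))
        h = gat_layer(h, adj, W, a)
    return h

rng = np.random.default_rng(1)
h = rng.normal(size=(4, 8))                   # 4 words, 8 features each
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]], dtype=float)
print(gat_stack(h, adj, num_layers=2, num_heads=4, hidden=16).shape)  # (4, 64)
```

Varying `num_layers` and `num_heads` corresponds to the head/layer combinations explored above: each extra layer lets information travel one more arc in the dependency graph, while extra heads give each word several independent attention distributions over the same neighbours.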

Finally, the study proposes a novel MT architecture, Syntactic knowledge via Graph attention with BERT (SGB), and examines from various perspectives how translation quality changes with SGB engines. Based on QE scores, the experimental results indicate that SGB engines can improve low-quality translations across different source-sentence lengths and better recognize the syntactic structure of source-language sentences as defined by their dependency relations. However, the improvement in translation quality depends on BERT modeling the source-language sentences correctly; otherwise, the syntactic knowledge encoded in the graphs has limited impact. The prediction scores of GAT for dependency relations can also be linked to improved translation quality. GAT allows some layers of BERT to reconsider the syntactic structures of the source-language sentences. Using XLM-R instead of BERT still improves translation quality, indicating the effectiveness of syntactic knowledge on graphs. These experiments not only show the effectiveness of the proposed strategies but also provide explanations that may inform future work fusing graph neural networks that model linguistic knowledge with pre-trained language models in MT scenarios.
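
The abstract does not specify how SGB combines the graph representation with BERT, so the sketch below only illustrates the general idea under explicit assumptions. It converts Universal Dependencies-style head indices into an adjacency matrix, uses a simple neighbour average as a stand-in for graph attention, and merges the result into the encoder states with a hypothetical gated residual connection. The functions `heads_to_adjacency`, `neighbour_mean`, and `gated_fusion` are illustrative names, not components of the thesis's system.

```python
import numpy as np

def heads_to_adjacency(heads):
    """Undirected adjacency matrix from UD-style head indices.

    heads[i] is the 1-based index of the head of word i + 1; 0 marks the root.
    """
    n = len(heads)
    adj = np.zeros((n, n))
    for dep, head in enumerate(heads):
        if head > 0:
            adj[dep, head - 1] = adj[head - 1, dep] = 1.0
    return adj

def neighbour_mean(h, adj):
    """Stand-in graph encoder: mean over each word and its dependency neighbours.

    SGB uses graph attention; plain averaging keeps this sketch short.
    """
    a_hat = adj + np.eye(adj.shape[0])
    return (a_hat @ h) / a_hat.sum(axis=1, keepdims=True)

def gated_fusion(h_bert, h_graph, W_g, b_g):
    """Hypothetical gated residual fusion: h_bert + g * h_graph.

    g = sigmoid([h_bert; h_graph] @ W_g + b_g), one gate value per dimension.
    h_bert, h_graph: (N, D); W_g: (2D, D); b_g: (D,)
    """
    logits = np.concatenate([h_bert, h_graph], axis=1) @ W_g + b_g
    gate = 1.0 / (1.0 + np.exp(-logits))
    return h_bert + gate * h_graph

# "The cat sat": The -> cat, cat -> sat, sat is the root.
adj = heads_to_adjacency([2, 3, 0])
h_bert = np.random.default_rng(0).normal(size=(3, 6))  # stand-in encoder states
h_graph = neighbour_mean(h_bert, adj)
fused = gated_fusion(h_bert, h_graph, np.zeros((12, 6)), np.zeros(6))
print(fused.shape)  # (3, 6)
```

The residual form keeps the original encoder states intact when the gate is near zero, which is one way to let syntactic information refine, rather than overwrite, what the pre-trained model already encodes. In practice BERT and XLM-R operate on subword tokens, so a word-level graph like this one would first need to be mapped onto subwords (for example via the `word_ids()` method of a Hugging Face fast tokenizer's output).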

Metadata

Supervisors:	Sharoff, Serge and Kamps, Marc de
Keywords:	Natural Language Processing, Machine Translation, Computational Linguistics
Awarding institution:	University of Leeds
Academic Units:	The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures and Societies (Leeds)
Depositing User:	Mr Yuqian Dai
Date Deposited:	29 Jan 2024 14:14
Last Modified:	29 Jan 2024 14:14
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:33962

Download

Final eThesis - complete (pdf)

Filename: Yuqian_D_LCS_PhD_2023.pdf

Licence:
This work is licensed under a Creative Commons Attribution NonCommercial ShareAlike 4.0 International License

