Alqaisi, Taghreed (2023) Dependency-based Bilingual Word Embeddings and Neural Machine Translation. PhD thesis, University of York.
Abstract
Bilingual word embeddings (BWEs), which represent lexicons from various languages in a
common embedding space, are critical for facilitating semantic and knowledge transfer
in a wide range of cross-lingual Natural Language Processing (NLP) applications. Their
significance in many NLP tasks motivates us to investigate how various factors, including
syntactic information, affect the learning process for different languages with varying
levels of structural complexity. By analysing the components that influence the learning
process of BWEs, this thesis examines several factors for learning BWEs effectively.
Our findings demonstrate that increasing the embedding size has a positive impact
on the learning process for BWEs across language pairs. The effect of sentence length,
in contrast, depends on the language pair: short sentences perform better than long ones
in the En-Es experiment, whereas increasing the sentence length improves model
accuracy in the En-Ar and En-De experiments. Arabic segmentation, according to the
En-Ar experiments, is essential to the learning process for BWEs and can boost model
accuracy by up to 10%.
Incorporating dependency features into the learning process enhances the trained
models' performance and results in improved BWEs for all language pairs.
Finally, we investigated how the dependency-based pretrained BWEs affected a
neural machine translation (NMT) model. The findings indicate that, on various
MT evaluation metrics, the trained dependency-based NMT models outperform
the baseline NMT model.
Metadata
Supervisors: | Simon, Okeefe |
---|---|
Keywords: | Word embeddings, dependency-based bilingual word embeddings, syntax features, cross-lingual word embeddings. |
Awarding institution: | University of York |
Academic Units: | The University of York > Computer Science (York) |
Identification Number/EthosID: | uk.bl.ethos.875120 |
Depositing User: | Ms Taghreed Alqaisi |
Date Deposited: | 17 Mar 2023 10:58 |
Last Modified: | 21 Apr 2023 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:32468 |
Download
Examined Thesis (PDF)
Filename: ALQAISI_202057689_correctedThesisClean (2).pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial NoDerivatives 4.0 International License