
Minimally Supervised Techniques for Bilingual Lexicon Extraction

Ismail, Azniah Binti (2012) Minimally Supervised Techniques for Bilingual Lexicon Extraction. PhD thesis, University of York.

Thesis_Azniah.pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Download (4Mb)

Abstract

Normally, word translations are extracted from non-parallel, bilingual corpora, and an initial bilingual lexicon, i.e., a list of known translations, is typically used to aid the learning process. This thesis studies a series of novel techniques that rely on scarce resources. To make the task more challenging, only minimal resources were allowed and major linguistic tools were not employed. The study thus introduces techniques for learning a translation lexicon using a minimally supervised, context-based approach. The performance of each technique was measured by comparing the extracted lexicon to a reference lexicon using the F1 score, which is the harmonic mean of precision and recall. Scores range from 0% (worst) to 100% (best). Analysis of the proposed techniques showed promising F1 scores, ranging from 57.1% to 80.9%, indicating moderate to good performance. Overall, the findings support techniques that exploit words from small corpora, suggesting that words that are contextually relevant and occur in a similar domain are potentially useful. This thesis also presents a technique for deploying additional data harvested from the web, and a novel method for measuring the similarity of features between two words of different languages without using an initial bilingual lexicon.
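
As an illustration of the evaluation measure mentioned in the abstract, the following is a minimal sketch of how an extracted lexicon might be scored against a reference lexicon using the standard F1 definition (the harmonic mean of precision and recall). It is not taken from the thesis: the function name `lexicon_f1`, the representation of a lexicon as a set of (source word, target word) pairs, and the toy word pairs are all assumptions made for this example, and the thesis may count correct translations differently.

```python
def lexicon_f1(extracted, reference):
    """Return (precision, recall, f1) for two collections of word pairs.

    Assumption for this sketch: a translation counts as correct only if
    the exact (source, target) pair appears in the reference lexicon.
    """
    extracted = set(extracted)
    reference = set(reference)
    correct = len(extracted & reference)

    # Precision: share of extracted pairs that are in the reference.
    precision = correct / len(extracted) if extracted else 0.0
    # Recall: share of reference pairs that were extracted.
    recall = correct / len(reference) if reference else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data invented for illustration (hypothetical, not from the thesis).
reference = {
    ("casa", "house"),
    ("perro", "dog"),
    ("gato", "cat"),
    ("libro", "book"),
    ("agua", "water"),
}
extracted = {
    ("casa", "house"),
    ("perro", "dog"),
    ("gato", "cat"),
    ("libro", "bookshop"),  # an incorrect translation
}

precision, recall, f1 = lexicon_f1(extracted, reference)
print(f"precision={precision:.1%} recall={recall:.1%} F1={f1:.1%}")
```

In this toy case three of the four extracted pairs are correct and three of the five reference pairs are recovered, so precision is 75%, recall is 60%, and F1 is about 66.7%.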

Item Type: Thesis (PhD)
Academic Units: The University of York > Computer Science (York)
Identification Number/EthosID: uk.bl.ethos.572381
Depositing User: Ms. Azniah Binti Ismail
Date Deposited: 29 May 2013 13:15
Last Modified: 08 Sep 2016 13:02
URI: http://etheses.whiterose.ac.uk/id/eprint/3964
