Shahid, Ahmad (2012) Extraction of Linguistic Resources from Multilingual Corpora and their Exploitation. PhD thesis, University of York.
Abstract
The increasing availability of on-line and off-line multilingual resources, along with developments in the automatic tools that can process this information, such as GIZA++ (Och & Ney 2003), has made it possible to build new multilingual resources that can be used for NLP/IR tasks.
Lexicon generation is one such task; done by hand, it is expensive in both human and capital costs. The generation of multilingual lexicons can now be automated, as is done in this research work. Wikipedia, an on-line multilingual resource, was employed to build multilingual lexicons automatically using simple search strategies.
The Europarl parallel corpus (Koehn 2002) was used to create multilingual sets of synonyms, which were later used to carry out Word Sense Disambiguation (WSD) on the original corpus from which they were derived. The theoretical analysis of the methodology validated our approach.
The multilingual sets of synonyms were then used to learn unsupervised models of word morphology in the individual languages. Our experiments, along with another unsupervised technique, were evaluated against the gold standard. Our results compared very favorably with the other approach, and the combination of the two approaches gave even better results.
Metadata
| Supervisors: | Kazakov, Dimitar |
|---|---|
| Awarding institution: | University of York |
| Academic Units: | The University of York > Computer Science (York) |
| Identification Number/EthosID: | uk.bl.ethos.550283 |
| Depositing User: | Mr Ahmad Shahid |
| Date Deposited: | 27 Feb 2012 11:00 |
| Last Modified: | 08 Sep 2016 12:21 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:2111 |
Download
Filename: PhDThesisAhmadRazaShahid.pdf
Description: Extraction of Linguistic Resources from Multilingual Corpora and their Exploitation
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License