Fernando, Samuel (2013) Enriching Lexical Knowledge Bases with Encyclopedic Relations. PhD thesis, University of Sheffield.
Abstract
Lexical knowledge bases, such as WordNet, have been shown to be useful in a wide range of language processing applications. However, WordNet lacks certain information, such as topical relations between synsets. This thesis addresses this problem by enriching WordNet using information derived from Wikipedia.
The approach consists of mapping concepts in WordNet to corresponding articles in Wikipedia, using three stages. First, a set of candidate articles is retrieved for each WordNet concept, both by searching article titles and by searching the full text with an IR engine. Second, text similarity scores are used to select the best match from the candidate articles. Finally, the mappings are refined using information from Wikipedia links to give a set of high-quality matches.
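The three stages can be illustrated with a minimal sketch. It is not the implementation from the thesis: the toy data, the function names, the bag-of-words cosine similarity, the naive lemma search standing in for the IR engine, and the link-based filter (keep a mapping only if its article links to, or is linked from, another mapped article) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy data; real input would be WordNet synsets and a Wikipedia dump.
SYNSETS = {
    "bank.n.01": {"lemmas": ["bank"], "gloss": "a financial institution that accepts deposits"},
    "money.n.01": {"lemmas": ["money"], "gloss": "the most common medium of exchange"},
}
ARTICLES = {
    "Bank": {"text": "A bank is a financial institution that accepts deposits and makes loans",
             "links": {"Money"}},
    "Bank (geography)": {"text": "A bank is the land alongside a river", "links": {"River"}},
    "Money": {"text": "Money is any item accepted as a medium of exchange", "links": {"Bank"}},
}


def bag_of_words(text):
    """Lowercased token counts (a simple stand-in text representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_articles(synset, articles):
    """Stage 1: title matches plus a naive full-text search for the lemmas."""
    lemmas = {lemma.lower() for lemma in synset["lemmas"]}
    by_title = {title for title in articles if title.lower() in lemmas}
    by_text = {title for title, article in articles.items()
               if lemmas & set(bag_of_words(article["text"]))}
    return by_title | by_text


def best_match(synset, candidates, articles):
    """Stage 2: pick the candidate whose text is most similar to the gloss."""
    if not candidates:
        return None
    gloss = bag_of_words(synset["gloss"])
    # sorted() makes tie-breaking deterministic.
    return max(sorted(candidates),
               key=lambda title: cosine(gloss, bag_of_words(articles[title]["text"])))


def refine(mapping, articles):
    """Stage 3 (assumed rule): keep mappings connected by a link to another mapped article."""
    kept = {}
    for synset_id, title in mapping.items():
        others = {t for s, t in mapping.items() if s != synset_id}
        links_out = articles[title]["links"] & others
        links_in = {t for t in others if title in articles[t]["links"]}
        if links_out or links_in:
            kept[synset_id] = title
    return kept


def map_synsets(synsets, articles):
    """Run the three stages and return a synset-to-article mapping."""
    mapping = {}
    for synset_id, synset in synsets.items():
        best = best_match(synset, candidate_articles(synset, articles), articles)
        if best is not None:
            mapping[synset_id] = best
    return refine(mapping, articles)
```

On the toy data, `map_synsets(SYNSETS, ARTICLES)` maps `bank.n.01` to `Bank` rather than `Bank (geography)`, because the gloss shares more vocabulary with the financial article.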
The mappings are evaluated using a manually annotated gold standard set of synset-article mappings. The annotation process indicates that the majority of synsets have a good matching article. The refined mappings are shown to have a precision of 88.2%.
The mappings are then used to enrich relations in WordNet using Wikipedia links. The enriched WordNet is then used with a knowledge-based Word Sense Disambiguation system. Evaluations are performed on the SemCor 3.0 corpus. Adding the new relations improves performance significantly over the WordNet baseline, demonstrating the usefulness of the mappings on an extrinsic task.
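One way to picture the enrichment step is the sketch below. It assumes, purely for illustration, that a link from one mapped article to another adds a directed relation between the corresponding synsets; the function name `enrich_relations`, the data layout, and the example identifiers are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def enrich_relations(wordnet_relations, mapping, article_links):
    """Add a (source, target) synset pair for every link between mapped articles.

    wordnet_relations: set of (synset, synset) pairs already in WordNet.
    mapping: dict from synset id to Wikipedia article title.
    article_links: dict from article title to the set of titles it links to.
    """
    # Several synsets may map to the same article, so invert into sets.
    synsets_for_article = defaultdict(set)
    for synset, article in mapping.items():
        synsets_for_article[article].add(synset)

    enriched = set(wordnet_relations)
    for synset, article in mapping.items():
        for target_article in article_links.get(article, ()):
            for target_synset in synsets_for_article.get(target_article, ()):
                if target_synset != synset:
                    enriched.add((synset, target_synset))
    return enriched


# Hypothetical example data.
existing = {("bank.n.01", "financial_institution.n.01")}
mapping = {"bank.n.01": "Bank", "money.n.01": "Money"}
links = {"Bank": {"Money", "River"}, "Money": {"Bank"}}
enriched = enrich_relations(existing, mapping, links)
```

Links to unmapped articles (here `River`) contribute nothing, so the quality of the added relations depends directly on the quality of the synset-article mappings.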
Metadata
Supervisors: | Stevenson, Mark |
---|---|
Keywords: | lexical knowledge, Wikipedia, WordNet, word sense disambiguation |
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.574079 |
Depositing User: | Mr Samuel Fernando |
Date Deposited: | 18 Jun 2013 10:31 |
Last Modified: | 03 Oct 2016 10:39 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:4081 |
Download
Filename: thesisFull.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License