Machine Learning of Antonyms in English and Arabic Corpora

Aldhubayi, Luluh Basim M. (2019) Machine Learning of Antonyms in English and Arabic Corpora. PhD thesis, University of Leeds.

L_ALDHUBAYI_PHD_2019.pdf - Final eThesis - complete (PDF, 2389 KB)
Available under License Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales.


Identifying lexical semantic relations in text has long been a goal of artificial intelligence and has attracted considerable research attention in recent years. This thesis addresses the problem of automatically identifying antonymy relations, such as (hot/cold). This work presents three key steps in capturing antonymous word pairs: extracting example word pairs from a textual corpus, representing antonymy in a pair vector space model, and using a machine learning classifier to predict the antonymy relation. Researchers have found that discriminating antonymy from synonymy is a non-trivial task: both relations show similar semantic distributions because antonyms and synonyms occur in similar contexts. This affects many similarity-based applications, which may return opposite words where synonyms are expected. Moreover, both traditional and modern vector space models, such as Bag-of-Words and word embedding models, discriminate poorly between antonyms and synonyms. Therefore, this work proposes an antonymy pair vector representation based on symmetric classified patterns extracted from a corpus. We are also motivated by the extraction of novel antonymy and opposition relations between word pairs. This research aims to capture and predict antonymy pairs from a textual corpus so that computers can understand and capture opposition relations in text. We propose the Antonymy classifier, which combines two approaches: a pattern-based approach and a machine learning classifier. We use the pattern-based approach to extract word pairs and patterns. We also propose using distant supervision to label the extracted pairs automatically. Distant supervision uses an external knowledge base (the Open Multilingual WordNet) to generate positive and negative antonymy instances.
It also extracts every corpus sentence containing canonical antonymy pairs, such as (national/international), or non-canonical antonymy (opposite) pairs, such as (internal/international), that might provide statistical evidence for an antonymy relation. In addition, this work presents a pattern classifier model which automatically extracts and classifies antonymy patterns by computing their average co-occurrence association with positive (antonymy) and negative (non-antonymy) instances in the training set. Some of these patterns, such as (between X and Y), (both X and Y) and (from X to Y), had already been identified in linguistic studies based on manual pattern extraction and analysis. We also found novel textual patterns that are highly associated with antonymy pairs, such as (however X or Y) and (what is X and what is Y), among others. This work also reports experiments on extracting and predicting antonymy in the English BNC and SkELL corpora and the Arabic ArTenTen corpus. Overall, the results showed improved prediction in distinguishing antonym pairs compared with previous attempts. We also present new antonymy pairs that are not found in the English and Arabic WordNets. The antonymy classifier model uses a machine learning algorithm to extract and classify novel adjectival and noun antonymous pairs such as (verbal/visual), (input/output), (life/death) and (material/spiritual). The work presented in this research is therefore a promising method for better extraction and classification of antonymy pairs and patterns in a corpus.
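The abstract does not specify the association measure or the data format, so the following is only a minimal Python sketch of the labelling and pattern-scoring idea under stated assumptions. It assumes the (pattern, X, Y) occurrences have already been extracted from a corpus; the small ANTONYMS and NON_ANTONYMS sets are hypothetical stand-ins for the Open Multilingual WordNet; pointwise mutual information (PMI) is swapped in as the co-occurrence association measure; and pairs are treated as unordered. A pattern is kept as an antonymy pattern when its mean PMI with positive pairs exceeds its mean PMI with negative pairs. The helper names (distant_label, score_patterns, antonymy_patterns, pair_vectors) are illustrative, not the thesis's own code.

```python
import math
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy knowledge base standing in for the Open Multilingual WordNet.
ANTONYMS = {frozenset(p) for p in [("hot", "cold"), ("input", "output"), ("life", "death")]}
NON_ANTONYMS = {frozenset(p) for p in [("big", "large"), ("car", "vehicle")]}


def distant_label(pair):
    """Distant supervision: 1 if the KB lists the pair as antonyms, 0 if it lists
    it as a non-antonym pair, None if the KB says nothing (pair stays unlabelled)."""
    if pair in ANTONYMS:
        return 1
    if pair in NON_ANTONYMS:
        return 0
    return None


def pmi_table(occurrences):
    """PMI between each pattern and each unordered word pair, computed from
    (pattern, x, y) triples. PMI is an assumed choice of association measure."""
    n = len(occurrences)
    joint = Counter((p, frozenset((x, y))) for p, x, y in occurrences)
    pattern_count = Counter(p for p, _, _ in occurrences)
    pair_count = Counter(frozenset((x, y)) for _, x, y in occurrences)
    return {
        (p, pair): math.log2(c * n / (pattern_count[p] * pair_count[pair]))
        for (p, pair), c in joint.items()
    }


def score_patterns(occurrences):
    """Score = mean PMI with positive pairs minus mean PMI with negative pairs.
    A side with no labelled pairs defaults to 0.0 (independence), an assumption."""
    pos, neg = defaultdict(list), defaultdict(list)
    for (p, pair), value in pmi_table(occurrences).items():
        label = distant_label(pair)
        if label == 1:
            pos[p].append(value)
        elif label == 0:
            neg[p].append(value)
    patterns = {p for p, _, _ in occurrences}
    return {
        p: (mean(pos[p]) if pos[p] else 0.0) - (mean(neg[p]) if neg[p] else 0.0)
        for p in patterns
    }


def antonymy_patterns(occurrences):
    """Keep patterns seen with at least one KB antonym pair and scoring above zero."""
    scores = score_patterns(occurrences)
    has_positive = {p for p, x, y in occurrences if distant_label(frozenset((x, y))) == 1}
    return {p for p, s in scores.items() if s > 0 and p in has_positive}


def pair_vectors(occurrences, patterns):
    """Represent each pair by its occurrence counts over the selected patterns,
    in sorted pattern order; pairs never seen with a selected pattern are omitted."""
    order = sorted(patterns)
    index = {p: i for i, p in enumerate(order)}
    vectors = defaultdict(lambda: [0] * len(order))
    for p, x, y in occurrences:
        if p in index:
            vectors[frozenset((x, y))][index[p]] += 1
    return dict(vectors)


if __name__ == "__main__":
    # Hypothetical toy occurrences, not data from the thesis.
    occurrences = [
        ("between X and Y", "hot", "cold"),
        ("between X and Y", "life", "death"),
        ("from X to Y", "input", "output"),
        ("X such as Y", "vehicle", "car"),
        ("X, also called Y", "big", "large"),
        ("X, also called Y", "hot", "cold"),
    ]
    selected = antonymy_patterns(occurrences)
    print(sorted(selected))  # ['between X and Y', 'from X to Y']
    print(pair_vectors(occurrences, selected))
```

On these toy occurrences, "between X and Y" and "from X to Y" are selected, while "X, also called Y" is rejected because its association with the non-antonym pair (big/large) outweighs its association with (hot/cold). Training the machine learning classifier on the resulting pair vectors is left out of this sketch.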

Item Type: Thesis (PhD)
Keywords: antonyms, synonyms, Sketch Engine, CQL, pattern, WordNet, vector space model, machine learning, relation extraction, antonym pairs, Arabic antonyms, English antonyms, BNC, SkELL corpus, ArTenTen corpus.
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Identification Number/EthosID: uk.bl.ethos.778681
Depositing User: Mrs Luluh Aldhubayi
Date Deposited: 26 Jun 2019 09:24
Last Modified: 18 Feb 2020 12:50
URI: http://etheses.whiterose.ac.uk/id/eprint/23975
