Alosaimy, Abdulrahman Mohammed S (2018) Ensemble Morphosyntactic Analyser for Classical Arabic. PhD thesis, University of Leeds.
Abstract
Classical Arabic (CA) is an influential language in the lives of Muslims around the world. It is the language of two sources of Islamic law: the Quran and the Sunnah, the collection of traditions and sayings attributed to the prophet Mohammed. However, Classical Arabic in general, and the Sunnah in particular, is underexplored and under-resourced in the field of computational linguistics. This study examines possible directions for adapting existing tools, specifically morphological analysers designed for Modern Standard Arabic (MSA), to Classical Arabic.
Morphological analysers for CA are limited, as are the data for evaluating them. In this study, we adapt existing analysers and create a validation dataset from the Sunnah books. Inspired by the advances in deep learning and the promising results of ensemble methods, we develop a systematic method for transferring morphological analysis that is capable of handling different labelling systems and various sequence lengths.
In this study, we handpicked the four best open-access MSA morphological analysers. Data generated by these analysers are evaluated before and after adaptation against the existing Quranic Corpus and the Sunnah Arabic Corpus. The findings are as follows. First, it is feasible to analyse under-resourced languages using existing resources for a comparable language, given a small but sufficient set of annotated text. Second, analysers typically generate different errors, and this could be exploited. Third, explicit alignment of sequences and mapping of labels are not necessary to achieve comparable accuracy, given a sufficiently large training dataset.
Adapting existing tools is easier than creating tools from scratch. The resulting quality depends on the size of the training data and on the number and quality of the input taggers. A pipeline architecture performs less well than an end-to-end neural network architecture because of error propagation and limitations on the output format. A valuable tool and data for annotating Classical Arabic are made freely available.
Metadata
Supervisors: | Atwell, Eric |
---|---|
Keywords: | Ensemble, Morphological analysis, Classical Arabic, Sunnah, Deep learning, POS tagging |
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds) |
Identification Number/EthosID: | uk.bl.ethos.759822 |
Depositing User: | Abdulrahman Alosaimy |
Date Deposited: | 03 Dec 2018 12:14 |
Last Modified: | 18 Feb 2020 12:32 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:22359 |
Download
Final eThesis - complete (pdf)
Filename: alosaimy18thesisV76.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License