Alqurashi, Lama (2024) Investigating Authorship in Classical Arabic Poetry Using Large Language Models. PhD thesis, University of Leeds.
Abstract
This study investigates authorship attribution in Arabic poetry, using the entire Classic Arabic Poetry corpus for the first time. The problem of authorship in Arabic poetry dates back to the 6th century, in the pre-Islamic period, when oral recitation was the primary method of preserving and disseminating poems. Written documentation was limited and used mainly for treaties, which resulted in the loss of much pre-Islamic poetry and the misattribution of post-Islamic poems to pre-Islamic poets. While previous studies have explored this issue qualitatively, this research is the first to address it quantitatively.
The study collected and augmented data with metadata to ensure accurate temporal separation. To address potential confusion between style and topic, topic modeling experiments identified five prominent topics, revealing patterns in topic distribution across centuries and poetic meters. Random poems from each century were qualitatively analyzed to validate the topic modeling process.
A classification model was applied to delve deeper into authorship attribution. An ensemble model was developed and tested on applicable data, excluding the pre-Islamic era. The model’s performance was evaluated based on topic, number of poets, and number of examples. Topic segregation slightly improved performance, with optimal results observed when one poet was included in the opposite class. The best performance occurred with 60 examples on average.
After selecting the most effective parameters, the model achieved accuracies of 0.97 to 1.0 and corresponding F1 scores. Misclassifications mostly occurred at probabilities below 90%, while correct classifications approached 100%. These findings demonstrate the model’s robustness and its potential for addressing real cases of misattribution in Arabic poetry.
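As a rough illustration of the evaluation described above, the following minimal sketch computes accuracy and F1 for a binary "target poet vs. others" setup and flags predictions whose confidence falls below 90%. It is not the thesis code: the function names, the toy labels, and the toy probabilities are all hypothetical, and the 0.90 threshold is taken only from the figure quoted in the abstract.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (harmonic mean of precision and recall)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def low_confidence(probs: Sequence[float], threshold: float = 0.90) -> List[int]:
    """Indices whose confidence, max(p, 1 - p), is below the threshold.

    `probs` holds the predicted probability of the positive class.
    """
    return [i for i, p in enumerate(probs) if max(p, 1 - p) < threshold]


# Hypothetical toy data: 1 = target poet, 0 = any other poet.
y_true = [1, 1, 0, 0, 1, 0]
probs = [0.99, 0.97, 0.02, 0.62, 0.98, 0.01]  # made-up model outputs
y_pred = [int(p >= 0.5) for p in probs]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
flagged = low_confidence(probs)  # candidates for manual review
```

In this toy data the single misclassified poem is also the only one flagged as low confidence, which mirrors the pattern the abstract reports: errors concentrate below the 90% mark while correct predictions sit close to 100%.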
Metadata
| Supervisors: | Sharoff, Serge |
|---|---|
| Keywords: | LLMs, Arabic, Poetry, Authorship, BERT |
| Awarding institution: | University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures and Societies (Leeds) |
| Depositing User: | Mrs Lama Alqurashi |
| Date Deposited: | 07 Feb 2025 15:06 |
| Last Modified: | 07 Feb 2025 15:06 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:36186 |
Download
Final eThesis - complete (pdf)
Filename: Thesis-corrected-Lama.pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial ShareAlike 4.0 International License