Alsaleh, Abdullah Nassir A (2025) Transformer-based Semantic Similarity Exploration on the Holy Quran. PhD thesis, University of Leeds.
Abstract
This PhD research explores the application of modern Natural Language Processing (NLP) techniques to the study of the Holy Quran, with a focus on semantic understanding. It addresses the challenges of working with Classical Arabic and investigates how Transformer-based Arabic language models can be used to better understand relationships between Quranic verses, answer questions, and retrieve relevant passages.
The thesis makes four key contributions. First, it evaluates QurSim, a semantic similarity corpus of the Quran, and produces a cleaned version of the QurSim dataset to support more reliable experiments. Second, it applies Arabic pre-trained language models to three semantically related tasks (semantic similarity, question answering, and passage retrieval) to demonstrate their potential and limitations in handling religious text. Third, it outlines methods and strategies for tackling these tasks, identifying the most effective approaches to understanding the Quranic text.
The thesis yields two main findings. First, Arabic pre-trained Transformer-based language models can be effectively applied to semantic tasks on Quranic text, although their performance varies with the nature of the task. Second, the thesis highlights the need for a new semantic similarity corpus of the Holy Quran, grounded in a Quranic exegesis that interprets the Quran through the Quran itself. Together, these contributions advance NLP for the Holy Quran in particular and for Classical Arabic in general, providing tools and resources that open new pathways for the computational linguistics of religious texts.
Metadata
Supervisors: | Atwell, Eric and Altahhan, Abdulrahman |
---|---|
Keywords: | Quran; Semantic Similarity; BERT; Transformer; Question Answering; Passage Retrieval; Classical Arabic; Quranic Corpus |
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering (Leeds) |
Academic unit: | School of Computer Science |
Date Deposited: | 01 Oct 2025 08:33 |
Last Modified: | 01 Oct 2025 08:33 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:37334 |
Download
Final eThesis - complete (pdf)
Embargoed until: 1 September 2026
Filename: ALSALEH_Abdullah_Thesis.pdf

Please use the 'Request a copy' link(s) in the 'Downloads' section above to request this thesis. This will be sent directly to someone who may authorise access.