White Rose University Consortium logo
University of Leeds logo University of Sheffield logo York University logo

Human and automatic annotation of discourse relations for Arabic

Alsaif, Amal (2012) Human and automatic annotation of discourse relations for Arabic. PhD thesis, University of Leeds.

Available under License Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales.

Download (6Mb)


This thesis describes the first, inter-disciplinary, study on human and automatic discourse annotation for explicit discourse connectives in Modern Standard Arabic (MSA). Discourse connectives are used in language to link discourse segments (arguments) by indicating so-called discourse relations. Automating the process of identifying the discourse connectives, their relations and their arguments is an essential basis for discourse processing studies and applications. This study presents several resources for Arabic discourse processing in addition to the first machine learning algorithms for identifying explicit discourse connectives and relations automatically. First, we have collected a large list of discourse connectives frequently used in MSA. This collection is used to develop the READ tool: the first annotation tool to fit the characteristics of Arabic, so that Arabic texts can be annotated by humans for discourse structure. Second, our analysis of Arabic discourse connectives leads to formalize an annotation scheme for connectives in context, based on a popular discourse annotation project for English, the PDTB project. Third, we used this scheme to create the first discourse corpus for Arabic, the Leeds Arabic Discourse Treebank (LADTB v.1). The LADTB extends the syntactic annotation of the Arabic Treebank Part1 to incorporate the discourse layer, by annotating all explicit connectives as well as associated relations and arguments. We show that the LADTB annotation is reliable and produce a gold standard for future work. Fourth, we develop the first automatic identification models for Arabic discourse connectives and relations, using the LADTB for training and testing. Our connective recogniser achieves almost human performance. Our algorithm for recognizing discourse relations performs significantly better than a baseline based on the connective surface string alone and therefore reduces the ambiguity in explicit connective interpretation. At the end of the thesis, we highlight research trends for future work that can benefit from our resources and algorithms on discourse processing for Arabic.

Item Type: Thesis (PhD)
ISBN: 978-0-85731-237-2
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Identification Number/EthosID: uk.bl.ethos.561083
Depositing User: Repository Administrator
Date Deposited: 29 Nov 2012 12:50
Last Modified: 07 Mar 2014 11:24
URI: http://etheses.whiterose.ac.uk/id/eprint/3129

You do not need to contact us to get a copy of this thesis. Please use the 'Download' link(s) above to get a copy.
You can contact us about this thesis. If you need to make a general enquiry, please see the Contact us page.

Actions (repository staff only: login required)