Aldihan, Hesah (2024) A Computational Linguistic Approach to Gender Analysis and Classification of Kuwaiti Arabic. PhD thesis, University of Sheffield.
Abstract
This thesis focuses on the collection and computational analysis of Kuwaiti Arabic to test different sociolinguistic hypotheses related to gendered language use in social media. As the Kuwaiti Arabic dialect has some unique linguistic features that are stereotypically associated with gendered language usage, the study adopts a computational approach to study these features and draw insights into the relationship between language and gender in Kuwaiti Arabic. It contributes to the field of Arabic Natural Language Processing by providing three publicly available datasets: the Kuwaiti Arabic Gender-labelled WhatsApp Dataset (KAGen), the Kuwaiti Arabic Conversational Function WhatsApp Dataset (KACD), and the Kuwaiti Arabic Twitter Dataset (KATD).
The thesis unfolds in three main studies. The first study introduces the collection of KAGen, a Kuwaiti Arabic dataset that consists of WhatsApp exchanges of mixed-gender Kuwaiti users collected from WhatsApp reading club groups that moved online during COVID-19. The dataset is used to analyse interactional and linguistic features, identify those that are indicative of gender, and inform the development of a gender classification system for Kuwaiti Arabic. The features are analysed quantitatively and qualitatively and are evaluated in a basic gender classification system trained and tested on the dataset.
The second study involves the development of an annotation framework to annotate the KAGen dataset according to the conversational functions employed in the turns. The study adopts an inductive thematic analysis approach to scrutinise the dataset and create a conversational function taglist, which annotators use alongside annotation guidelines to annotate the dataset; we name the resulting annotated dataset KACD. To the best of our knowledge, KACD is the only publicly available dataset of conversational Kuwaiti Arabic tagged according to conversational functions. The study provides insights regarding conversational function patterns and presents statistics on the distribution of conversational functions among men and women. Additionally, it offers qualitative observations, shedding light on distinctive linguistic patterns observed in the language used.
The third study aims to use machine learning models to automatically predict the gender of Kuwaiti Arabic social media users. For this study, KATD, a large publicly available dataset of Kuwaiti Arabic tweets, is collected and labelled according to the gender of users. Two supervised learning approaches are taken to build the gender classification systems: a feature engineering approach and a deep learning approach. The feature engineering approach is adopted to test different linguistic features, including those identified in the previous two studies, and to analyse how well they predict the gender of users. The deep learning approach, using pre-trained Transformer models, is also tested to assess how well pre-trained large language models perform in predicting the gender of Kuwaiti Arabic social media users.
Metadata
Supervisors: | Gaizauskas, Robert and Fitzmaurice, Susan |
---|---|
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
Depositing User: | Miss Hesah Aldihan |
Date Deposited: | 11 Sep 2024 10:35 |
Last Modified: | 11 Sep 2024 10:35 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:35527 |
Download
Final eThesis - complete (pdf)
Embargoed until: 11 September 2025
Filename: phd_thesis_hesah_Final.pdf