
Computational Analysis of Morphosyntactic Categories in Georgian.

Daraselia, Sophiko (2019) Computational Analysis of Morphosyntactic Categories in Georgian. PhD thesis, University of Leeds.

Text (Computational Analysis of Morphosyntactic Categories in Georgian)
Sophiko_thesis_2019.pdf - Final eThesis - complete (pdf)
Restricted until 1 December 2020.


Abstract

This thesis describes the development of part-of-speech (POS) tagging resources for the Georgian language, consisting of: i) a new morphosyntactic language model for POS tagging purposes; ii) guidelines for tagging and post-editing; iii) the KATAG tagset; and iv) the trained parameter files that the probabilistic TreeTagger program needs in order to work on Georgian texts. The KATAG tagset is defined in accordance with the new morphosyntactic model of the language and with a set of design principles and tagging guidelines. Tagging is performed with a stochastic methodology: TreeTagger, a probabilistic part-of-speech tagging program, has been trained on Georgian texts, and the justification for this choice is discussed. I use two tokenisation approaches in part-of-speech tagging. The enclitic tokenisation approach achieved an accuracy of 92.41% and the non-enclitic tokenisation approach an accuracy of 87.13%, corroborating my hypothesis that treating enclitic elements separately from their host words results in better tagging performance. To make the tagger easily adaptable to a range of inputs (type, variety or genre of text), the performance of TreeTagger was evaluated on a test set covering five genres: academic, informal, legal, fiction and news.
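
The difference between the two tokenisation approaches, and the way a token-level accuracy figure is computed, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the enclitic list, the function names and the tag labels below are hypothetical, chosen only for illustration, and a real tokeniser would need lexical checks to avoid splitting words that merely end in the same letters.

```python
# Hypothetical enclitic particles, for illustration only; the inventory and
# splitting rules actually used in the thesis are not given in the abstract.
ENCLITICS = ("ც", "ვე", "ღა")


def tokenise(text, split_enclitics):
    """Whitespace tokenisation, optionally detaching a trailing enclitic."""
    tokens = []
    for word in text.split():
        if split_enclitics:
            for clitic in ENCLITICS:
                # Require a non-empty host so a bare particle stays intact.
                if word.endswith(clitic) and len(word) > len(clitic):
                    tokens.extend([word[:-len(clitic)], clitic])
                    break
            else:
                # No enclitic matched: keep the word as one token.
                tokens.append(word)
        else:
            tokens.append(word)
    return tokens


def tagging_accuracy(predicted, gold):
    """Percentage of tokens whose predicted tag equals the gold tag."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("tag sequences must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    sentence = "მეც მოვალ"
    print(tokenise(sentence, split_enclitics=False))  # ['მეც', 'მოვალ']
    print(tokenise(sentence, split_enclitics=True))   # ['მე', 'ც', 'მოვალ']
```

Under the enclitic approach the tagger sees the host word and the particle as separate tokens, each with its own tag, whereas the non-enclitic approach needs a combined tag for every host-plus-particle form. Note that the two approaches produce different token counts, so their accuracy figures are computed over different numbers of tokens.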

Item Type: Thesis (PhD)
Keywords: Part-of-speech tagging, morphosyntactic annotation, corpus linguistics
Academic Units: The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds)
The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures and Societies (Leeds)
The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures and Societies (Leeds) > Linguistics & Phonetics (Leeds)
Depositing User: Dr Sophiko Daraselia
Date Deposited: 15 Nov 2019 15:14
Last Modified: 15 Nov 2019 15:14
URI: http://etheses.whiterose.ac.uk/id/eprint/25313
