Unsupervised morpheme segmentation in a non-parametric Bayesian framework

Basnet, Santa B. (2011) Unsupervised morpheme segmentation in a non-parametric Bayesian framework. MSc by research thesis, University of York.

Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.



Learning morphemes from plain text is an emerging research area in natural language processing. Knowledge about the process of word formation helps in devising methods that automatically segment words into their constituent morphemes. This thesis applies an unsupervised morpheme induction method, based on the statistical behavior of words, to induce morphemes for word segmentation. The morpheme cache used for this purpose is based on the Dirichlet Process (DP) and stores frequency information about the induced morphemes and their occurrences in a Zipfian distribution. The thesis uses a number of empirical, morpheme-level grammar models to classify the induced morphemes under the labels prefix, stem and suffix. These grammar models capture the different structural relationships among the morphemes. Furthermore, the morphemic categorization reduces the problem of over-segmentation. The output of this strategy demonstrates a significant improvement over the baseline system. Finally, the thesis measures the performance of the unsupervised morphology learning system for Nepali.
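
To illustrate the idea of a DP-based morpheme cache, the following is a minimal sketch and not the thesis's implementation. It uses the standard Chinese restaurant process predictive probability, (count + alpha * base) / (total + alpha), so that frequently seen morphemes become more likely to be reused. The class name MorphemeCache, the parameters alpha, alphabet_size and stop_prob, the character-level base distribution, and the toy English morphemes are all assumptions made for this example.

```python
from collections import Counter


class MorphemeCache:
    """Hypothetical DP-style cache of morpheme counts (illustrative only)."""

    def __init__(self, alpha=1.0, alphabet_size=26, stop_prob=0.2):
        self.alpha = alpha                  # DP concentration parameter (assumed value)
        self.alphabet_size = alphabet_size  # assumed size of the character inventory
        self.stop_prob = stop_prob          # assumed chance of ending a morpheme per character
        self.counts = Counter()             # how often each morpheme has been cached
        self.total = 0                      # total number of cached morpheme tokens

    def base_prob(self, morpheme):
        # Assumed base distribution: geometric length, uniform characters.
        n = len(morpheme)
        length_prob = self.stop_prob * (1 - self.stop_prob) ** (n - 1)
        return length_prob * (1.0 / self.alphabet_size) ** n

    def prob(self, morpheme):
        # Chinese restaurant process predictive probability.
        weighted_base = self.alpha * self.base_prob(morpheme)
        return (self.counts[morpheme] + weighted_base) / (self.total + self.alpha)

    def add(self, morpheme):
        self.counts[morpheme] += 1
        self.total += 1


def score(cache, segments):
    # Simplification: the cache is held fixed while scoring, so the
    # segments are treated as independent draws from it.
    p = 1.0
    for morpheme in segments:
        p *= cache.prob(morpheme)
    return p


if __name__ == "__main__":
    cache = MorphemeCache()
    for m in ["walk", "ing", "ed", "walk", "talk"]:
        cache.add(m)
    print(score(cache, ["walk", "ing"]), score(cache, ["walking"]))
```

Under these assumptions, a segmentation built from cached morphemes scores higher than an unseen whole-word analysis. Because such a model can also favour splitting words too finely, constraints like the prefix, stem and suffix labels described above are useful for limiting over-segmentation.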

Item Type: Thesis (MSc by research)
Academic Units: The University of York > Computer Science (York)
Depositing User: Mr Santa B. Basnet
Date Deposited: 29 May 2013 14:04
Last Modified: 08 Aug 2013 08:53
URI: http://etheses.whiterose.ac.uk/id/eprint/3918
