Basnet, Santa B. (2011) Unsupervised morpheme segmentation in a non-parametric Bayesian framework. MSc by research thesis, University of York.
Abstract
Learning morphemes from plain text is an emerging research area in natural language processing. Knowledge about the process of word formation is helpful in devising methods that automatically segment words into their constituent morphemes. This thesis applies an unsupervised morpheme induction method, based on the statistical behaviour of words, to induce morphemes for word segmentation. The morpheme cache used for this purpose is based on the Dirichlet Process (DP); it stores frequency information about the induced morphemes, whose occurrences follow a Zipfian distribution.
This thesis uses a number of empirical, morpheme-level grammar models to classify the induced morphemes under the labels prefix, stem and suffix. These grammar models capture the different structural relationships among the morphemes. Furthermore, the morphemic categorization reduces the problem of over-segmentation. The output of this strategy demonstrates a significant improvement over the baseline system.
Finally, the thesis measures the performance of the unsupervised morphology learning system for Nepali.
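The Dirichlet Process morpheme cache mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the class name `MorphemeCache`, the concentration parameter `alpha`, and the base distribution (geometric morpheme length with uniformly distributed characters) are all assumptions made for illustration. The sketch uses only the standard DP posterior predictive, in which a morpheme's probability is (count + alpha * P0) / (total + alpha), so frequently cached morphemes become more likely to be reused.

```python
from collections import Counter


class MorphemeCache:
    """Toy Dirichlet Process cache over morphemes (illustrative assumption)."""

    def __init__(self, alpha=1.0, alphabet_size=26, stop_prob=0.5):
        self.alpha = alpha                  # DP concentration parameter (assumed value)
        self.alphabet_size = alphabet_size  # size of the character inventory (assumed)
        self.stop_prob = stop_prob          # geometric length prior in the base measure
        self.counts = Counter()             # morpheme -> number of times cached
        self.total = 0                      # total number of cached morpheme tokens

    def base_prob(self, morpheme):
        # Hypothetical base distribution P0: geometric length, uniform characters.
        if not morpheme:
            raise ValueError("morpheme must be non-empty")
        n = len(morpheme)
        length_prob = self.stop_prob * (1 - self.stop_prob) ** (n - 1)
        return length_prob * (1.0 / self.alphabet_size) ** n

    def prob(self, morpheme):
        # DP posterior predictive: (count + alpha * P0) / (total + alpha).
        numerator = self.counts[morpheme] + self.alpha * self.base_prob(morpheme)
        return numerator / (self.total + self.alpha)

    def add(self, morpheme):
        # Record one more occurrence of the morpheme in the cache.
        self.counts[morpheme] += 1
        self.total += 1

    def segmentation_prob(self, morphemes):
        # Probability of a segmentation, treating morphemes as independent draws
        # from the current cache (a simplification for illustration).
        p = 1.0
        for m in morphemes:
            p *= self.prob(m)
        return p


cache = MorphemeCache(alpha=1.0)
for m in ["walk", "ed", "talk", "ed", "jump", "ing"]:
    cache.add(m)

p_seen = cache.prob("ed")      # morpheme already in the cache
p_unseen = cache.prob("xq")    # morpheme supported only by the base distribution
```

Under these assumptions, a segmentation built from cached morphemes, such as `["walk", "ed"]`, scores higher than one built from unseen strings, such as `["wal", "ked"]`, which is the rich-get-richer behaviour that gives rise to Zipf-like frequency profiles.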
Metadata
Field | Value |
---|---|
Supervisors: | Manandhar, Suresh |
Awarding institution: | University of York |
Academic Units: | The University of York > Computer Science (York) |
Depositing User: | Mr Santa B. Basnet |
Date Deposited: | 29 May 2013 14:04 |
Last Modified: | 08 Feb 2022 23:09 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:3918 |
Download
Filename: MSc_Thesis_Submitted_-_Santa.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License