Unsupervised morpheme segmentation in a non-parametric Bayesian framework

Abstract

Learning morphemes from plain text is an emerging research area in natural language processing. Knowledge about the process of word formation is helpful in devising automatic segmentation of words into their constituent morphemes. This thesis applies an unsupervised morpheme induction method, based on the statistical behavior of words, to induce morphemes for word segmentation. The morpheme cache used for this purpose is based on the Dirichlet Process (DP) and stores frequency information of the induced morphemes and their occurrences in a Zipfian distribution.
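
As a rough illustration of how a DP-based morpheme cache can behave, the sketch below uses the standard Chinese Restaurant Process predictive probability, (count + alpha * P0) / (total + alpha). It is not taken from the thesis: the class name `MorphemeCache`, the concentration parameter `alpha`, and the base distribution (geometric length, uniform characters, with hypothetical `alphabet_size` and `stop_prob`) are all assumptions made for the example.

```python
from collections import Counter


class MorphemeCache:
    """Illustrative DP cache over morphemes (all names and defaults are assumptions)."""

    def __init__(self, alpha=1.0, alphabet_size=26, stop_prob=0.5):
        self.alpha = alpha                  # DP concentration parameter (assumed value)
        self.alphabet_size = alphabet_size  # assumed size of the character inventory
        self.stop_prob = stop_prob          # assumed per-character stopping probability
        self.counts = Counter()             # frequency of each cached morpheme
        self.total = 0                      # total number of cached morpheme tokens

    def base_prob(self, morpheme):
        """Assumed base distribution P0: geometric length, uniform characters."""
        n = len(morpheme)
        length_prob = self.stop_prob * (1 - self.stop_prob) ** (n - 1)
        return length_prob * (1 / self.alphabet_size) ** n

    def prob(self, morpheme):
        """Predictive probability of a morpheme given the current cache."""
        numerator = self.counts[morpheme] + self.alpha * self.base_prob(morpheme)
        return numerator / (self.total + self.alpha)

    def add(self, morpheme):
        """Record one more occurrence of a morpheme in the cache."""
        self.counts[morpheme] += 1
        self.total += 1
```

Under these assumptions, a morpheme that is already frequent in the cache receives a higher probability than a rare or unseen one, which is the rich-get-richer behavior that yields a skewed, Zipf-like frequency profile.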

This thesis uses a number of empirical, morpheme-level grammar models to classify the induced morphemes under the labels prefix, stem and suffix. These grammar models capture the different structural relationships among the morphemes. Furthermore, the morphemic categorization reduces the problem of over-segmentation. The output of this strategy demonstrates a significant improvement over the baseline system.

Finally, the thesis measures the performance of the unsupervised morphology learning system for Nepali.

Metadata

Supervisors:	Manandhar, Suresh
Awarding institution:	University of York
Academic Units:	The University of York > Computer Science (York)
Depositing User:	Mr Santa B. Basnet
Date Deposited:	29 May 2013 14:04
Last Modified:	08 Feb 2022 23:09
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:3918

Download

Filename: MSc_Thesis_Submitted_-_Santa.pdf

Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License

