Robust methods in data mining

Abstract

The thesis focuses on two problems in data mining: clustering, an exploratory technique for grouping similar observations together, and classification, a technique for assigning new observations to one of a set of known groups. The two problems are also known in the Machine Learning literature as unsupervised and supervised classification respectively. A thorough study of both is central to decision making in many fields, and the thesis seeks to contribute towards that end.

In the first part of the thesis we consider whether robust methods can be applied to clustering. In particular, we perform clustering on fuzzy data using two methods originally developed for outlier detection. The fuzzy data clusters are characterised by two intersecting lines, such that points belonging to the same cluster lie close to the same line. This part of the thesis also investigates a new application of finite mixtures of normals to the fuzzy data problem.

The second part of the thesis addresses issues relating to classification, in particular classification trees and boosting. The boosting algorithm is a relative newcomer to the classification portfolio that seeks to enhance the performance of classifiers by iteratively re-weighting the data according to their previous classification status. We explore the performance of "boosted" trees (mainly stumps) based on three different models, all characterised by a sine-wave boundary. We also carry out a thorough study of the factors that affect the boosting algorithm.
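
As a rough illustration of the re-weighting idea, the following is a minimal sketch of discrete AdaBoost with decision stumps on simulated two-class data separated by a sine-wave boundary. The particular boundary, the sample sizes, the number of boosting rounds and all function names are assumptions made for this example; they are not taken from the thesis and need not match any of its three models.

```python
import numpy as np


def sine_data(n, rng):
    """Simulate n points in the unit square, labelled by a sine-wave boundary.

    The boundary x2 = 0.5 + 0.25 * sin(2 * pi * x1) is a hypothetical choice.
    """
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    y = np.where(X[:, 1] > 0.5 + 0.25 * np.sin(2 * np.pi * X[:, 0]), 1, -1)
    return X, y


def stump_predict(X, j, t, s):
    """Stump: predict s if feature j exceeds threshold t, otherwise -s."""
    return s * np.where(X[:, j] > t, 1, -1)


def fit_stump(X, y, w):
    """Exhaustive search for the stump with the smallest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            err = w[stump_predict(X, j, t, 1) != y].sum()
            # Flipping the sign of a stump turns error e into 1 - e,
            # because the weights sum to one.
            for s, e in ((1, err), (-1, 1.0 - err)):
                if e < best[0]:
                    best = (e, j, t, s)
    return best


def adaboost(X, y, rounds=50):
    """Discrete AdaBoost: refit a stump after each re-weighting of the data."""
    w = np.full(len(y), 1.0 / len(y))  # start from equal weights
    model = []
    for _ in range(rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = stump_predict(X, j, t, s)
        # Misclassified points (y * pred = -1) gain weight, the rest lose it.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        model.append((alpha, j, t, s))
    return model


def predict(model, X):
    """Weighted vote of all stumps in the model."""
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in model)
    return np.where(score >= 0, 1, -1)


rng = np.random.default_rng(0)
X_train, y_train = sine_data(200, rng)
X_test, y_test = sine_data(1000, rng)

model = adaboost(X_train, y_train, rounds=50)
single_acc = np.mean(predict(model[:1], X_test) == y_test)
boosted_acc = np.mean(predict(model, X_test) == y_test)
print(f"single stump: {single_acc:.3f}, boosted stumps: {boosted_acc:.3f}")
```

A single stump can only cut the square with one axis-parallel line, so it cannot follow the curved boundary. The weighted vote of many stumps produces a staircase approximation to it, which is why the comparison between `single_acc` and `boosted_acc` is of interest here.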

Other results include a new look at the concept of randomness in the classification context, particularly because the form of randomness in both training and testing data directly affects the accuracy and reliability of domain-partitioning rules. Further, we provide statistical interpretations of some of the classification-related concepts originally used in Computer Science, Machine Learning and Artificial Intelligence. This is important since there exists a need for a unified interpretation of some of the "landmark" concepts in various disciplines, as a step towards seeking the principles that can guide and strengthen practical applications.

Metadata

Supervisors:	Taylor, C.C. and Kent, J.T.
Awarding institution:	University of Leeds
Academic Units:	The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds)
Identification Number/EthosID:	uk.bl.ethos.400882
Depositing User:	Ethos Import
Date Deposited:	19 Apr 2010 11:04
Last Modified:	06 Mar 2014 16:53
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:807

Download

uk_bl_ethos_400882

Filename: uk_bl_ethos_400882.pdf