Feature Selection from Higher Order Correlations

Abstract

This thesis addresses the problems in feature selection,
particularly focusing on selecting features from higher order
correlations. To this end, we present two supervised feature selection
approaches named \emph{Graph based Information-theoretic Feature Selection} and
\emph{Hypergraph based Information-theoretic Feature Selection}
respectively, which are capable of considering third or even higher
order dependencies between the relevant features and capturing the
optimal size of relevant feature subset. Furthermore, we develop two
unsupervised feature selection methods which can evaluate
features jointly rather than individually. In this case, larger
feature combinations are considered. The reason for this is that
although an individual feature may have limited relevance to a
particular class, when taken in combination with other features it
may be strongly relevant to the class.

In Chapter $2$, we thoroughly review the relevant literature of
the classifier independent (filter-based) feature selection methods. One dominant direction of research in
this area is exemplified by the so-called information theoretic
feature selection criteria, which is measuring the mutual dependence
of two variables. Another influential direction is the graph-based
feature selection methods, which are to select the features that
best preserve the data similarity or a manifold structure derived
from the entire feature set. We notice that most existing feature
selection methods evaluate features individually or just simply
consider pairwise feature interaction, and hence cannot handle
redundant features. Another shortcoming of existing feature
selection methods is that most of them select features in a greedy
way and do not provide a direct measure to judge whether to add
additional features or not. To deal with this problem, they require
a user to supply the number of selected features in advance.
However, in real applications, it is hard to estimate the number of
useful features before the feature selection process. This thesis
addresses these weaknesses, and fills a gap in the literature of
selecting features from higher order correlations.

In Chapter $3$ we propose a graph based information-theoretic
approach to feature selection. There are three novel ingredients.
First, by incorporating mutual information (MI) for pairwise feature
similarity measure, we establish a novel feature graph framework
which is used for characterizing the informativeness between the
pair of features. Secondly, we locate the relevant feature subset
(RFS) from the feature graph by maximizing features' average
pairwise relevance. The RFS is expected to have little redundancy
and very strong discriminating power. This strategy reduces the
optimal search space from the original feature set to the relatively
smaller relevant feature subset, and thus enable an efficient
computation. Finally, based on RFS, we evaluate the importance of
unselected features by using a new information theoretic criterion
referred to as the multidimensional interaction information (MII).
The advantage of MII is that it can go beyond pairwise interaction
and consider third or higher order feature interactions. As a
result, we can evaluate features jointly, and thus avoid the
redundancies arising in individual feature combinations.
Experimental results demonstrate the effectiveness of our feature
selection method on a number of standard data-sets.

In Chapter $4$, we find that in some situations the graph
representation for relational patterns can lead to substantial loss
of information. This is because in real-world problems objects and
their features tend to exhibit multiple relationships rather than
simple pairwise ones. This motive us to establish a feature
hypergraph (rather than feature graph) to characterize the multiple
relationships among features. We draw on recent work on hyper-graph
clustering to select the most informative feature subset (mIFS) from
a set of objects using high-order (rather than pairwise)
similarities. There are two novel ingredients. First, we use MII to
measure the significance of different feature combinations with
respect to the class labels. Secondly, we use hypergraph clustering
to select the most informative feature subset (mIFS), which has both
low redundancy and strong discriminating power. The advantage of MII
is that it incorporates third or higher order feature interactions.
Hypergraph clustering, which extracts the most informative features.
The size of the most informative feature subset (mIFS) is determined
automatically. Experimental results demonstrate the effectiveness of
our feature selection method on a number of standard data-sets.

In addition to the supervised feature selection methods, we present two novel
unsupervised feature selection methods in Chapter $5$ and Chapter $6$. Specifically,
we propose a new two-step spectral regression technique for
unsupervised feature selection in Chapter $5$. In the first step, we
use kernel entropy component analysis (kECA) to transform the data
into a lower-dimensional space so as to improve class separation.
Second, we use $\ell_{1}$-norm regularization to select the
features that best align with the data embedding resulting from
kECA. The advantage of kECA is that dimensionality reducing data
transformation maximally preserves entropy estimates for the input
data whilst also best preserving the cluster structure of the data.
Using $\ell_{1}$-norm regularization, we cast feature discriminant
analysis into a regression framework which accommodates the
correlations among features. As a result, we can evaluate joint
feature combinations, rather than being confined to consider them
individually. Experimental results demonstrate the effectiveness of
our feature selection method on a number of standard face data-sets.

In Chapter $6$, by incorporating MII for higher order similarities
measure, we establish a novel hypergraph framework which is used for
characterizing the multiple relationships within a set of samples
(e.g. face samples under varying illumination conditions). Thus, the
structural information latent in the data can be more effectively
modeled. We then explore a strategy to select the discriminating
feature subset on the basis of the hypergraph representation. The
strategy is based on an unsupervised method which derive the
hypergraph embedding view of feature selection. We develop the
strategy based on a number of standard image datasets, and the
results demonstrate the effectiveness of our feature selection
method.

We summarize the contributions of this thesis in Chapter $7$, and
analyze the developed methods. Finally, we give some suggestions to
the future work in feature selection.

Metadata

Supervisors:	Hancock , Edwin
Keywords:	Feature Selection; Graph; Hypergraph;
Awarding institution:	University of York
Academic Units:	The University of York > Computer Science (York)
Identification Number/EthosID:	uk.bl.ethos.568094
Depositing User:	Mr Zhihong Zhang
Date Deposited:	01 Mar 2013 16:33
Last Modified:	08 Sep 2016 13:01
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:3340

Download

PhD_thesis-_Zhihong_Zhang

Filename: PhD_thesis-_Zhihong_Zhang.pdf

Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License

CLICK TO DOWNLOAD

[thumbnail of PhD_thesis-_Zhihong_Zhang.pdf]

You do not need to contact us to get a copy of this thesis. Please use the 'Download' link(s) above to get a copy.
You can contact us about this thesis. If you need to make a general enquiry, please see the Contact us page.

Feature Selection from Higher Order Correlations

Abstract

Metadata

Download

PhD_thesis-_Zhihong_Zhang

Export

Statistics