Borg Inguanez, Monique (2015) Regularization in Regression : Partial Least Squares and Related Models. PhD thesis, University of Leeds.
Abstract
High-dimensional data sets with a large number of explanatory variables are increasingly important in applications of regression analysis. It is well known that most traditional statistical techniques, such as Ordinary Least Squares (OLS) estimation, do not perform well with such data: the resulting estimates are either ill-conditioned or undefined. Thus a need for regularization arises. Various regularization methods have been suggested in the literature; among the best known is the Partial Least Squares (PLS) regression method.
The aim of this thesis is to consolidate and extend results in the literature to (a) show
that PLS estimation can be regarded as estimation under a statistical model based on
the so-called “Krylov hypothesis”, (b) introduce a derivation of the PLS estimator as
an approximate maximum likelihood estimator under this model and (c) propose an
algorithm to modify the PLS estimator to yield an exact maximum likelihood estimator
under the same model.
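The connection in (a) rests on a well-known equivalence: for a univariate response, the k-component PLS estimator coincides with least squares restricted to the Krylov subspace spanned by X'y, (X'X)X'y, ..., (X'X)^(k-1)X'y, which is also what k steps of conjugate gradients applied to the normal equations produce. The following sketch (illustrative only; the data and all variable names are made up, not taken from the thesis) checks this equivalence numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 8, 3
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

S, s = X.T @ X, X.T @ y  # normal-equation matrix and right-hand side

# Krylov basis {s, S s, S^2 s}, orthonormalised for numerical stability
K = np.column_stack([np.linalg.matrix_power(S, j) @ s for j in range(k)])
Q, _ = np.linalg.qr(K)

# Least squares restricted to the Krylov subspace: b = Q a minimising ||y - X Q a||
a, *_ = np.linalg.lstsq(X @ Q, y, rcond=None)
b_krylov = Q @ a

# k conjugate-gradient steps on S b = s, started at b = 0
b = np.zeros(p)
r = s.copy()
d = s.copy()
for _ in range(k):
    Sd = S @ d
    alpha = (r @ r) / (d @ Sd)
    b = b + alpha * d
    r_new = r - alpha * Sd
    d = r_new + ((r_new @ r_new) / (r @ r)) * d
    r = r_new

print(np.allclose(b, b_krylov, atol=1e-6))  # the two estimators coincide
```

In exact arithmetic the two vectors are identical, since conjugate gradients minimises the same least-squares criterion over the same Krylov subspace.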
It will be shown that the constrained optimization problem in (c) can be recast as an unconstrained optimization problem on the Grassmann manifold. Two simulation studies, consisting of a number of low-dimensional examples on artificial data, will be presented. These allow a visual inspection of the Krylov maximum likelihood (KML) objective as it varies over the Grassmann manifold, and hence the identification of characteristics of the data for which KML can be expected to give better results than PLS. However, such visual exploration is feasible only when the number of explanatory variables is small; once it is moderate (say p = 10) or of order thousands, exploring how the different parameters affect the behaviour of the objective function is not straightforward. The predictive ability of the OLS, PLS and KML regression methods is compared on artificial data (for which the sample size exceeds the number of explanatory variables) with and without multicollinearity. Finally, the predictive ability of the PLS and KML regression methods is also compared on two real-life high-dimensional data sets from the literature.
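One reason PLS can outperform OLS under multicollinearity is its shrinkage behaviour: the norm of the k-component PLS coefficient vector never exceeds the OLS norm (a result due to de Jong), whereas OLS coefficients inflate when the design is ill-conditioned. A minimal sketch of this effect, using made-up simulated data rather than the thesis's own simulation design, and computing PLS via its conjugate-gradient form:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 40, 6, 2
Z = rng.standard_normal((n, 1))
X = Z + 0.01 * rng.standard_normal((n, p))  # nearly collinear columns
y = X @ np.ones(p) + 0.1 * rng.standard_normal(n)

# OLS: the ill-conditioned design inflates the coefficient vector
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# k-component PLS for a univariate response, via k conjugate-gradient
# steps on the normal equations X'X b = X'y
S, s = X.T @ X, X.T @ y
b_pls = np.zeros(p)
r = s.copy()
d = s.copy()
for _ in range(k):
    Sd = S @ d
    alpha = (r @ r) / (d @ Sd)
    b_pls = b_pls + alpha * d
    r_new = r - alpha * Sd
    d = r_new + ((r_new @ r_new) / (r @ r)) * d
    r = r_new

# The PLS coefficients are shrunk relative to OLS
print(np.linalg.norm(b_pls), np.linalg.norm(b_ols))
```

With strongly correlated columns the gap between the two norms is typically large, which is one way of seeing why a regularized estimator can predict better than OLS here.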
Metadata
Supervisors: Kent, John T.
Keywords: Regularization, Regression, Partial Least Squares, Grassmann Manifolds, Krylov Maximum Likelihood
Awarding institution: University of Leeds
Academic Units: The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds)
Identification Number/EthosID: uk.bl.ethos.685244
Depositing User: Dr. Monique Borg Inguanez
Date Deposited: 16 May 2016 09:23
Last Modified: 06 Oct 2016 14:42
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:12957
Download
Final eThesis - complete (pdf)
Filename: Borg Inguanez_M_Statistics_PhD_2015.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.