
Regularization in Regression: Partial Least Squares and Related Models

Borg Inguanez, Monique (2015) Regularization in Regression: Partial Least Squares and Related Models. PhD thesis, University of Leeds.

Borg Inguanez_M_Statistics_PhD_2015.pdf - Final eThesis - complete (pdf)
Available under License Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales.



High-dimensional data sets with a large number of explanatory variables are increasingly important in applications of regression analysis. It is well known that most traditional statistical techniques, such as Ordinary Least Squares (OLS) estimation, do not perform well with such data and are either ill-conditioned or undefined; thus a need for regularization arises. In the literature, various regularization methods have been suggested, amongst the best known being the Partial Least Squares (PLS) regression method. The aim of this thesis is to consolidate and extend results in the literature to (a) show that PLS estimation can be regarded as estimation under a statistical model based on the so-called “Krylov hypothesis”, (b) introduce a derivation of the PLS estimator as an approximate maximum likelihood estimator under this model and (c) propose an algorithm to modify the PLS estimator to yield an exact maximum likelihood estimator, the Krylov maximum likelihood (KML) estimator, under the same model. It will be shown that the constrained optimization problem in (c) can be recast as an unconstrained optimization problem on the Grassmann manifold. Two simulation studies, consisting of a number of examples (using artificial data) in low dimensions, will be presented. These allow a visual inspection of the Krylov maximum likelihood objective as it varies over the Grassmann manifold, and hence allow the identification of characteristics of the data for which KML can be expected to give better results than PLS. However, these ideas are tractable only when the number of explanatory variables is small; as soon as it is moderate (say p = 10) or of order thousands, exploring how the different parameters affect the behaviour of the objective function is not straightforward.
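The “Krylov hypothesis” mentioned above connects to a standard fact about PLS: with a univariate response and k components, the PLS coefficient vector lies in the Krylov subspace spanned by X'y, (X'X)X'y, …, (X'X)^{k-1}X'y. A minimal numerical check of this membership is sketched below; the NIPALS-style `pls1` helper is a generic textbook implementation written here for self-containment, not the thesis's own code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 8, 3
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)
Xc, yc = X - X.mean(axis=0), y - y.mean()   # centre, as PLS assumes

def pls1(Xc, yc, k):
    """Basic PLS1 (NIPALS): returns the k-component coefficient vector."""
    Xd, yd = Xc.copy(), yc.copy()
    W, P, q = [], [], []
    for _ in range(k):
        w = Xd.T @ yd                    # weight = current cross-covariance
        w /= np.linalg.norm(w)
        t = Xd @ w                       # score
        tt = t @ t
        P.append(Xd.T @ t / tt)          # X loading
        q.append(yd @ t / tt)            # y loading
        Xd = Xd - np.outer(t, P[-1])     # deflate X and y
        yd = yd - q[-1] * t
        W.append(w)
    Wm, Pm = np.column_stack(W), np.column_stack(P)
    return Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q))

beta_pls = pls1(Xc, yc, k)

# Krylov basis K_k(S, s) with S = X'X and s = X'y
S, s = Xc.T @ Xc, Xc.T @ yc
K = np.column_stack([np.linalg.matrix_power(S, j) @ s for j in range(k)])

# beta_pls should lie in span(K) up to numerical precision
coef, *_ = np.linalg.lstsq(K, beta_pls, rcond=None)
residual = np.linalg.norm(K @ coef - beta_pls)
```

The residual of the least-squares projection of the PLS coefficients onto the Krylov basis is at machine-precision level, illustrating why a model constrained to this subspace is a natural setting for studying PLS.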
The predictive ability of the OLS, PLS and KML regression methods is explored on artificial data (for which the sample size is larger than the number of explanatory variables), both with and without multicollinearity. Finally, the predictive ability of the PLS and KML regression methods is compared on two real-life high-dimensional data sets from the literature.
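To make the multicollinearity contrast concrete, the sketch below (an illustrative toy, not the thesis's simulation design) builds a nearly collinear design matrix: the OLS solution inflates because X'X is ill-conditioned, while a one-component PLS fit, whose coefficient vector is proportional to s = X'y, remains well behaved:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
z = rng.standard_normal(n)
# Nearly collinear design: every column is a tiny perturbation of z
X = np.column_stack([z + 1e-3 * rng.standard_normal(n) for _ in range(p)])
y = X @ np.ones(p) + rng.standard_normal(n)
Xc, yc = X - X.mean(axis=0), y - y.mean()

# Ill-conditioning of the normal equations
cond = np.linalg.cond(Xc.T @ Xc)

# OLS: individual coefficients are wildly inflated by the near-collinearity
b_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# One-component PLS: beta = s (s's)/(s' X'X s), proportional to s = X'y
s = Xc.T @ yc
b_pls = s * (s @ s) / (s @ (Xc.T @ (Xc @ s)))
```

Here `cond` is several orders of magnitude above 1 and the OLS coefficient vector has far larger norm than the PLS one, which is the kind of instability that motivates the regularized estimators studied in the thesis.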

Item Type: Thesis (PhD)
Related URLs:
Keywords: Regularization, Regression, Partial Least Squares, Grassmann Manifolds, Krylov Maximum Likelihood
Academic Units: The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds)
Identification Number/EthosID: uk.bl.ethos.685244
Depositing User: Dr. Monique Borg Inguanez
Date Deposited: 16 May 2016 09:23
Last Modified: 06 Oct 2016 14:42
URI: http://etheses.whiterose.ac.uk/id/eprint/12957

