Feature selection and structure specification in ultra-high dimensional semi-parametric model with an application in medical science

Ke, Yuan (2015) Feature selection and structure specification in ultra-high dimensional semi-parametric model with an application in medical science. PhD thesis, University of York.

PhD_Thesis_YuanKe.pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.


Abstract

In this thesis, we consider feature selection, model specification and estimation for generalised semi-varying coefficient models (GSVCMs), where the number of potential covariates is allowed to diverge with the sample size. Based on the penalised likelihood approach and the kernel smoothing method, we propose a penalised weighted least squares procedure to select the significant covariates, identify constant coefficients among the coefficients of the selected covariates, and estimate the functional or constant coefficients in GSVCMs. A computational algorithm is also proposed to implement the procedure. Our approach not only inherits many desirable statistical properties from local maximum likelihood estimation and the nonconcave penalised likelihood method, but is also computationally attractive thanks to the proposed algorithm. Under some mild conditions, we establish theoretical properties of the proposed procedure, such as sparsity, the oracle property and the uniform convergence rates of the proposed estimators. We also provide simulation studies to show that the proposed procedure works very well when the sample size is finite. We then use the proposed procedure to analyse a real environmental data set, which leads to some interesting findings. Finally, we establish a classification method and show that it can be used to improve predictive modelling for classifying patients with early inflammatory arthritis at baseline into different risk groups for future disease progression.

Item Type: Thesis (PhD)
Keywords: GSVCM, LASSO, SCAD, local maximum likelihood, penalised likelihood, model selection, oracle estimation, sparsity, ultra-high dimension, prognostic classification
Academic Units: The University of York > Mathematics (York)
Identification Number/EthosID: uk.bl.ethos.647078
Depositing User: Mr. Yuan Ke
Date Deposited: 08 May 2015 11:25
Last Modified: 08 Sep 2016 13:32
URI: http://etheses.whiterose.ac.uk/id/eprint/8842
