
Additive Cox proportional hazards models for next-generation sequencing data

Alshanbari, Huda Mohammed H (2017) Additive Cox proportional hazards models for next-generation sequencing data. PhD thesis, University of Leeds.

Text (Huda Alshanbari PhD thesis)
Huda_Alshanbari_Mathematics_PhD_2017.pdf - Final eThesis - complete (pdf)
Available under License Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales.



Eighty-nine Non-Small Cell Lung Cancer (NSCLC) patients exhibit chromosomal rearrangements called Copy Number Alterations (CNAs), in which cells carry an abnormal number of copies of one or more regions of their genome; such genetic alterations are known to drive cancer development. An important aim of this thesis is to propose a way to combine the clinical covariates, entered as fixed predictors, with the CNA genomic windows, entered as smooth terms, using the penalized additive Cox Proportional Hazards (PH) model. Most proposed prediction methods assume that the CNA genomic windows, like the clinical covariates, act linearly on the log hazard. However, continuous covariates can affect the hazard through more complicated nonlinear functional forms, so a Cox PH model that enters them linearly is likely to be misspecified because it does not fit the correct functional form. Some previous work on combining clinical covariates with high-dimensional genomic data for clinical genomic prediction is based on the standard Cox PH model, and most of it focuses on applying variable selection to the high-dimensional CNA data.

Our main interest is to propose a variable selection procedure that selects important nonlinear effects from the CNA genomic windows. Two feature selection approaches are presented: discrete and shrinkage. Discrete feature selection is based on penalized univariate variable selection, which identifies the subset of CNA genomic windows with the strongest effects on survival time. Feature selection by shrinkage instead adds a second penalty to the penalized partial log-likelihood; this penalizes the smoothing coefficients in the model, and as a result some of them are set to zero.
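To make the shrinkage idea more concrete, here is a minimal Python sketch (not the thesis's implementation) of a penalized partial log-likelihood for a linear clinical part plus one smooth CNA-window term. It assumes a truncated-linear penalized-spline basis, Breslow handling of ties, and a "double penalty" in which the second penalty acts on the linear coefficient that the first (wiggliness) penalty leaves untouched, so that the whole smooth effect can be shrunk towards zero. The function names `truncated_linear_basis` and `penalized_partial_loglik` and the toy data are hypothetical; the thesis's own basis, penalties and fitting procedure may differ.

```python
import math
from typing import List, Sequence


def truncated_linear_basis(z: float, knots: Sequence[float]) -> List[float]:
    """Hypothetical basis for one CNA window value z: [z, (z-k1)_+, ..., (z-kK)_+].

    The linear column z spans the null space of the wiggliness penalty;
    the truncated-line columns carry the nonlinear part of f(z).
    """
    return [z] + [max(z - k, 0.0) for k in knots]


def penalized_partial_loglik(times, events, X, B, beta, gamma,
                             lam_smooth, lam_shrink):
    """Breslow partial log-likelihood of eta_i = x_i'beta + b_i'gamma, minus two penalties.

    lam_smooth penalizes gamma[1:] (wiggliness of f); lam_shrink is the assumed
    second penalty on gamma[0], the linear part the first penalty ignores.
    """
    n = len(times)
    # Linear predictor: clinical covariates (fixed) plus smooth CNA-window term.
    eta = [
        sum(xj * bj for xj, bj in zip(X[i], beta))
        + sum(bk * gk for bk, gk in zip(B[i], gamma))
        for i in range(n)
    ]
    loglik = 0.0
    for i in range(n):
        if events[i]:
            # Risk set R(t_i): subjects still under observation at time t_i.
            risk = [eta[k] for k in range(n) if times[k] >= times[i]]
            m = max(risk)  # log-sum-exp shift for numerical stability
            loglik += eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk)))
    wiggle = sum(g * g for g in gamma[1:])  # first (smoothing) penalty
    null_part = gamma[0] ** 2               # second (shrinkage) penalty
    return loglik - 0.5 * lam_smooth * wiggle - 0.5 * lam_shrink * null_part


if __name__ == "__main__":
    # Hypothetical toy data: survival times, event indicators (1 = death),
    # one binary clinical covariate and one CNA window value per patient.
    times = [5.0, 8.0, 12.0, 15.0]
    events = [1, 1, 0, 1]
    X = [[1.0], [0.0], [1.0], [0.0]]
    z = [-0.4, 0.3, 0.9, 0.1]
    B = [truncated_linear_basis(v, knots=[0.0, 0.5]) for v in z]
    print(penalized_partial_loglik(times, events, X, B, beta=[0.5],
                                   gamma=[0.2, -0.1, 0.3],
                                   lam_smooth=1.0, lam_shrink=1.0))
```

For fixed penalty weights, the model would be fitted by maximizing this objective over `beta` and `gamma` with any numerical optimizer. Because both penalties are quadratic in this sketch, large values of `lam_smooth` and `lam_shrink` push all of a window's coefficients towards zero rather than setting them exactly to zero, so a window would be treated as dropped when its estimated effect becomes negligible.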
For the NSCLC dataset, we find that tumor size and the spread of cancer into the lymph nodes are significant factors that increase the patients' hazard of death, and the estimated smooth log hazard ratio curves show that some of the significant CNA genomic windows across the genome are associated with a higher or lower hazard of death.

Item Type: Thesis (PhD)
Keywords: Survival Analysis, Generalized Additive models, Genomic Profiles
Academic Units: The University of Leeds > Faculty of Maths and Physical Sciences (Leeds)
The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds)
The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds)
Identification Number/EthosID: uk.bl.ethos.736513
Depositing User: Dr. Huda Alshanbari
Date Deposited: 20 Mar 2018 10:10
Last Modified: 11 Apr 2020 09:53
URI: http://etheses.whiterose.ac.uk/id/eprint/19739
