Alshanbari, Huda Mohammed H (2017) Additive Cox proportional hazards models for next-generation sequencing data. PhD thesis, University of Leeds.
Abstract
Eighty-nine non-small cell lung cancer (NSCLC) patients experience chromosomal rearrangements called copy number alterations (CNAs), in which cells have an abnormal number of copies of one or more regions of their genome; these genetic alterations are known to drive cancer development. An important aim of this thesis is to propose a way to combine clinical covariates, as fixed predictors, with CNA genomic windows, as smooth terms, using the penalized additive Cox proportional hazards (PH) model. Most proposed prediction methods assume that the CNA genomic windows, along with the clinical covariates, act linearly. However, continuous covariates can affect the hazard through more complicated nonlinear functional forms. A Cox PH model that enters continuous covariates linearly is therefore likely to be misspecified, because it does not fit the correct functional form for those covariates. Some reported work on combining clinical covariates with high-dimensional genomic data for clinical genomic prediction is based on the standard Cox PH model, and most of it focuses on applying variable selection to high-dimensional CNA genomic data.
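As a rough illustration of the model class described above, a penalized additive Cox PH model is commonly written as follows. The notation is assumed for this sketch and is not taken from the thesis: $x_i$ denotes the clinical covariates of patient $i$, $z_{ij}$ the value of the $j$-th CNA genomic window, $f_j$ a smooth function, and $\lambda_j$ a smoothing parameter.

```latex
% Hazard with linear (fixed) clinical effects and smooth CNA-window effects.
% All symbols are illustrative assumptions, not the thesis's own notation.
\[
  h(t \mid x_i, z_i) \;=\; h_0(t)\,
  \exp\!\Big( x_i^{\top}\beta \;+\; \sum_{j=1}^{p} f_j(z_{ij}) \Big)
\]

% Penalized partial log-likelihood: the usual Cox partial log-likelihood
% \ell minus a roughness penalty on each smooth term.
\[
  \ell_{\lambda}(\beta, f_1, \dots, f_p)
  \;=\; \ell(\beta, f_1, \dots, f_p)
  \;-\; \frac{1}{2} \sum_{j=1}^{p} \lambda_j \int \big( f_j''(z) \big)^{2}\, dz
\]
```

Setting every $f_j$ to a linear function $f_j(z) = \theta_j z$ recovers the standard Cox PH model, which is the sense in which the linear model can be misspecified when the true effect is nonlinear.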
Our main interest is to propose a variable selection procedure that selects important nonlinear effects among the CNA genomic windows. Two approaches to feature selection are presented: discrete and shrinkage. Discrete feature selection is based on penalized univariate variable selection, which identifies the subset of CNA genomic windows with the strongest effects on survival time. Feature selection by shrinkage adds a second penalty to the penalized partial log-likelihood; this penalizes the smoothing coefficients in the model, so that some of them are set to zero.
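One common way to write such a second penalty is sketched below. This is an assumed, generic form and not necessarily the construction used in the thesis: each smooth is expanded in a basis, $f_j(z) = \sum_k \gamma_{jk} b_{jk}(z)$, with $S_j$ the usual smoothing penalty matrix and $S_j^{*}$ a hypothetical extra penalty matrix acting on the components of $\gamma_j$ that $S_j$ leaves unpenalized.

```latex
% Doubly penalized partial log-likelihood (illustrative notation).
% \gamma_j    : basis coefficients of the j-th smooth term
% S_j         : smoothing (roughness) penalty matrix
% S_j^{*}     : assumed second penalty matrix on the null space of S_j
% \lambda_j, \lambda_j^{*} : the two smoothing parameters for term j
\[
  \ell_{\lambda,\lambda^{*}}(\beta, \gamma)
  \;=\; \ell(\beta, \gamma)
  \;-\; \frac{1}{2} \sum_{j=1}^{p}
  \Big( \lambda_j\, \gamma_j^{\top} S_j\, \gamma_j
      \;+\; \lambda_j^{*}\, \gamma_j^{\top} S_j^{*}\, \gamma_j \Big)
\]
```

Under this form, when both $\lambda_j$ and $\lambda_j^{*}$ are estimated to be large, every coefficient in $\gamma_j$ is shrunk towards zero, which effectively removes the $j$-th genomic window from the model.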
For the NSCLC dataset, we find that the size of the tumor and the spread of cancer to the lymph nodes are significant factors that increase the patients' hazard of death. The estimated smooth log hazard ratio curves identify significant CNA genomic windows across the genome that contribute a higher or lower hazard of death.
Metadata
Supervisors: | Barber, Stuart and Gusnanto, Arief |
---|---|
Keywords: | Survival Analysis, Generalized Additive models, Genomic Profiles |
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Maths and Physical Sciences (Leeds); The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds); The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds) |
Identification Number/EthosID: | uk.bl.ethos.736513 |
Depositing User: | Dr. Huda Alshanbari |
Date Deposited: | 20 Mar 2018 10:10 |
Last Modified: | 11 Apr 2020 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:19739 |
Download
Final eThesis - complete (pdf)
Filename: Huda_Alshanbari_Mathematics_PhD_2017.pdf
Description: Huda Alshanbari PhD thesis
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License