Alqahtani, Khaled Mubarek A. (2016) Survival analysis based on genomic profiles. PhD thesis, University of Leeds.
Abstract
Accurate survival prediction is critical in the management of cancer patients’
care and well-being. Previous studies have shown that copy number
alterations (CNA) in some key genes are individually associated with
disease phenotypes and patients’ prognosis. However, in many complex
diseases like cancer, it is expected that a large number of genes with such
an association span the genome. Furthermore, genome-wide CNA profiles
are person-specific. Each patient has their own profile and any differences
in the profile between patients may help to explain the differences
in the patients’ survival. Hence, extracting the relevant information in
the genome-wide CNA profile is critical in the prediction of cancer patients’
survival. It is currently a modelling challenge to incorporate the
genome-wide CNA profiles, in addition to the patients’ clinical information,
to predict cancer patients’ survival. Therefore, the focus of this thesis
is to develop statistical methods that are able to include CNA
(ultra-high dimensional data) in survival analysis. To address this
objective, the thesis proceeds in two main parts.
The first part of the thesis concentrates on CNA estimation. CNA can be
estimated using the ratio of a tumour sample to a normal sample. Therefore,
we investigate the approximations of the distribution of the ratio of
two Poisson random variables.
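
As a rough illustration of the kind of approximation at stake, the sketch below simulates the log-ratio of two independent Poisson counts and compares its moments with the standard delta-method normal approximation, log(X/Y) ≈ N(log(λ_T/λ_N), 1/λ_T + 1/λ_N). This is a textbook approximation chosen for illustration, not necessarily one of those studied in the thesis; the rates, function names, and the choice to discard zero counts are all assumptions.

```python
import numpy as np

def simulate_log_ratio(lam_tumour, lam_normal, n_draws=200_000, seed=0):
    """Draw tumour and normal read counts and return log(tumour / normal)."""
    rng = np.random.default_rng(seed)
    x = rng.poisson(lam_tumour, n_draws)
    y = rng.poisson(lam_normal, n_draws)
    keep = (x > 0) & (y > 0)  # assumption: windows with a zero count are dropped
    return np.log(x[keep] / y[keep])

def delta_method_moments(lam_tumour, lam_normal):
    """Mean and variance of the delta-method normal approximation."""
    mean = np.log(lam_tumour / lam_normal)
    var = 1.0 / lam_tumour + 1.0 / lam_normal
    return mean, var

# Hypothetical rates for one genomic window.
log_ratio = simulate_log_ratio(300.0, 200.0)
approx_mean, approx_var = delta_method_moments(300.0, 200.0)
print("simulated mean/var:", log_ratio.mean(), log_ratio.var())
print("approximate mean/var:", approx_mean, approx_var)
```

With rates this large the two sets of moments agree closely; the approximation is expected to degrade as the rates become small.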
In the second part of the thesis, we extend the Cox proportional hazards
(PH) model for prediction of patients’ survival probability by incorporating
the genome-wide CNA profiles as random predictors. The patients’ clinical
information remains as fixed predictors in the model. In this part, three
types of random-effect distribution are investigated.
First, the random effects are assumed to be normally distributed with mean
zero and a diagonal covariance matrix with equal variances
and zero covariances. The diagonal structure is the
simplest possible structure for a variance-covariance matrix. This structure
implies independence between neighbouring genomic windows. However,
CNAs have dependencies between neighbouring genomic windows
and spatial characteristics, both of which are ignored with such a covariance structure.
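
One common way to fit a Cox model with independent, equal-variance normal random effects is through a penalised partial likelihood, in which the normal log-density contributes a ridge-type penalty. The sketch below shows that objective under stated assumptions: no tied event times, a hypothetical fixed-effects design `X`, a hypothetical CNA design `Z`, and a known variance `sigma2`. It is an illustration of the general idea, not the estimation procedure used in the thesis.

```python
import numpy as np

def cox_partial_loglik(eta, time, event):
    """Cox partial log-likelihood for linear predictor eta (assumes no tied times)."""
    order = np.argsort(-time)                  # decreasing time: risk set = rows seen so far
    eta_s, event_s = eta[order], event[order]
    log_risk = np.logaddexp.accumulate(eta_s)  # log sum of exp(eta) over each risk set
    return float(np.sum((eta_s - log_risk)[event_s == 1]))

def penalised_loglik(beta, b, X, Z, time, event, sigma2):
    """Partial log-likelihood plus the log-density of b ~ N(0, sigma2 * I), up to a constant."""
    eta = X @ beta + Z @ b
    return cox_partial_loglik(eta, time, event) - 0.5 * float(b @ b) / sigma2

# Tiny hypothetical data set: 3 patients, 1 clinical covariate, 2 genomic windows.
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 1])
X = np.zeros((3, 1))
Z = np.zeros((3, 2))
value = penalised_loglik(np.zeros(1), np.ones(2), X, Z, time, event, sigma2=0.5)
print(value)
```

Because the penalty treats every window separately, it carries no information about which windows are neighbours, which is the limitation noted above.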
Second, we address the spatial dependence structure of CNAs. To achieve
this, we first discuss other structures for the variance-covariance
matrix of the random effects (the compound symmetry covariance matrix and
the inverse of the covariance matrix). Then, we impose smoothness using first and
second differences of the random effects. Specifically, the random effects are
assumed to be correlated and to follow a mixture of two distributions,
normal and Cauchy, for the first or second differences (SCox).
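
To make the role of the differences concrete, the sketch below builds first- and second-order difference matrices and evaluates two smoothness penalties on the differenced coefficients: a quadratic one (from a normal log-density) and a logarithmic one (from a Cauchy log-density). How the thesis combines the two distributions is not specified in this abstract, so the penalties are shown separately; the function names and the `scale` parameter are assumptions.

```python
import numpy as np

def difference_matrix(p, order):
    """(p - order) x p matrix D such that D @ b gives the order-th differences of b."""
    return np.diff(np.eye(p), n=order, axis=0)

def normal_smoothness(b, order):
    """Quadratic penalty: sum of squared differences."""
    d = difference_matrix(len(b), order) @ b
    return float(d @ d)

def cauchy_smoothness(b, order, scale=1.0):
    """Penalty from the Cauchy log-density: sum of log(1 + (d / scale)^2)."""
    d = difference_matrix(len(b), order) @ b
    return float(np.sum(np.log1p((d / scale) ** 2)))

# A hypothetical coefficient profile with one abrupt change between windows.
b_step = np.array([0.0, 0.0, 0.0, 2.0, 2.0, 2.0])
print(normal_smoothness(b_step, order=1))
print(cauchy_smoothness(b_step, order=1))
```

On this profile the Cauchy-based penalty is smaller than the quadratic one, reflecting its heavier tails: it tolerates an occasional large jump between neighbouring windows while still penalising many small fluctuations.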
Our approach in these two scenarios was genome-wide, in the
sense that we took into account all of the CNA information in the genome.
Accordingly, the model does not include a variable selection mechanism.
Third, because the previous methods employ all predictors regardless of their
relevance, which makes the results difficult to interpret, we introduce a
novel algorithm based on a sparse-smoothed Cox model (SSCox) within a
random effects framework to model the survival time using the patients’
clinical characteristics as fixed effects and CNA profiles as random
effects. We assume the CNA coefficients to be correlated random effects
that follow a mixture of three distributions: normal (to achieve shrinkage
around the mean values), Cauchy for the second-order differences (to gain
smoothness), and Laplace (to achieve sparsity).
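
The three roles described above can be sketched as a single penalty with one term per distribution. The additive form, the tuning weights `lam_ridge`, `lam_smooth`, `lam_sparse`, and the `scale` parameter are all assumptions made for illustration; the abstract does not state how the thesis weights or combines the components.

```python
import numpy as np

def sscox_style_penalty(b, lam_ridge, lam_smooth, lam_sparse, scale=1.0):
    """Illustrative three-part penalty on CNA coefficients b (hypothetical weights)."""
    b = np.asarray(b, dtype=float)
    d2 = np.diff(b, n=2)                                       # second-order differences
    ridge = 0.5 * lam_ridge * np.sum(b ** 2)                   # normal: shrinkage
    smooth = lam_smooth * np.sum(np.log1p((d2 / scale) ** 2))  # Cauchy: smoothness
    sparse = lam_sparse * np.sum(np.abs(b))                    # Laplace: sparsity
    return float(ridge + smooth + sparse)

# Two hypothetical profiles with identical size but different spatial structure.
flat_block = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
spiky = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
penalty_flat = sscox_style_penalty(flat_block, 1.0, 1.0, 1.0)
penalty_spiky = sscox_style_penalty(spiky, 1.0, 1.0, 1.0)
print(penalty_flat, penalty_spiky)
```

Both profiles have the same ridge and Laplace terms, so the difference in penalty comes entirely from the smoothness term, which favours a contiguous altered region over isolated spikes.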
We illustrate each method with a real dataset from a lung cancer cohort
as well as simulated data. In the simulation studies, we find that our
SSCox method generally performed better than the sparse partial least-squares
methods in prediction performance. Our estimator had smaller
mean square error and mean absolute error than its main competitors.
For the real dataset, we find that the SSCox model is suitable and enables
survival probability prediction based on the patients’ clinical information
and CNA profiles. The results indicate that cancer T- and N-staging
are significant factors affecting the patients’ survival, and the estimates
of random effects allow us to examine the contribution to survival of
some genomic regions across the genome.
Metadata
Supervisors: | Gusnanto, Arief and Taylor, Charles |
---|---|
Keywords: | Survival Analysis, Genomic Profiles, Cancer, biostatistics, bioinformatics |
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Maths and Physical Sciences (Leeds); The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds) |
Identification Number/EthosID: | uk.bl.ethos.705994 |
Depositing User: | Dr. K Alqahtani |
Date Deposited: | 13 Mar 2017 14:19 |
Last Modified: | 11 Apr 2022 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:16471 |
Download
Final eThesis - complete (pdf)
Filename: Survaival_Khaled.pdf
Description: Pdf of my thesis
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License