Alqahtani, Khaled Mubarek A.
(2016)
*Survival analysis based on genomic profiles.*
PhD thesis, University of Leeds.

Text (Pdf of my thesis)
Survaival_Khaled.pdf - Final eThesis - complete (pdf) Restricted until 1 April 2022.

## Abstract

Accurate survival prediction is critical in the management of cancer patients' care and well-being. Previous studies have shown that copy number alterations (CNA) in some key genes are individually associated with disease phenotypes and patients' prognosis. However, in complex diseases such as cancer, a large number of genes with such an association are expected to span the genome. Furthermore, genome-wide CNA profiles are person-specific: each patient has their own profile, and differences in the profile between patients may help to explain differences in their survival. Hence, extracting the relevant information from the genome-wide CNA profile is critical for predicting cancer patients' survival. Incorporating genome-wide CNA profiles, in addition to patients' clinical information, into the prediction of cancer patients' survival is currently a modelling challenge. Therefore, the focus of this thesis is to establish or develop statistical methods that are able to include CNA (ultra-high-dimensional data) in survival analysis. To address this objective, the thesis proceeds in two main parts. The first part concentrates on CNA estimation. CNA can be estimated using the ratio of a tumour sample to a normal sample; we therefore investigate approximations of the distribution of the ratio of two Poisson random variables. In the second part, we extend the Cox proportional hazards (PH) model to predict patients' survival probability by incorporating the genome-wide CNA profiles as random predictors, while the patients' clinical information remains as fixed predictors in the model. In this part, three types of random-effect distribution are investigated. First, the random effects are assumed to be normally distributed with mean zero and a diagonal covariance matrix, which has equal variances and zero covariances.
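The abstract does not state which approximations of the Poisson ratio the thesis compares. As a minimal sketch, assuming independent tumour and normal read counts T ~ Poisson(λ_T) and N ~ Poisson(λ_N) in one genomic window, the Python code below checks one standard approximation, the delta-method normal approximation log(T/N) ≈ Normal(log(λ_T/λ_N), 1/λ_T + 1/λ_N), against simulation. The function names and the example means are hypothetical.

```python
import numpy as np

def delta_log_ratio(lam_t, lam_n):
    """Delta-method normal approximation to log(T / N) for independent
    T ~ Poisson(lam_t) and N ~ Poisson(lam_n).

    Returns (mean, variance) = (log(lam_t / lam_n), 1/lam_t + 1/lam_n);
    the approximation is reasonable only when both means are large.
    """
    mean = np.log(lam_t / lam_n)
    var = 1.0 / lam_t + 1.0 / lam_n
    return mean, var

def simulate_log_ratio(lam_t, lam_n, n_sim=200_000, seed=0):
    """Monte Carlo draws of log(T / N), discarding draws where either count is zero."""
    rng = np.random.default_rng(seed)
    t = rng.poisson(lam_t, n_sim)
    n = rng.poisson(lam_n, n_sim)
    keep = (t > 0) & (n > 0)          # log ratio is undefined if either count is 0
    return np.log(t[keep] / n[keep])

# Hypothetical window: tumour mean 150 reads vs normal mean 100 reads (a 1.5x gain).
mean, var = delta_log_ratio(150.0, 100.0)
sims = simulate_log_ratio(150.0, 100.0)
print(f"delta method: mean={mean:.4f}, var={var:.5f}")
print(f"simulation:   mean={sims.mean():.4f}, var={sims.var():.5f}")
```

For small means, zero counts and skewness make this normal approximation unreliable, which is why the choice of approximation matters for low-coverage windows.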
The diagonal covariance matrix is the simplest possible structure for a variance-covariance matrix. It implies independence between neighbouring genomic windows. However, CNAs are dependent across neighbouring genomic windows and have spatial characteristics that such a covariance structure ignores. Second, we address the spatial dependence structure of CNAs. To do so, we first discuss other structures for the variance-covariance matrix of the random effects (the compound symmetry covariance matrix and the inverse covariance matrix). We then impose smoothness using the first and second differences of the random effects. Specifically, the random effects are assumed to be correlated and to follow a mixture of two distributions: normal, and Cauchy for the first or second differences (SCox). In these two scenarios our approach was genome-wide, in the sense that it took into account all of the CNA information in the genome; the model therefore does not include a variable selection mechanism. Third, because the previous methods use all predictors regardless of their relevance, which makes the results difficult to interpret, we introduce a novel algorithm based on a sparse-smoothed Cox model (SSCox) within a random-effects model framework, which models survival time using the patients' clinical characteristics as fixed effects and CNA profiles as random effects. We assume the CNA coefficients to be correlated random effects that follow a mixture of three distributions: normal (to achieve shrinkage around the mean values), Cauchy for the second-order differences (to gain smoothness), and Laplace (to achieve sparsity). We illustrate each method with a real dataset from a lung cancer cohort as well as with simulated data. In the simulation studies, the SSCox method generally performed better in prediction than the sparse partial least squares methods.
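One way to read the three random-effect distributions is as penalties on the Cox partial log-likelihood: up to constants, the negative log-density of each prior acts as a penalty on the CNA coefficients b. The Python sketch below assumes that reading; the exact parameterization, the tuning constants `sigma2`, `gamma`, `order` and `lam`, and the function names are illustrative and not taken from the thesis. With `gamma=None` and `lam=0` it reduces to the normal random-effects model with diagonal covariance (a ridge penalty); setting `gamma` with `order=1` or `order=2` adds an SCox-style smoothness term; adding `lam > 0` gives an SSCox-style sparse-smoothed objective. Ties are handled with the Breslow approximation.

```python
import numpy as np

def cox_partial_loglik(eta, time, event):
    """Cox partial log-likelihood (Breslow handling of ties) for linear predictor eta."""
    at_risk = time[None, :] >= time[:, None]         # row i: who is still at risk at time[i]
    m = eta.max()                                     # shift for numerical stability
    log_risk = m + np.log(at_risk @ np.exp(eta - m))  # log sum_{j in R(t_i)} exp(eta_j)
    return np.sum(event * (eta - log_risk))           # only events contribute

def penalized_objective(beta, b, X, Z, time, event,
                        sigma2=1.0, gamma=None, order=2, lam=0.0):
    """Hypothetical negative penalized log-likelihood (constants dropped).

    beta   : fixed effects for clinical covariates X (n x p)
    b      : random effects for CNA windows Z (n x q), columns in genomic order
    sigma2 : variance of the normal component (diagonal covariance sigma2 * I)
    gamma  : scale of the Cauchy prior on differences of b (None switches it off)
    order  : 1 or 2, order of the differences that are smoothed
    lam    : rate of the Laplace component (0 switches it off)
    """
    eta = X @ beta + Z @ b
    obj = -cox_partial_loglik(eta, time, event)
    obj += np.sum(b ** 2) / (2.0 * sigma2)                  # normal: shrinkage
    if gamma is not None:
        d = np.diff(b, n=order)                             # neighbouring-window differences
        obj += np.sum(np.log1p((d / gamma) ** 2))           # Cauchy: smoothness
    obj += lam * np.sum(np.abs(b))                          # Laplace: sparsity
    return obj

# Small synthetic example: 4 patients, 1 clinical covariate, 3 CNA windows.
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 0, 1])
X = np.zeros((4, 1))
Z = np.zeros((4, 3))
b = np.array([0.0, 1.0, 0.0])
print(penalized_objective(np.zeros(1), b, X, Z, time, event,
                          sigma2=1.0, gamma=1.0, order=2, lam=2.0))
```

The sketch only evaluates the objective. Minimizing it over `beta` and `b` needs an optimizer that copes with the non-differentiable Laplace term (for example a proximal or coordinate-wise method), and the Cauchy term makes the problem non-convex.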
The SSCox estimator had smaller mean squared error and mean absolute error than its main competitors. For the real dataset, we find that the SSCox model is suitable and enables survival probability prediction based on the patients' clinical information and CNA profiles. The results indicate that cancer T- and N-staging are significant factors affecting patients' survival, and the estimates of the random effects allow us to examine how some genomic regions across the genome contribute to survival.
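The abstract does not say how survival probabilities are computed from a fitted model. A common choice for Cox-type models, assumed here, is the Breslow estimator of the baseline cumulative hazard H_0(t), which gives S(t | x) = exp(-H_0(t) exp(η)), where η is a patient's fitted linear predictor (clinical fixed effects plus CNA random effects). The Python sketch below illustrates this; the function names and the toy linear predictors are hypothetical.

```python
import numpy as np

def breslow_cumhaz(eta, time, event):
    """Breslow estimate of the baseline cumulative hazard at each distinct event time."""
    event_times = np.unique(time[event == 1])
    risk_sums = np.array([np.exp(eta[time >= t]).sum() for t in event_times])
    deaths = np.array([np.sum((time == t) & (event == 1)) for t in event_times])
    return event_times, np.cumsum(deaths / risk_sums)

def predict_survival(t_grid, eta_new, event_times, cumhaz):
    """S(t | x) = exp(-H0(t) * exp(eta)); rows are patients, columns are times in t_grid."""
    idx = np.searchsorted(event_times, t_grid, side="right") - 1   # last event time <= t
    h0 = np.where(idx >= 0, cumhaz[np.clip(idx, 0, None)], 0.0)      # H0(t) = 0 before first event
    return np.exp(-np.outer(np.exp(eta_new), h0))

# Hypothetical fitted linear predictors (clinical + CNA terms) for a training cohort.
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 0, 1])
eta_train = np.zeros(4)
event_times, cumhaz = breslow_cumhaz(eta_train, time, event)
t_grid = np.array([0.5, 1.0, 2.5, 4.0])
surv = predict_survival(t_grid, np.array([0.0, np.log(2.0)]), event_times, cumhaz)
print(surv)
```

A patient whose linear predictor is log 2 higher has twice the hazard at every time, so their predicted survival curve is the baseline curve raised to the power 2.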

Item Type: | Thesis (PhD) |
---|---|
Keywords: | Survival Analysis, Genomic Profiles, Cancer, Biostatistics, Bioinformatics |
Academic Units: | The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds) |
Depositing User: | Dr. K Alqahtani |
Date Deposited: | 13 Mar 2017 14:19 |
Last Modified: | 10 Apr 2017 09:08 |
URI: | http://etheses.whiterose.ac.uk/id/eprint/16471 |