Developing statistical and bioinformatic analysis of genomic data from tumours

Thakur, Rohit (2018) Developing statistical and bioinformatic analysis of genomic data from tumours. PhD thesis, University of Leeds.

Thakur_R_Medicine_PhD_2018.pdf - Final eThesis - complete (pdf)
Available under License Creative Commons Attribution Noncommercial 2.0 UK: England & Wales.

Previous prognostic signatures for melanoma based on tumour transcriptomic data were developed predominantly on cohorts of AJCC (American Joint Committee on Cancer) stage III and IV melanoma. Since 92% of melanoma patients are diagnosed at AJCC stages I and II, there is an urgent need for better prognostic biomarkers to stratify patients for early adjuvant therapy. This study uses genome-wide tumour gene expression levels and clinico-histopathological characteristics of patients from the Leeds Melanoma Cohort (LMC). Several unsupervised and supervised classification approaches were applied to the transcriptomic data, to identify biological classes of melanoma and to develop prognostic classification models, respectively. Unsupervised clustering identified six biologically distinct primary melanoma classes (LMC classes). Unlike previous molecular classes of melanoma, the LMC classes were prognostic both in the whole LMC dataset and in stage I tumours. The prognostic value of the LMC classes was replicated in an independent dataset, but insufficient data were available to replicate it in an AJCC stage I subset. In supervised classification, adjusting for class imbalance improved the performance of the Random Forest (RF) approach but not that of the Support Vector Machine (SVM). However, RF and SVM had similar results overall, with RF only marginally better. Combining clinical and transcriptomic information in the RF further improved the performance of the prediction model compared with using clinical information alone. Finally, the agnostically derived LMC classes and the supervised RF model converged in their association with outcome in some groups of patients, but not in others.
In conclusion, this study reports six molecular classes of primary melanoma with prognostic value in stage I disease and overall, and a prognostic classification model that predicts outcome in primary melanoma.

Item Type: Thesis (PhD)
Keywords: primary melanoma, prognostic signatures, Stage I prognostic signature, machine learning, transcriptomic data analysis, clustering, PAM, Random Forest, Support Vector Machine
Academic Units: The University of Leeds > Faculty of Medicine and Health (Leeds) > Institute of Molecular Medicine (LIMM) (Leeds) > Section of Epidemiology and Biostatistics (Leeds)
Identification Number/EthosID: uk.bl.ethos.772831
Depositing User: Rohit Thakur
Date Deposited: 15 Apr 2019 09:15
Last Modified: 11 Mar 2020 10:53
URI: http://etheses.whiterose.ac.uk/id/eprint/22674
