Kariotis, Sokratis ORCID: https://orcid.org/0000-0001-9993-6017 (2022) Unsupervised machine learning of high dimensional data for patient stratification. PhD thesis, University of Sheffield.
Abstract
The development mechanisms of numerous complex, rare diseases remain largely unknown, partly because of their multifaceted heterogeneity. Stratifying
patients is becoming an important objective as we further research that inherent heterogeneity, which can be used to support personalised medicine. However,
accurate patient stratification is slowed by considerable difficulties, chiefly outdated clinical criteria, weak associations and simple symptom categories.
Fortunately, great strides have been made in generating and using multiple types of omic data to produce new insights, for example through exploratory machine learning,
which has shown the potential to identify the source of disease mechanisms from patient subgroups. This work describes the development of a modular clustering
toolkit, named Omada, designed to assist researchers in exploring disease heterogeneity without extensive expertise in the machine learning field. Subsequently,
it assesses Omada’s capabilities and validity by testing the toolkit on multiple data modalities from pulmonary hypertension (PH) patients. I first demonstrate the
toolkit’s ability to create biologically meaningful subgroups based on whole blood RNA-seq data from H/IPAH patients in the manuscript “Biological heterogeneity in
idiopathic pulmonary arterial hypertension identified through unsupervised transcriptomic profiling of whole blood”. Our work on the manuscript titled “Diagnostic
miRNA signatures for treatable forms of pulmonary hypertension highlight challenges with clinical classification” aimed to apply the same clustering approach to a PH
microRNA dataset as a first step towards forming microRNA diagnostic signatures, recognising the potential of microRNA expression to identify diverse disease
sub-populations irrespective of pre-existing PH classes. The toolkit’s effectiveness on metabolite data was also tested. Lastly, a longitudinal clustering approach was
explored on activity readouts from wearables worn by COVID-19 patients as part of our manuscript “Unsupervised machine learning identifies and associates trajectory
patterns of COVID-19 symptoms and physical activity measured via a smart watch”. Two clusters of high and low activity trajectories were generated and associated with
symptom classes, showing a weak but interesting relationship between the two. In summary, this thesis examines the potential of patient stratification based on
several types of patient data that together offer a new, previously unseen picture of disease mechanisms. The tools presented provide important indications of distinct patient
groups and could generate the insights needed for further targeted research and clinical associations that can help towards understanding rare, complex diseases.
Metadata
Supervisors: | Wang, Dennis and Allan, Lawrie and Haiping, Lu |
---|---|
Keywords: | Machine learning, RNA, RNA-seq, clustering |
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Medicine, Dentistry and Health (Sheffield) > Medicine (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.883436 |
Depositing User: | Mr Sokratis Kariotis |
Date Deposited: | 23 May 2023 15:13 |
Last Modified: | 01 Jul 2023 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:32783 |
Download
Final eThesis - complete (pdf)
Filename: Final Revised Thesis - Kariotis Sokratis, 180125939.pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial NoDerivatives 4.0 International License