Godwin, James Thomas
ORCID: 0000-0002-1346-1733
(2025)
Development and validation of lymphoma classifiers using gene expression profiling data from a population-based cohort.
PhD thesis, University of York.
Abstract
Introduction: Lymphomas are a heterogeneous group of cancers with distinct features,
treatments and outcomes. Hence, the most appropriate diagnosis is essential for patient
care; however, the current processes give rise to subjectivity, intra-observer variability, and
increasing complexity. Molecular classifiers, often derived from machine learning, have
been explored as potential tools to complement existing practice for triaging routine cases
or assisting with more challenging ones. However, these usually lacked the diagnostic
breadth required for potential clinical use, were rarely externally validated, and had
unclear generalisability.
Methods: Gene expression profiling (GEP) data primarily from the Haematological
Malignancy Research Network (HMRN), a UK population-based cohort, was used to
develop lymphoma classifiers. A novel machine learning-based pan-lymphoma classifier
was trained on 1,493 samples across 23 lymphoma subtypes, including rare and atypical ones.
A rolling validation set of 284 HMRN cases was collected retrospectively. In addition,
binary subtype-specific classifiers were developed to classify more granular subtypes using
published gene expression signatures. The pan-lymphoma classifier was externally validated
on 855 microarray cases, which required the creation of a novel cross-platform
normalisation method.
Results: The pan-lymphoma classifier demonstrated strong concordance with expert
pathologist classification and generalised across GEP platforms, pathologists and
laboratories. Several published gene expression signatures were validated for the first time
in population-based data and showed potential for use in more granular classification. The
new normalisation method allowed for the integration of data across distinct GEP
technologies and complex experimental designs.
Conclusions: This is the first pan-lymphoma classifier developed using population-based
data, spanning multiple challenging diagnostic areas. The breadth and depth of information
on lymphoma classification that these classifiers offer, derived from a single assay, are not
currently available. Together, the classifiers and normalisation method developed
represent significant progress towards the potential for routine use of GEP in lymphoma
classification.
Metadata
| Supervisors: | Smith, Alexandra and Crouch, Simon and Care, Matthew |
|---|---|
| Publicly visible additional information: | This work uses data provided by patients and collected by the NHS as part of their care and support |
| Keywords: | lymphoma, gene expression profiling, bioinformatics, machine learning, data science, haematopathology |
| Awarding institution: | University of York |
| Academic Units: | The University of York > Health Sciences (York) |
| Date Deposited: | 20 Apr 2026 10:14 |
| Last Modified: | 20 Apr 2026 10:14 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:38581 |
Download
Examined Thesis (PDF)
Embargoed until: 20 April 2027