Jaafar, Juliana (2017) Data Mining and Machine Learning to Predict Acute Coronary Syndrome Mortality. PhD thesis, University of Leeds.
Abstract
This thesis has investigated and demonstrated the potential for developing prediction models using Machine Learning (ML) algorithms on registry datasets. Many current Acute Coronary Syndrome (ACS) prediction models were developed using traditional statistical methods. In the era of big data, ML offers a spectrum of algorithms that can aid in generating prediction models for ACS. This study explored 29 algorithms with which to build ACS prediction models for Asian (Malaysia) and Western (Leeds, UK) registries, covering patients with all types of ACS and those receiving the new standard ACS treatments. Internal and external validation of the models showed satisfactory calibration measures, indicating that ML algorithms can produce models competitive with those built using traditional statistical methods.
To achieve simpler, yet competitive, predictive performance, comprehensive ML feature selection methods were evaluated, and Correlation-Based Feature Selection (CFS) emerged as the best method. This thesis also evaluated the potential for predictors of existing ACS models to be adapted to other registries' data. Despite differences in region and population characteristics, most of the existing predictors remain consistently associated with the outcome. Thus, the findings suggest that, with some adjustments customized to the registry, the existing predictors can be adopted to develop a simple model and expedite the model development process. Furthermore, the strength of the predictors in each clinical category was also evaluated. The results suggest that a combination of predictors from various clinical categories is essential to construct a satisfactory ACS model. At the very least, a satisfactory model requires a combination of the demographic, medical history, and clinical presentation categories. However, predictors from the medication history category were found to add no value to the prediction model.
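CFS, as commonly formulated following Hall, scores a candidate feature subset of size k by its merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature–outcome correlation and r_ff the mean feature–feature correlation: relevant features raise the score and redundant ones lower it. The sketch below illustrates that heuristic only and is not the configuration used in the thesis. It assumes numeric features and uses absolute Pearson correlation (Hall's original uses symmetrical uncertainty for discrete attributes) and a greedy forward search in place of best-first search; the function names and toy data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(features, target, subset):
    """CFS merit: k * r_cf / sqrt(k + k*(k-1) * r_ff), using |Pearson r|."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = mean(abs(pearson(features[f], target)) for f in subset)
    if k == 1:
        r_ff = 0.0
    else:
        r_ff = mean(abs(pearson(features[a], features[b]))
                    for a, b in combinations(subset, 2))
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(features, target):
    """Greedy forward search: add the feature that most raises the merit,
    stopping as soon as no remaining feature improves it."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        cand = max(sorted(remaining),
                   key=lambda f: cfs_merit(features, target, selected + [f]))
        score = cfs_merit(features, target, selected + [cand])
        if score <= best:
            break
        selected.append(cand)
        best = score
        remaining.remove(cand)
    return selected, best


# Hypothetical toy data: binary outcome, one informative feature,
# an exact (redundant) copy of it, and a weakly related feature.
target = [0, 0, 0, 1, 1, 1]
features = {
    "age": [40, 45, 50, 70, 75, 80],
    "age_copy": [40, 45, 50, 70, 75, 80],
    "noise": [1, 2, 1, 2, 1, 2],
}
selected, merit = cfs_forward_select(features, target)
print(selected)  # ['age']
```

The duplicate column is rejected because adding it raises r_ff to 1 and leaves the merit unchanged. This is the mechanism by which CFS yields the smaller, simpler predictor sets the paragraph above describes.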
Next, this study investigated classifier degradation in ML model development. The findings suggest that overlapping instances in the minority class of an imbalanced dataset and missing values are the main causes of classifier degradation. Two new methods were introduced: the overlapped-undersampling method to handle imbalanced datasets, and the mean-clustering-imputation method to handle missing values. The overlapped-undersampling method failed to boost model performance on the datasets. Nevertheless, the results suggest that more training samples are sufficient to produce satisfactory models from imbalanced datasets. The mean-clustering-imputation method produced better models than both the simple imputation method and the imputation method embedded in an algorithm. However, removing instances with missing data resulted in superior models.
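The abstract does not define how mean-clustering-imputation works. One plausible reading, assumed here purely for illustration, is: cluster the complete records, assign each incomplete record to its nearest cluster using only its observed fields, and fill each missing field with that cluster's mean, rather than with a single global column mean as in simple imputation. The sketch implements that assumption with a plain k-means (deterministically initialised from the first k complete rows, and requiring at least k complete rows). The function names, the choice of k, and the toy data are hypothetical, and the thesis's actual procedure may differ.

```python
import math
from statistics import mean


def kmeans(rows, k, iters=20):
    """Plain k-means on complete numeric rows (assumes len(rows) >= k).
    Centroids are initialised from the first k rows so results are repeatable."""
    centroids = [list(r) for r in rows[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            j = min(range(k), key=lambda c: math.dist(r, centroids[c]))
            groups[j].append(r)
        # Recompute each centroid as its column means; keep the old one if empty.
        centroids = [[mean(col) for col in zip(*g)] if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids


def cluster_mean_impute(data, k=2):
    """Hypothetical cluster-mean imputation: fill each None with the matching
    column mean of the nearest k-means cluster of the complete records."""
    complete = [r for r in data if None not in r]
    centroids = kmeans(complete, k)
    out = []
    for r in data:
        if None not in r:
            out.append(list(r))
            continue
        obs = [i for i, v in enumerate(r) if v is not None]
        # Distance to each centroid measured on the observed columns only.
        nearest = min(centroids, key=lambda cen: math.dist(
            [r[i] for i in obs], [cen[i] for i in obs]))
        out.append([nearest[i] if v is None else v for i, v in enumerate(r)])
    return out


# Hypothetical toy data: two well-separated groups, two incomplete records.
data = [
    [1.0, 2.0],
    [1.2, 2.2],
    [9.0, 10.0],
    [9.2, 10.2],
    [1.1, None],
    [None, 10.1],
]
imputed = cluster_mean_impute(data, k=2)
```

Here the first incomplete record is filled from the low-valued cluster (about 2.1) and the second from the high-valued cluster (about 9.1), whereas a global column mean would place both gaps between the two groups.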
Metadata
Supervisors: | Atwell, Eric and Wyatt, Jeremy C and Clamp, Susan |
---|---|
Keywords: | Machine Learning, data mining, Acute Coronary Syndrome, prediction model |
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Medicine and Health (Leeds) > School of Medicine (Leeds) > Leeds Institute of Health Sciences; The University of Leeds > Faculty of Medicine and Health (Leeds) > School of Medicine (Leeds) > Leeds Institute of Health Sciences > Yorkshire Centre for Health Informatics (Leeds) |
Identification Number/EthosID: | uk.bl.ethos.739801 |
Depositing User: | Mrs Juliana Jaafar |
Date Deposited: | 01 May 2018 10:36 |
Last Modified: | 11 Jun 2021 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:20140 |
Download
Final eThesis - complete (pdf)
Filename: Jaafar_J_Leeds_Institute_of_Health_Sciences_PHD_2017.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License