Data Mining and Machine Learning to Predict Acute Coronary Syndrome Mortality

Jaafar, Juliana (2017) Data Mining and Machine Learning to Predict Acute Coronary Syndrome Mortality. PhD thesis, University of Leeds.

Jaafar_J_Leeds_Institute_of_Health_Sciences_PHD_2017.pdf - Final eThesis - complete (pdf)
Restricted until 1 May 2021.

Request a copy


This thesis has investigated and demonstrated the potential for developing prediction models using Machine Learning (ML) algorithms on registry datasets. Many current Acute Coronary Syndrome (ACS) prediction models were developed using traditional statistical methods. In the era of big data, ML offers a spectrum of algorithms that can aid in generating prediction models for ACS. This study has explored 29 algorithms with which to build ACS prediction models for Asian (Malaysia) and Western (Leeds, UK) registries, covering patients with all types of ACS and those receiving the new standard ACS treatments. Internal and external validation of the models showed satisfactory calibration, indicating that ML algorithms can produce models competitive with those built using traditional statistical methods. To achieve simpler yet competitive predictive performance, a comprehensive set of ML feature selection methods has been evaluated, and Correlation-Based Feature Selection (CFS) emerged as the best method. This thesis has also evaluated whether the predictors used in existing ACS models can be adapted to other registries' data. Despite differences in region and population characteristics, most of the existing predictors remain consistently associated with the outcome. Thus, the findings suggest that, with some adjustments customized to the registry, the existing predictors can be adopted to develop a simple model and expedite the model development process. Furthermore, the predictive strength of each clinical category of predictors has also been evaluated. The results suggest that, to construct a satisfactory ACS model, a combination of predictors from various clinical events is essential. At the very least, a satisfactory model requires a combination of the demographic, medical history, and clinical presentation categories. However, predictors from the medication history category were found to add nothing to the model's predictive performance.
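
In Hall's original formulation, CFS scores a subset S of k features by the merit k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-outcome correlation and r_ff is the mean feature-feature correlation, so subsets whose features are predictive but not redundant score highest. The abstract does not state which CFS configuration the thesis used, so the following is only a minimal sketch under stated assumptions: numeric features, absolute Pearson correlation standing in for the symmetrical uncertainty of the original formulation, and a simple greedy forward search rather than best-first search. The function names and toy data are hypothetical and not taken from the thesis.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, features, target):
    """CFS merit of a feature subset (assumption: |Pearson r| as the correlation measure)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each selected feature and the outcome.
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    # Mean absolute correlation between pairs of selected features (redundancy).
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(features, target):
    """Greedy forward search (a simplification): add the feature that most raises merit."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        scores = {f: cfs_merit(selected + [f], features, target) for f in remaining}
        candidate = max(scores, key=scores.get)
        # Stop as soon as no remaining feature improves the merit.
        if scores[candidate] <= best:
            break
        selected.append(candidate)
        best = scores[candidate]
        remaining.remove(candidate)
    return selected, best


if __name__ == "__main__":
    # Hypothetical toy data: a binary outcome, one predictive feature,
    # a near-duplicate of it, and a feature unrelated to the outcome.
    target = [0, 0, 0, 1, 1, 1]
    features = {
        "age": [40, 45, 50, 70, 75, 80],
        "age_copy": [41, 44, 51, 69, 76, 79],
        "noise": [1, 3, 2, 2, 3, 1],
    }
    print(cfs_forward(features, target))
```

On this toy data only "age" is selected: "age_copy" is almost perfectly correlated with it, so adding it lowers the merit, and "noise" has zero correlation with the outcome. This redundancy penalty is what lets CFS produce the smaller, simpler models the abstract refers to.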
Next, this study has investigated classifier degradation in ML model development. The findings suggest that overlapping instances in the minority class of imbalanced datasets, and missing values, are the main causes of classifier degradation. Two new methods have been introduced: the overlapped-undersampling method to handle imbalanced datasets, and the mean-clustering-imputation method to handle missing values. The overlapped-undersampling method failed to improve model performance on the datasets. Nevertheless, the results suggest that more training samples are sufficient to produce satisfactory models on imbalanced datasets. The mean-clustering-imputation method produced better models than both simple imputation and the imputation method embedded in an algorithm. However, removing instances with missing data resulted in superior models.
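
The abstract contrasts imputation with simply removing incomplete instances. The mean-clustering-imputation method itself is not described here, so the sketch below illustrates only the two standard baselines it is compared against, under the assumption that "simple imputation" means column-mean imputation: replacing each missing value with its column mean, and complete-case (listwise) deletion. Missing values are represented as None; the function names and toy rows are hypothetical.

```python
from statistics import mean


def mean_impute(rows):
    """Replace each missing value (None) with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # A column with no observed values is left missing.
        col_means.append(mean(observed) if observed else None)
    # Build new rows so the caller's data is not modified.
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def complete_cases(rows, labels):
    """Listwise deletion: keep only rows (and their labels) with no missing values."""
    kept = [(r, y) for r, y in zip(rows, labels) if all(v is not None for v in r)]
    return [r for r, _ in kept], [y for _, y in kept]


if __name__ == "__main__":
    # Hypothetical toy rows: [age, some_lab_value]; None marks a missing value.
    rows = [[60, 1.2], [None, 0.9], [70, None], [80, 1.5]]
    labels = [0, 1, 0, 1]
    print(mean_impute(rows))
    print(complete_cases(rows, labels))
```

Mean imputation keeps all four instances but shrinks the variance of the imputed columns, whereas complete-case deletion keeps only the two fully observed instances. Which trade-off yields the better classifier is an empirical question; in the experiments summarised above, deletion produced the superior models.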

Item Type: Thesis (PhD)
Keywords: Machine Learning, data mining, Acute Coronary Syndrome, prediction model
Academic Units: The University of Leeds > Faculty of Medicine and Health (Leeds) > Institute of Health Sciences (Leeds)
The University of Leeds > Faculty of Medicine and Health (Leeds) > Institute of Health Sciences (Leeds) > Yorkshire Centre for Health Informatics (Leeds)
Depositing User: Mrs Juliana Jaafar
Date Deposited: 01 May 2018 10:36
Last Modified: 01 May 2018 10:36
URI: http://etheses.whiterose.ac.uk/id/eprint/20140

Please use the 'Request a copy' link(s) above to request this thesis. This will be sent directly to someone who may authorise access.