ALQULAITY, MALAK YAHYA M (2025) Advanced Machine Learning Approaches for Comprehensive Cardiovascular Disease Risk Prediction Using Synthetic Data and Dynamic Feature Selection. PhD thesis, University of Sheffield.
Abstract
Cardiovascular diseases (CVD) are a leading cause of global mortality, highlighting the need for accurate and reliable risk prediction models. Traditional CVD risk assessment tools, such as Framingham, SCORE, and QRISK, have several limitations that affect their accuracy and applicability. These tools typically focus on a narrow set of major risk factors and may overlook important non-traditional factors, yielding a less comprehensive risk assessment. They also often rely on linear models, which may fail to capture complex, non-linear interactions within the data. This thesis addresses these limitations by developing a hybrid predictive framework that integrates advanced machine learning (ML) techniques to improve Coronary Artery Calcium (CAC) score prediction and CVD risk assessment using both traditional and non-traditional risk factors. The research is structured around three key objectives: generating synthetic data, enhancing feature selection, and developing a hybrid modelling approach. First, to address data limitations, a Tabular Generative Adversarial Network (GAN) was enhanced to generate high-quality synthetic data, expanding the training dataset and improving model robustness. Second, feature selection was refined through an adaptive method based on SHAP (SHapley Additive exPlanations) values, which dynamically adjusts feature importance thresholds to capture both traditional and non-traditional CVD risk factors more accurately. Finally, a hybrid approach combining hyperparameter tuning algorithms (Genetic Algorithms, Particle Swarm Optimisation, and Bayesian Optimisation) with gradient boosting algorithms (XGBoost, LightGBM, and CatBoost) was implemented to maximise predictive accuracy. The resulting two-stage model first predicts CAC scores and then uses these predictions, alongside additional risk factors, to assess the likelihood of CVD events. The results show that the hybrid approach consistently improves prediction accuracy across multiple metrics, with CatBoost performing best in both CAC score prediction and CVD classification.
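The abstract does not state how the adaptive SHAP threshold is computed. As a minimal sketch only, assuming that per-feature importance is summarised as the mean absolute SHAP value and that the threshold adapts by keeping the smallest set of top-ranked features whose cumulative share of total importance reaches a target fraction, the selection step could look like this. The functions `mean_abs_shap` and `select_features_adaptive` and the `coverage` parameter are hypothetical illustrations, not the thesis's actual method.

```python
from typing import Dict, List, Sequence


def mean_abs_shap(shap_rows: Sequence[Sequence[float]],
                  names: Sequence[str]) -> Dict[str, float]:
    """Summarise a samples-by-features SHAP matrix as mean |SHAP| per feature."""
    n = len(shap_rows)
    if n == 0:
        raise ValueError("need at least one row of SHAP values")
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(names)
    }


def select_features_adaptive(importance: Dict[str, float],
                             coverage: float = 0.9) -> List[str]:
    """Hypothetical adaptive rule: rank features by importance and keep them
    until their cumulative share of total importance reaches `coverage`.
    The effective cut-off value therefore depends on the shape of the
    importance distribution rather than being a fixed number."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(importance.values())
    if total == 0:
        return []  # no feature carries any importance
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, value in ranked:
        selected.append(name)
        cumulative += value / total
        if cumulative >= coverage:
            break
    return selected
```

With a peaked importance distribution this rule keeps only a few features, while a flatter distribution (for example, when several non-traditional factors contribute modestly) keeps more, which is one way a threshold can "adjust dynamically". In practice the SHAP matrix would come from an explainer fitted to a trained model; here it is passed in as plain lists so the sketch stays self-contained.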
Metadata
| Field | Value |
|---|---|
| Supervisors | Yang, Po |
| Awarding institution | University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Engineering (Sheffield) |
| Date Deposited | 15 Dec 2025 09:50 |
| Last Modified | 15 Dec 2025 09:50 |
| Open Archives Initiative ID (OAI ID) | oai:etheses.whiterose.ac.uk:34730 |
Download
Final eThesis - complete (pdf)
Embargoed until: 15 December 2026
Filename: Advanced_Machine_Learning_Approaches_for_Cardiovascular_Risk_Prediction.pdf
While the thesis is under embargo, a copy can be requested through the repository; requests are sent directly to someone who may authorise access.