Wang, Huanting
ORCID: https://orcid.org/0000-0003-0579-4295
(2025)
Towards Practical Machine Learning for Program Analysis and Optimisation.
PhD thesis, University of Leeds.
Abstract
Machine learning (ML) techniques, such as supervised learning and deep reinforcement learning (DRL), have shown great potential in code-related tasks, including predicting code optimisation options and detecting software bugs. However, their practical use still faces multiple hurdles during the model design and deployment phases. This thesis addresses some of these challenges through three core contributions that aim to make ML more practical and reliable for program analysis and optimisation.
The first contribution tackles a fundamental challenge in applying ML to code: how to represent programs. Our approach enables ML to combine static code information with dynamic symbolic execution traces to capture rich program semantics, while using a learning-based approach to reduce the overhead of symbolic execution. This improved representation enabled the development of an effective ML-based bug detection tool that uncovered 55 unique code vulnerabilities in 20 real-world projects, leading to the assignment of 37 new Common Vulnerabilities and Exposures (CVEs).
The second contribution aims to lower the barrier to integrating ML into compiler development. We introduce a framework with a simple Application Programming Interface (API) that helps developers construct DRL systems for compiler optimisation. The framework adopts a meta-learning strategy combining DRL with multi-task learning to search ML architectures. We show that the ML solutions automatically assembled by our framework outperform those developed manually by independent experts across four code optimisation tasks.
The third contribution addresses the reliability issues of using trained ML models during deployment in the end-user environment. Our approach leverages statistical assessments to identify when an ML model will likely make incorrect predictions, enabling fallback strategies to maintain its robustness. We integrate our approach with 13 representative ML models across five code analysis and optimisation tasks, showing that our techniques can correctly identify an average of 97% of mispredictions.
The work presented in this thesis has resulted in three open-source tools, which we hope will support wider adoption of ML in software engineering tasks like code optimisation and bug detection.
Metadata
| Supervisors: | Wang, Zheng and Xu, Jie |
|---|---|
| Keywords: | Machine Learning, Deep Learning, Reinforcement Learning, Program Representation, Program Analysis, Code Optimization, Software Security, Vulnerability Detection, Robustness, Compiler Optimization |
| Awarding institution: | University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds) |
| Date Deposited: | 13 Jan 2026 16:28 |
| Last Modified: | 13 Jan 2026 16:28 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:37619 |
Download
Final eThesis - complete (pdf)
Filename: PhD_Thesis___Huanting (15).pdf
Licence: This work is licensed under a Creative Commons Attribution NonCommercial ShareAlike 4.0 International License