Towards Practical Machine Learning for Program Analysis and Optimisation

Abstract

Machine learning (ML) techniques, including supervised learning and deep reinforcement learning (DRL), have shown great potential in code-related tasks such as predicting code optimisation options and detecting software bugs. However, their practical use still faces multiple hurdles during model design and deployment. This thesis addresses some of these challenges through three core contributions that aim to make ML more practical and reliable for program analysis and optimisation.

The first contribution tackles a fundamental challenge in applying ML to code: how to represent programs. Our approach combines static code information with dynamic symbolic execution traces to capture rich program semantics, while using a learning-based approach to reduce the overhead of symbolic execution. This improved representation enabled the development of an effective ML-based bug detection tool that uncovered 55 unique code vulnerabilities in 20 real-world projects, leading to the assignment of 37 new Common Vulnerabilities and Exposures (CVE) identifiers.

The second contribution aims to lower the barrier to integrating ML into compiler development. We introduce a framework with a simple Application Programming Interface (API) that helps developers construct DRL systems for compiler optimisation. The framework adopts a meta-learning strategy combining DRL with multi-task learning to search ML architectures. We show that the ML solutions automatically assembled by our framework outperform those developed manually by independent experts across four code optimisation tasks.

The third contribution addresses the reliability of trained ML models once they are deployed in the end-user environment. Our approach leverages statistical assessments to identify when an ML model is likely to make an incorrect prediction, so that fallback strategies can be applied to maintain robustness. We integrate our approach with 13 representative ML models across five code analysis and optimisation tasks, showing that our techniques can correctly identify an average of 97% of mispredictions.
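
As an illustration only: the abstract does not name the statistical assessment used, so the Python sketch below assumes one common option, a conformal-prediction-style p-value computed against held-out calibration scores. It assumes a classifier that emits class probabilities. The function names, the nonconformity score, and the threshold epsilon are hypothetical choices made for this example, not the implementation described in the thesis.

```python
# Hypothetical sketch: flag likely mispredictions with a conformal-style
# p-value and fall back to a default decision when credibility is low.
# All names and the scoring rule are assumptions made for illustration.

def nonconformity(probs, label):
    """Score that grows as the model assigns less probability to `label`."""
    return 1.0 - probs[label]


def p_value(calibration_scores, score):
    """Smoothed fraction of calibration scores at least as large as `score`."""
    at_least = sum(1 for s in calibration_scores if s >= score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def predict_with_fallback(probs, calibration_scores, fallback, epsilon=0.2):
    """Return (decision, accepted).

    The model's top label is accepted when its p-value is at least
    `epsilon`; otherwise the caller-supplied `fallback` is returned.
    """
    label = max(range(len(probs)), key=probs.__getitem__)
    p = p_value(calibration_scores, nonconformity(probs, label))
    if p < epsilon:
        return fallback, False
    return label, True


# Nonconformity scores of the true labels on a held-out calibration set
# (made-up numbers for the example).
calibration = [0.05, 0.1, 0.2, 0.3, 0.1, 0.15, 0.25, 0.05, 0.1]

confident = predict_with_fallback([0.9, 0.05, 0.05], calibration, fallback=-1)
uncertain = predict_with_fallback([0.4, 0.35, 0.25], calibration, fallback=-1)
```

In a compiler setting, the fallback could be, for example, the default optimisation heuristic; that choice is likewise an assumption of this sketch.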

The work presented in this thesis has resulted in three open-source tools, which we hope will support wider adoption of ML in software engineering tasks like code optimisation and bug detection.

Metadata

Supervisors:	Wang, Zheng and Xu, Jie
Related URLs:	Enhancing Deployment-Time Predictive Model Robustness for Code Analysis and Optimization (Related publication); Combining Structured Static Code Information and Dynamic Symbolic Traces for Software Vulnerability Prediction (Related publication); Automating Reinforcement Learning Architecture Design for Code Optimization (Related publication)
Keywords:	Machine Learning, Deep Learning, Reinforcement Learning, Program Representation, Program Analysis, Code Optimization, Software Security, Vulnerability Detection, Robustness, Compiler Optimization
Awarding institution:	University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Date Deposited:	13 Jan 2026 16:28
Last Modified:	13 Jan 2026 16:28
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:37619

