Walter, Moritz (2022) Improving the Accuracy and Interpretability of Machine Learning Models for Toxicity Prediction. PhD thesis, University of Sheffield.
Abstract
Humans are exposed to a multitude of chemicals (e.g. pharmaceuticals and cosmetics), and the safety of these chemicals needs to be demonstrated. Quantitative structure-activity relationship (QSAR) models provide an alternative to undesirable animal studies for this purpose. However, in practice their use is often limited, either by insufficient model accuracy or by a lack of model interpretability.
This thesis addresses current limitations of QSAR models used for toxicity prediction. Firstly, it was investigated whether multi-task and imputation modelling yield more accurate models than standard single-task QSAR models. Secondly, attempts were made to improve the interpretability of neural networks used for QSAR modelling. In particular, a method was developed to extract information about the chemical features learned in the hidden layers of neural networks.
While no significant differences in performance were found between single-task models and traditional multi-task models (which use only chemical descriptors for test compounds), multi-task imputation models (which additionally use experimental data labels from related assays for test compounds) were found to clearly outperform single-task models on in vitro toxicity datasets. Imputation is therefore a promising tool for improving the performance of QSAR models for toxicity prediction.
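To make the imputation setting concrete, the sketch below contrasts a single-task classifier with one whose feature vector is augmented by the label of a related assay, available for both training and test compounds. It is a minimal sketch of the general idea only: the synthetic fingerprints, the random-forest learner, and names such as y_target and y_related are illustrative assumptions, not the models or datasets used in the thesis.

```python
# Minimal sketch: single-task model vs. an imputation-style model whose
# features include a related assay's labels for test compounds.
# All data here is synthetic placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, n_bits = 500, 128
X = rng.integers(0, 2, size=(n, n_bits)).astype(float)  # stand-in fingerprints
y_target = (X[:, :10].sum(axis=1) > 5).astype(int)       # target assay label
y_related = (X[:, 5:15].sum(axis=1) > 5).astype(int)     # correlated related assay

X_tr, X_te, yt_tr, yt_te, yr_tr, yr_te = train_test_split(
    X, y_target, y_related, random_state=0)

# Single-task baseline: chemical descriptors only.
single = RandomForestClassifier(random_state=0).fit(X_tr, yt_tr)
auc_single = roc_auc_score(yt_te, single.predict_proba(X_te)[:, 1])

# Imputation-style model: related-assay labels appended as extra features,
# known for both training and test compounds.
aug_tr = np.hstack([X_tr, yr_tr[:, None]])
aug_te = np.hstack([X_te, yr_te[:, None]])
imput = RandomForestClassifier(random_state=0).fit(aug_tr, yt_tr)
auc_imput = roc_auc_score(yt_te, imput.predict_proba(aug_te)[:, 1])

print(f"single-task AUC: {auc_single:.2f}  imputation AUC: {auc_imput:.2f}")
```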
The novel method developed to interpret neural network models, called IG_hidden, uses integrated gradients to identify the hidden neurons relevant to an individual prediction. Substructures found to be relevant for the activation of these neurons are then used to visualise which atoms of a compound are responsible for the model's prediction. IG_hidden was compared to an established method for interpreting neural networks (applying integrated gradients to the input features), using Lhasa's Derek alerts for mutagenicity as a ground truth. The overall performance of IG_hidden was found to be comparable to the published method in terms of the quality of the model explanations produced. However, the two approaches proved complementary, with each method performing better on certain subsets of the dataset.
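For readers unfamiliar with the underlying technique: integrated gradients attribute a prediction to features by accumulating gradients along a straight path from a baseline input to the actual input. The sketch below applies this idea at the hidden layer of a small PyTorch feed-forward network, in the spirit of IG_hidden; the architecture, the bottom/top layer split, the all-zero baseline, and all names are illustrative assumptions rather than the thesis implementation.

```python
# Hedged sketch: integrated gradients computed w.r.t. hidden-layer
# activations, yielding a relevance score per hidden neuron.
import torch
import torch.nn as nn

torch.manual_seed(0)

n_bits, n_hidden = 128, 32
bottom = nn.Sequential(nn.Linear(n_bits, n_hidden), nn.ReLU())  # input -> hidden
top = nn.Sequential(nn.Linear(n_hidden, 1), nn.Sigmoid())       # hidden -> output

def ig_hidden(x, baseline, steps=50):
    """Integrated gradients of the output w.r.t. hidden activations."""
    h = bottom(x)          # hidden activations for the compound
    h0 = bottom(baseline)  # hidden activations for the baseline
    grads = torch.zeros_like(h)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Interpolate between baseline and compound in hidden-activation space.
        h_interp = (h0 + alpha * (h - h0)).detach().requires_grad_(True)
        top(h_interp).sum().backward()
        grads += h_interp.grad
    # (a - a') * mean gradient along the path.
    return (h - h0).detach() * grads / steps

x = torch.rand(1, n_bits).round()   # stand-in binary fingerprint
baseline = torch.zeros(1, n_bits)   # all-zero baseline (an assumption)
scores = ig_hidden(x, baseline)
top_neurons = scores.squeeze().argsort(descending=True)[:5]
print("most relevant hidden neurons:", top_neurons.tolist())
```

The highest-scoring neurons would then be matched against substructures known to activate them, which is the step IG_hidden uses to map relevance back onto the atoms of a compound.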
Metadata
| Field | Value |
|---|---|
| Supervisors | Gillet, Val |
| Keywords | toxicity prediction, chemoinformatics, machine learning, QSAR modelling, deep learning, explainable AI |
| Awarding institution | University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Social Sciences (Sheffield); The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
| Depositing User | Mr Moritz Walter |
| Date Deposited | 30 Jan 2023 22:45 |
| Last Modified | 30 Jan 2024 01:07 |
| Open Archives Initiative ID (OAI ID) | oai:etheses.whiterose.ac.uk:32210 |
Download
Final eThesis - complete (pdf)
Filename: Walter_thesis_final_submission.pdf
Licence: This work is licensed under a Creative Commons Attribution 4.0 International License.