Ghiandoni, Gian Marco (2019) Enhancing Reaction-based de novo Design using Machine Learning. PhD thesis, University of Sheffield.
Abstract
De novo design is a branch of chemoinformatics that is concerned with the rational design of molecular structures with desired properties, which specifically aims at achieving suitable pharmacological and safety profiles when applied to drug design. Scoring, construction, and search methods are the main components that are exploited by de novo design programs to explore the chemical space to encourage the cost-effective design of new chemical entities. In particular, construction methods are concerned with providing strategies for compound generation to address issues such as drug-likeness and synthetic accessibility.
Reaction-based de novo design consists of combining building blocks according to transformation rules that are extracted from collections of known reactions, intending to restrict the enumerated chemical space into a manageable number of synthetically accessible structures. The reaction vector is an example of a representation that encodes topological changes occurring in reactions, which has been integrated within a structure generation algorithm to increase the chances of generating molecules that are synthesisable.
The general aim of this study was to enhance reaction-based de novo design by developing machine learning approaches that exploit publicly available data on reactions. A series of algorithms for reaction standardisation, fingerprinting, and reaction vector database validation were introduced and applied to generate new data on which the entirety of this work relies. First, these collections were applied to the validation of a new ligand-based design tool. The tool was then used in a case study to design compounds which were eventually synthesised using very similar procedures to those suggested by the structure generator.
A reaction classification model and a novel hierarchical labelling system were then developed to introduce the possibility of applying transformations by class. The model was augmented with an algorithm for confidence estimation, and was used to classify two datasets from industry and the literature. Results from the classification suggest that the model can be used effectively to gain insights on the nature of reaction collections.
Classified reactions were further processed to build a reaction class recommendation model capable of suggesting appropriate reaction classes to apply to molecules according to their fingerprints. The model was validated, then integrated within the reaction vector-based design framework, which was assessed on its performance against the baseline algorithm. Results from the de novo design experiments indicate that the use of the recommendation model leads to a higher synthetic accessibility and a more efficient management of computational resources.
Metadata
Supervisors: | Gillet, Valerie and Chen, Beining and Bodkin, Michael |
---|---|
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.805396 |
Depositing User: | Mr Gian Marco Ghiandoni |
Date Deposited: | 07 May 2020 16:33 |
Last Modified: | 01 May 2021 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:26718 |
Download
Enhancing Reaction-based de novo Design using Machine Learning
Filename: thesis_complete.pdf
Description: Enhancing Reaction-based de novo Design using Machine Learning
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License
Export
Statistics
You do not need to contact us to get a copy of this thesis. Please use the 'Download' link(s) above to get a copy.
You can contact us about this thesis. If you need to make a general enquiry, please see the Contact us page.