Boobier, Samuel ORCID: https://orcid.org/0000-0002-3166-2782 (2021) Solubility prediction in water and organic solvents through a combination of chemometrics and computational chemistry. PhD thesis, University of Leeds.
Abstract
Accurate solubility prediction is crucial across a range of scientific disciplines including drug
discovery, protein engineering, drug and agrochemical process design, biochemistry, route
prediction, crystallisation, and extraction. We herein report a successful approach to predicting
solubility, not only in water but also in organic solvents (ethanol, benzene, and acetone), using a
combination of machine learning and computational chemistry. Our new approach, named Causal
Structure Property Relationship (CSPR), allowed examination of the physical chemistry behind
dissolution to choose a small number of chemically relevant descriptors to produce highly
interpretable models. These models gave significantly more accurate predictions than leading
open-source and commercial solubility prediction tools, achieving accuracy (60-80%) close to
the expected level of noise in the training data (LogS ± 0.7). By reproducing the physicochemical
relationship between solubility and molecular properties in different solvents, rational
improvements to the models were explored. Subsequent improvements to the models included
modifying the solvation energy and combining machine learning methods to provide a consensus
prediction. A larger dataset in water provided the basis for the discussion of pKa and speciation
in water. We conclude that gathering accurate solubility data across a range of solvents is crucial
to expanding this work and promoting sustainable chemistry in the future. It is our hope that this
methodology will be applied to other problems in chemistry and that our open-access datasets
(the first of their kind for benzene and acetone) will stimulate further research in this field.
Metadata
Supervisors: | Nguyen, Bao and Blacker, John and Hose, David |
---|---|
Keywords: | machine learning, solubility, drug discovery, drug development, computational chemistry, DFT |
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Chemistry (Leeds) |
Depositing User: | Mr Samuel Boobier |
Date Deposited: | 13 Sep 2021 13:23 |
Last Modified: | 13 Sep 2021 13:23 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:29396 |
Download
Final eThesis - complete (pdf)
Embargoed until: 1 September 2026
Filename: Solubility_Prediction_Thesis_SB.pdf