Janowiak, Jakub Piotr (2020) Solid State Informatics studies to address challenges in pharmaceutics development. Integrated PhD and Master thesis, University of Leeds.
Abstract
Cheminformatics methods such as Matched Molecular Pair Analysis (MMPA) and Quantitative Structure-Property Relationship (QSPR) models based on molecular structure have been widely used to address challenges faced during the Discovery stage of pharmaceutical product development. This thesis builds upon these concepts by incorporating solid-state considerations to address challenges associated with the Development stage.
The thesis focuses on two properties: the polymorph propensity of molecules and the solid-state-specific melting point (used as a surrogate for solubility). MMPA was used for the propensity study. However, no statistically significant molecular transformations were identified, owing to the small number of matched molecular pairs (MMPs) found and the limited size and quality of the polymorphism data.
The small number of MMPs was analysed further by constructing a Matched Molecular Graph. The graph approach allowed the properties of datasets from different stages of the pharmaceutical development process to be compared. Datasets taken from the Development stage contain a smaller proportion of molecules with at least one MMP (25.1 %) and a lower total number of MMPs (2,776) than Discovery datasets of the same size (58.2 % and 10,321), making MMPA less suitable for Development-stage data.
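The two dataset statistics quoted above can be read directly off a graph whose nodes are molecules and whose edges are matched molecular pairs. The following is a minimal sketch under that assumption; the function name `mmp_graph_stats`, the molecule identifiers, and the toy pairs are hypothetical and are not taken from the thesis.

```python
def mmp_graph_stats(molecules, pairs):
    """Summarise a matched molecular graph.

    molecules: iterable of molecule identifiers (graph nodes).
    pairs: iterable of (id_a, id_b) tuples, one per matched molecular pair (graph edges).
    """
    nodes = set(molecules)
    # Treat pairs as undirected and drop duplicates and self-pairs.
    edges = {frozenset(p) for p in pairs if p[0] != p[1]}
    # Molecules that appear in at least one edge.
    connected = set().union(*edges) & nodes if edges else set()
    pct = 100.0 * len(connected) / len(nodes) if nodes else 0.0
    return {"pct_with_mmp": pct, "n_mmps": len(edges)}


# Toy example: D has no matched pair; (B, A) duplicates (A, B).
stats = mmp_graph_stats(
    ["A", "B", "C", "D"],
    [("A", "B"), ("B", "A"), ("B", "C")],
)
print(stats)
```

In this toy case three of the four molecules take part in at least one pair and two distinct pairs exist, which corresponds to the percentage and total-count figures reported for the real datasets.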
A benchmarking dataset for classifying crystal structures into polymorphs and redeterminations was curated, and the machine-learning-based method developed in this work (F1 = 0.910) was compared with existing classification methods (F1 = 0.780).
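For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a two-class problem; the label strings, the choice of "polymorph" as the positive class, and the toy data are assumptions for illustration, and the abstract does not state whether the reported values are per-class or averaged.

```python
def f1_score(y_true, y_pred, positive="polymorph"):
    """Binary F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five pairs of crystal structures.
y_true = ["polymorph", "polymorph", "redetermination", "redetermination", "polymorph"]
y_pred = ["polymorph", "redetermination", "redetermination", "polymorph", "polymorph"]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```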
A Message Passing Neural Network was used to develop a QSPR model using molecular and crystal information. The best model that used only molecular information achieved an R² of 0.628 on the validation set, while the model trained with the crystal information obtained 0.649. The improvement over the molecular-only model was limited, likely due to the limited polymorphic data and the typically small effect of crystal packing differences. The best model achieved a test-set R² of 0.550.
This thesis provides partial solutions to the challenges of solid form informatics and forms a starting point for further research in the area.
Metadata
Supervisors: | Martin, Elaine and Roberts, Kevin |
---|---|
Keywords: | machine learning, data science, cheminformatics, solid state |
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering (Leeds); The University of Leeds > Faculty of Engineering (Leeds) > School of Chemical and Process Engineering (Leeds); The University of Leeds > Faculty of Engineering (Leeds) > School of Chemical and Process Engineering (Leeds) > Institute of Particle Science and Engineering (Leeds) |
Depositing User: | Mr Jakub Janowiak |
Date Deposited: | 16 Sep 2021 13:19 |
Last Modified: | 16 Sep 2021 13:19 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:29233 |
Download
Final eThesis - complete (pdf)
Embargoed until: 1 February 2026
Filename: thesis-corrections.pdf