Janowiak, Jakub Piotr (2020) Solid State Informatics studies to address challenges in pharmaceutics development. Integrated PhD and Master thesis, University of Leeds.
Abstract
Cheminformatics methods such as Matched Molecular Pair Analysis (MMPA) and Quantitative Structure-Property Relationship (QSPR) models based on molecular structure have been widely used to address challenges faced during the Discovery stage of pharmaceutical product development. This thesis builds upon these concepts by incorporating solid-state considerations to address challenges associated with the Development stage.
The thesis focused on two properties: the polymorph propensity of molecules and the solid-state-specific melting point (as a surrogate for solubility). MMPA was used for the polymorph propensity study; however, no statistically significant molecular transformations were identified, owing to the small number of MMPs found and the limited size and quality of the polymorphism data.
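The pairing step at the heart of MMPA can be illustrated with a minimal sketch. It assumes each molecule has already been fragmented by single bond cuts into hypothetical (core, substituent) string pairs; two molecules form a matched pair when they share a core but carry different substituents. This is a simplification (no multi-cut fragmentation, no size limits on the variable part) and not the thesis's actual pipeline; the input format and toy fragments are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def find_mmps(fragmentations):
    """Find matched molecular pairs from precomputed single-cut fragmentations.

    fragmentations: dict of molecule id -> list of (core, substituent) strings.
    This input format is a hypothetical assumption for illustration.
    Returns a set of (mol_a, mol_b, sub_a, sub_b) tuples with mol_a < mol_b.
    """
    # Index every fragmentation by its shared (constant) core.
    by_core = defaultdict(list)
    for mol_id, frags in fragmentations.items():
        for core, sub in frags:
            by_core[core].append((mol_id, sub))

    pairs = set()
    for members in by_core.values():
        for (m1, s1), (m2, s2) in combinations(members, 2):
            # Skip self-pairs and identical substituents (no transformation).
            if m1 == m2 or s1 == s2:
                continue
            # Order the pair consistently so each MMP is stored once.
            if m1 > m2:
                m1, m2, s1, s2 = m2, m1, s2, s1
            pairs.add((m1, m2, s1, s2))
    return pairs


# Toy fragments (hypothetical): A and B share a core and differ by Cl -> F.
example = {
    "A": [("c1ccccc1[*]", "[*]Cl")],
    "B": [("c1ccccc1[*]", "[*]F")],
    "C": [("C1CCCCC1[*]", "[*]O")],
}
print(find_mmps(example))  # {('A', 'B', '[*]Cl', '[*]F')}
```

Molecule C has no partner sharing its core, so it contributes no MMP; with few such shared cores in a dataset, each transformation is observed too rarely for a statistical test.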
The small number of MMPs was analysed further by constructing a Matched Molecular Graph. The graph approach allowed the properties of datasets from different stages of the pharmaceutical development process to be compared. Datasets taken from the Development stage contain fewer molecules with at least one MMP (25.1 %) and a lower total number of MMPs (2,776) than Discovery datasets of the same size (58.2 % and 10,321, respectively), making MMPA less suitable for Development-stage data.
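The two dataset statistics quoted above can be read directly off a matched molecular graph in which molecules are nodes and MMPs are edges. The sketch below assumes MMPs are supplied as a hypothetical list of molecule-id pairs and counts unique connected molecule pairs as edges; the thesis may count MMPs differently (for example, one per transformation), so treat this as an illustration of the graph idea rather than a reproduction of its numbers.

```python
def mmg_summary(molecules, mmp_edges):
    """Summarise a matched molecular graph (nodes = molecules, edges = MMPs).

    Returns (percentage of molecules with at least one MMP,
             number of unique MMP edges).
    """
    # Undirected adjacency sets; duplicate or reversed pairs collapse.
    neighbours = {m: set() for m in molecules}
    for a, b in mmp_edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Each undirected edge appears in two adjacency sets.
    n_edges = sum(len(v) for v in neighbours.values()) // 2
    # A molecule "has an MMP" if its node has degree >= 1.
    n_connected = sum(1 for v in neighbours.values() if v)
    pct_connected = 100.0 * n_connected / len(molecules)
    return pct_connected, n_edges


# Hypothetical toy graph: D is isolated, (B, A) duplicates (A, B).
molecules = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("B", "A")]
print(mmg_summary(molecules, edges))  # (75.0, 2)
```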
A benchmarking dataset for classifying crystal structures (into polymorphs and redeterminations) was curated, and the newly developed machine-learning-based classification method (F1 = 0.910) was compared with existing methods (F1 = 0.780).
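For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for the two-class polymorph/redetermination task, assuming "polymorph" is treated as the positive class (the abstract does not state which class was positive); the labels in the example are hypothetical.

```python
def f1_score(y_true, y_pred, positive="polymorph"):
    """Binary F1 score: harmonic mean of precision and recall.

    Treating "polymorph" as the positive class is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: one true positive, one false positive, one false negative.
y_true = ["polymorph", "polymorph", "redetermination", "redetermination"]
y_pred = ["polymorph", "redetermination", "polymorph", "redetermination"]
print(f1_score(y_true, y_pred))  # 0.5
```

Equivalently, F1 = 2TP / (2TP + FP + FN), which here gives 2 / 4 = 0.5.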
A Message Passing Neural Network (MPNN) was used to develop a QSPR model using molecular and crystal information. The best model that used only molecular information achieved an R² of 0.628 on the validation set, while the model trained with crystal information obtained 0.649. This improvement over the molecular-information-only model was limited, likely because of the limited polymorphic data and the typically small effect that crystal packing differences have on melting point. The best model achieved a test set R² of 0.550.
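The R² values above can be interpreted with the standard coefficient of determination, R² = 1 − SS_res / SS_tot, which measures the fraction of variance in the measured values explained by the predictions. The sketch assumes this standard definition (the abstract does not specify it), and the melting points are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs predicted melting points (°C).
measured = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 200.0]
print(r_squared(measured, predicted))
```

Here SS_res = 200 and SS_tot = 5000, so R² = 1 − 0.04 = 0.96; a model that only predicted the mean would score 0.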
This thesis provides partial solutions to the challenges of solid form informatics and forms a starting point for further research in the area.
Metadata
| Supervisors: | Martin, Elaine and Roberts, Kevin |
|---|---|
| Keywords: | machine learning, data science, cheminformatics, solid state |
| Awarding institution: | University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering (Leeds); The University of Leeds > Faculty of Engineering (Leeds) > School of Chemical and Process Engineering (Leeds); The University of Leeds > Faculty of Engineering (Leeds) > School of Chemical and Process Engineering (Leeds) > Institute of Particle Science and Engineering (Leeds) |
| Date Deposited: | 16 Sep 2021 13:19 |
| Last Modified: | 01 Feb 2026 01:05 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:29233 |
Download
Final eThesis - complete (pdf)
Filename: thesis-corrections.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.