
Similarity in the context of the Orphan Drug Legislation

Pereira Franco, Joao Pedro (2015) Similarity in the context of the Orphan Drug Legislation. PhD thesis, University of Sheffield.

Full text: Final clean - PhD Thesis - September 2015.xps (15 MB)
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.


An orphan drug is a medicinal product intended for the treatment of a rare disease that affects only a small number of patients (e.g., no more than five in ten thousand people). Under the current European orphan drug legislation, once an orphan medicinal product has been authorised, the European Union shall not, for a period of ten years, accept another application for a similar medicinal product for the same therapeutic indication. To date, the European Medicines Agency has relied on human judgments of similarity when assessing new medicines for rare diseases. The project reported here seeks to develop quantitative methods for this purpose.

The project began with an analysis of the correlation between human and computed judgments of similarity for 100 pairs of molecules chosen from the DrugBank 3.0 database. Human similarity assessments for these pairs were obtained from a total of 143 experts from Europe, Asia and the US, who were asked to state whether or not each pair was similar. The percentage of experts judging a pair to be similar was then compared with the Tanimoto coefficient computed using a range of descriptor types (1D, 2D and 3D), with the aim of identifying the descriptors that correlated most closely with the human judgments. The following 2D fingerprints were studied: ECFP4, ECFC4, Daylight, Unity, BCI and MDL, as implemented in the Pipeline Pilot system; and CDK Extended, CDK Standard, Estate, PubChem, MACCS, Morgan, Feat Morgan, Atom Pair, Torsion, RDKit, Avalon, Layers, FP:TGD and FP:TGT, as implemented in the KNIME system. The 3D fingerprints studied were FP:TAD and FP:TAT, also as implemented in KNIME. 1D molecular property descriptors were also studied, but proved to be of only limited effectiveness for this application.
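As a minimal sketch of the similarity measure referred to above: for binary fingerprints, the Tanimoto coefficient is the number of bits set in both molecules divided by the number of bits set in either, T = c / (a + b - c). The Python below assumes each fingerprint is given as a set of on-bit indices; the toy bit sets are hypothetical and are not taken from DrugBank or from any of the fingerprint implementations named above. The helper `percent_similar` (also hypothetical) computes the expert-side quantity the Tanimoto values were compared with.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed in this sketch for two empty fingerprints
    common = len(fp_a & fp_b)  # bits set in both molecules
    return common / (len(fp_a) + len(fp_b) - common)  # divided by bits set in either


def percent_similar(votes: list[bool]) -> float:
    """Percentage of experts who judged a pair to be similar (True = 'similar')."""
    return 100.0 * sum(votes) / len(votes)


# Hypothetical toy fingerprints (on-bit positions), not real molecular data
a = {1, 4, 7, 9, 12}
b = {1, 4, 9, 15}
print(tanimoto(a, b))  # 3 / (5 + 4 - 3) = 0.5
print(percent_similar([True, True, False, True]))  # 75.0
```

Real fingerprint packages store bits as fixed-length bit vectors rather than Python sets, but the ratio computed is the same.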
Logistic regression models were developed for each type of descriptor, relating the Tanimoto similarity computed for a pair of molecules to the probability that the human experts would regard that pair as similar. The resulting models were then validated using a separate test set of 100 pairs of molecules that had previously been evaluated by the European Medicines Agency in the context of the authorisation of medicines for rare diseases. The best models reproduced over 95% of the human judgments. This success rate increased to 98-99% with a simple data fusion approach, in which a pair of molecules is classified as similar (or non-similar) when three or more of the individual fingerprints agree. These results suggest that Tanimoto values computed from 2D descriptors could provide a useful source of information when assessing new active substances proposed for the treatment of rare diseases.
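The sketch below illustrates, under stated assumptions, how a per-fingerprint logistic model and a "three or more agree" fusion rule might fit together. The model is P(similar | T) = 1 / (1 + exp(-(b0 + b1 T))). The coefficients `b0` and `b1`, the 0.5 decision threshold, the use of five fingerprint models, and all Tanimoto values are hypothetical placeholders, not the fitted models or data from the thesis. With an odd number of models (five), "three or more agree" reduces to a simple majority vote, so every pair receives exactly one label.

```python
import math


def p_similar(tanimoto: float, b0: float, b1: float) -> float:
    """Logistic model: probability that experts judge a pair similar, given its Tanimoto value."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * tanimoto)))


def classify(tanimoto: float, b0: float, b1: float, threshold: float = 0.5) -> bool:
    """Label a pair 'similar' when the modelled probability reaches the (assumed) threshold."""
    return p_similar(tanimoto, b0, b1) >= threshold


def fuse(votes: list[bool], min_agree: int = 3) -> bool:
    """Similar if at least min_agree models say similar; otherwise non-similar.

    This sketch assumes 2 * min_agree - 1 models (five for min_agree = 3),
    so exactly one label always has min_agree or more votes.
    """
    if len(votes) != 2 * min_agree - 1:
        raise ValueError("sketch assumes 2 * min_agree - 1 fingerprint models")
    return sum(votes) >= min_agree


# Hypothetical (b0, b1) per fingerprint model: placeholder values, not fitted results
MODELS = [(-6.0, 10.0), (-5.0, 9.0), (-7.0, 11.0), (-6.5, 10.0), (-5.5, 9.5)]
# Hypothetical Tanimoto values for one pair, one per fingerprint type
tanimotos = [0.72, 0.65, 0.40, 0.81, 0.55]

votes = [classify(t, b0, b1) for t, (b0, b1) in zip(tanimotos, MODELS)]
print(votes, fuse(votes))  # [True, True, False, True, False] True
```

In practice each (b0, b1) pair would be estimated from the expert judgments for its own fingerprint type; with an even number of models, a 2-2 split would need a tie-breaking or abstention policy that this sketch does not attempt.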

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Identification Number/EthosID: uk.bl.ethos.666646
Depositing User: Mr Joao Pedro Pereira Franco
Date Deposited: 29 Sep 2015 14:45
Last Modified: 03 Oct 2016 12:19
URI: http://etheses.whiterose.ac.uk/id/eprint/10148
