Applications and Variations of the Maximum Common Subgraph for the Determination of Chemical Similarity

Duesbury, Edmund (2015) Applications and Variations of the Maximum Common Subgraph for the Determination of Chemical Similarity. PhD thesis, University of Sheffield.

Text (Thesis PDF post-corrections)
thesis_mk2_final.pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Abstract

The Maximum Common Substructure (MCS), along with numerous other graph-theoretic techniques, has been widely used in chemoinformatics. One topic studied at Sheffield is the hyperstructure concept: a chemical definition of a superstructure, which represents the graph-theoretic union of several molecules. This technique, however, has been poorly studied in the context of similarity-based virtual screening. Most hyperstructure literature to date has focused either on construction methodology or on property prediction for small datasets of compounds. The work in this thesis is divided into two parts. The first part describes a method for constructing hyperstructures and then applies a hyperstructure to similarity searching in large compound datasets, comparing it with extended connectivity fingerprint and MCS similarity. Since hyperstructures performed significantly worse than fingerprints, additional work is described concerning various weighting schemes for hyperstructures. Because hyperstructure and MCS screening both performed poorly compared with fingerprints, the work then examines whether the choice of MCS algorithm and MCS type had an influence. A series of MCS algorithms and types were compared for speed, MCS size, and virtual screening ability. A topologically constrained variant of the MCS was found to be competitive with fingerprints, and fusion of the two techniques improved active compound recall overall.
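
As a rough illustration of the similarity measures the abstract contrasts, the sketch below computes a Tanimoto coefficient on fingerprint bit sets, a Tanimoto-style coefficient from precomputed MCS sizes, and a simple rank-sum fusion of the two rankings. It is not taken from the thesis: the function names, the toy data, and the choice of rank-sum fusion are assumptions made for illustration, and the MCS sizes are assumed to come from whichever MCS algorithm is in use.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient for two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mcs_tanimoto(size_a, size_b, size_mcs):
    """Tanimoto-style coefficient from graph sizes (e.g. atom counts).

    size_mcs is assumed to be precomputed by an MCS algorithm.
    """
    denom = size_a + size_b - size_mcs
    return size_mcs / denom if denom else 0.0


def rank_positions(scores):
    """Map each compound id to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return {cid: i + 1 for i, cid in enumerate(ordered)}


def fuse_by_rank_sum(scores_x, scores_y):
    """Order compound ids by the sum of their ranks under two measures.

    Both dictionaries are assumed to cover the same compound ids.
    Ties are broken alphabetically by id.
    """
    rx, ry = rank_positions(scores_x), rank_positions(scores_y)
    return sorted(rx, key=lambda cid: (rx[cid] + ry[cid], cid))


# Hypothetical toy data: one query against three database compounds.
query_fp = {1, 2, 3, 4}
db_fps = {"c1": {1, 2, 3, 5}, "c2": {1, 2, 7, 8}, "c3": {9, 10}}
fp_scores = {cid: tanimoto(query_fp, fp) for cid, fp in db_fps.items()}

query_size = 10
# (compound size, MCS size shared with the query), both made up for this example.
db_sizes = {"c1": (12, 6), "c2": (10, 8), "c3": (9, 3)}
mcs_scores = {
    cid: mcs_tanimoto(query_size, size, mcs)
    for cid, (size, mcs) in db_sizes.items()
}

fused_ranking = fuse_by_rank_sum(fp_scores, mcs_scores)
```

In this toy case the two measures disagree on the top-ranked compound, and the fused ranking places both candidates ahead of the compound that neither measure favours. Other fusion rules, such as taking the maximum score, could be substituted in `fuse_by_rank_sum`.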

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Identification Number/EthosID: uk.bl.ethos.684997
Depositing User: Mr Edmund Duesbury
Date Deposited: 12 May 2016 11:21
Last Modified: 03 Oct 2016 13:12
URI: http://etheses.whiterose.ac.uk/id/eprint/13063