White Rose University Consortium logo
University of Leeds logo University of Sheffield logo York University logo

The Application of Spectral Clustering in Drug Discovery

Gan, Sonny (2013) The Application of Spectral Clustering in Drug Discovery. PhD thesis, University of Sheffield.

[img]
Preview
Text (pdf)
The Application of Spectral Clustering in Drug Discovery.pdf
Available under License Creative Commons Attribution Noncommercial 2.0 UK: England & Wales.

Download (30Mb) | Preview

Abstract

The application of clustering algorithms to chemical datasets is well established and has been reviewed extensively. Recently, a number of ‘modern’ clustering algorithms have been reported in other fields. One example is spectral clustering, which has yielded promising results in areas such as protein library analysis. The term spectral clustering is used to describe any clustering algorithm that utilises the eigenpairs of a matrix as the basis for partitioning a dataset. This thesis describes the development and optimisation of a non-overlapping spectral clustering method that is based upon a study by Brewer. The initial version of the spectral clustering algorithm was closely related to Brewer’s method and used a full matrix diagonalisation procedure to identify the eigenpairs of an input matrix. This spectral clustering method was compared to the k-means and Ward’s algorithms, producing encouraging results, for example, when coupled with extended connectivity fingerprints, this method outperformed the other clustering algorithms according to the QCI measure. Although the spectral clustering algorithm showed promising results, its operational costs restricted its application to small datasets. Hence, the method was optimised in successive studies. Firstly, the effect of matrix sparsity on the spectral clustering was examined and showed that spectral clustering with sparse input matrices can lead to an improvement in the results. Despite this improvement, the costs of spectral clustering remained prohibitive, so the full matrix diagonalisation procedure was replaced with the Lanczos algorithm that has lower associated costs, as suggested by Brewer. This method led to a significant decrease in the computational costs when identifying a small number of clusters, however a number of issues remained; leading to the adoption of a SVD-based eigendecomposition method. The SVD-based algorithm was shown to be highly efficient, accurate and scalable through a number of studies.

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield)
The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Identification Number/EthosID: uk.bl.ethos.589230
Depositing User: Dr Sonny Gan
Date Deposited: 24 Dec 2013 11:39
Last Modified: 03 Oct 2016 11:03
URI: http://etheses.whiterose.ac.uk/id/eprint/4839

Actions (repository staff only: login required)