White Rose University Consortium logo
University of Leeds logo University of Sheffield logo York University logo

Substructural Analysis Using Evolutionary Computing Techniques

Sani, Nor Samsiah (2016) Substructural Analysis Using Evolutionary Computing Techniques. PhD thesis, University of Sheffield.

[img]
Preview
Text
NOR_PhD FINAL AFTER CORRECTIONS.pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Download (6Mb) | Preview

Abstract

Substructural analysis (SSA) was one of the very first machine learning techniques to be applied to chemoinformatics in the area of virtual screening. For this method, given a set of compounds typically defined by their fragment occurrence data (such as 2D fingerprints). The SSA computes weights for each of the fragments which outlines its contribution to the activity (or inactivity) of compounds containing that fragment. The overall probability of activity for a compound is then computed by summing up or combining the weights for the fragments present in the compound. A variety of weighting schemes based on specific relationship-bound equations are available for this purpose. This thesis identifies uplift to the effectiveness of SSA, using two evolutionary computation methods based on genetic traits, particularly the genetic algorithm (GA) and genetic programming (GP). Building on previous studies, it was possible to analyse and compare ten published SSA weighting schemes based on a simulated virtual screening experiment. The analysis showed the most effective weighting scheme to be the R4 equation which was a part of document-based weighting schemes. A second experiment was carried out to investigate the application of GA-based weighting scheme for the SSA in comparison to an experiment using the R4 weighting scheme. The GA algorithm is simple in concept focusing purely on suitable weight generation and effective in operation. The findings show that the GA-based SSA is superior to the R4-based SSA, both in terms of active compound retrieval rate and predictive performance. A third experiment investigated the genetic application via a GP-based SSA. Rigorous experiment results showed that the GP was found to be superior to the existing SSA weighting schemes. In general, however, the GP-based SSA was found to be less effective than the GA-based SSA. A final experimented is described in this thesis which sought to explore the feasibility of data fusion on both the GA and GP. It is a method producing a final ranking list from multiple sets of ranking lists, based on several fusion rules. The results indicate that data fusion is a good method to boost GA-and GP-based SSA searching. The RKP rule was considered the most effective fusion rule.

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield)
The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Identification Number/EthosID: uk.bl.ethos.694466
Depositing User: Mrs Nor Samsiah Sani
Date Deposited: 05 Oct 2016 11:26
Last Modified: 12 Oct 2018 09:27
URI: http://etheses.whiterose.ac.uk/id/eprint/13975

Actions (repository staff only: login required)