McClure, Maryhilda Heidi (2018) Automatically Explaining Literature Based Discoveries. PhD thesis, University of Sheffield.
Abstract
Literature based discovery (LBD) identifies potentially related pairs of concepts that are not mentioned together in the same documents. The concept pairs may be identified via linking concepts that appear both in the documents mentioning one concept and in those mentioning the other, or via other statistical relatedness measures such as latent semantic indexing. Unfortunately, the nature of these relationships is not identified, so the importance and relevance of the LBD pairs are not known.
The primary objectives of this thesis are to identify candidate LBD related concepts and to determine whether the nature of their relationship can be automatically explained using supervised machine learning classification. For example, in the benchmark LBD example of Raynaud’s phenomenon (A) being related to fish oil (C), candidate linking concepts are blood viscosity, platelet function and vascular reactivity. The linking concepts are referred to as Bs and thus create A-B-C LBD triples. The objectives of this work are to identify a training set of data that includes linking B terms, to identify the relationships between the A and B pairs and the B and C pairs, and to apply supervised machine learning classification techniques to suggest a relationship between the A and C concepts. In the Raynaud’s example, the suggestion would be that fish oil may treat Raynaud’s phenomenon.
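As a rough illustration of the A-B-C idea described above, the following Python sketch collects linking B concepts for an A and C pair that never co-occur in a single document. It is not the method used in the thesis: the function name `linking_concepts`, the representation of a document as a list of concept strings, and the toy documents are all assumptions made for this example.

```python
def linking_concepts(docs, a_term, c_term):
    """Return the B concepts linking A and C, or [] if A and C co-occur.

    Hypothetical helper: each document is a list of concept strings.
    """
    concept_sets = [set(doc) for doc in docs]
    # LBD targets pairs that are never mentioned together in one document.
    if any(a_term in s and c_term in s for s in concept_sets):
        return []
    with_a, with_c = set(), set()
    for s in concept_sets:
        if a_term in s:
            with_a |= s - {a_term}  # concepts seen alongside A
        if c_term in s:
            with_c |= s - {c_term}  # concepts seen alongside C
    # A linking B concept appears with A in some document and with C in another.
    return sorted(with_a & with_c)


# Toy documents invented for illustration only; they are not real MEDLINE data.
toy_docs = [
    ["Raynaud's phenomenon", "blood viscosity"],
    ["Raynaud's phenomenon", "platelet function"],
    ["Raynaud's phenomenon", "vascular reactivity"],
    ["fish oil", "blood viscosity"],
    ["fish oil", "platelet function", "vascular reactivity"],
    ["fish oil", "triglycerides"],
]

bs = linking_concepts(toy_docs, "Raynaud's phenomenon", "fish oil")
triples = [("Raynaud's phenomenon", b, "fish oil") for b in bs]
```

Each resulting triple pairs the A and C concepts with one candidate B. The classification step described in the abstract would then work from the A to B and B to C relationships of such triples, which this sketch does not attempt.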
This work explores data representations suitable for applying classification techniques to explain the relationships, and applies traditional classification evaluation methods to both classifier outcomes and data designs. Classifiers applied to the training data correctly predicted the A to C relationships over 70% of the time, while the chosen baselines achieved only approximately 30%. The classifiers were then used on real LBD candidate pairs, found using statistical LBD in an older set of MEDLINE abstracts. The predicted LBD explanations were validated against more recent literature, a time-slice validation approach.
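The time-slice validation mentioned above can be sketched as follows, under simplifying assumptions: a predicted explanation counts as confirmed if the same (A, C, relationship) tuple is recorded in literature published after the cutoff year. The function name `time_slice_validate`, the tuple layout, the relationship labels and the toy records are hypothetical and are not taken from the thesis.

```python
def time_slice_validate(predictions, later_records, cutoff_year):
    """Check predicted (A, C, relation) tuples against later literature.

    Hypothetical helper: later_records holds (A, C, relation, year) tuples.
    Returns the confirmed predictions and the fraction confirmed.
    """
    # Only literature published after the cutoff may confirm a prediction.
    confirmed = {
        (a, c, rel)
        for (a, c, rel, year) in later_records
        if year > cutoff_year
    }
    hits = [p for p in predictions if p in confirmed]
    precision = len(hits) / len(predictions) if predictions else 0.0
    return hits, precision


# Toy data invented for illustration only; names and years are placeholders.
predictions = [
    ("concept_a1", "concept_c1", "TREATS"),
    ("concept_a2", "concept_c2", "CAUSES"),
]
later_records = [
    ("concept_a1", "concept_c1", "TREATS", 2014),
    ("concept_a2", "concept_c2", "CAUSES", 2005),  # predates the cutoff
]

hits, precision = time_slice_validate(predictions, later_records, cutoff_year=2010)
```

In this toy run only the first prediction is confirmed, because the second record was published before the cutoff and so was already available in the older time slice.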
To the best of my knowledge, relationship prediction techniques have not previously been applied to statistically related LBD candidate pairs to explain how the A and C concepts are related. Applying time-slicing to validate explained LBD candidates is also novel.
Metadata
Supervisors: | Stevenson, Mark and Gaizauskas, Rob |
---|---|
Keywords: | Literature Based Discovery, LBD, Machine Learning, Relationship Prediction, Text Classification, Time-slice validation, MEDLINE, SemMedDB, Semantic Medline Database |
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.749513 |
Depositing User: | M. Heidi McClure |
Date Deposited: | 31 Jul 2018 09:36 |
Last Modified: | 12 Oct 2018 09:55 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:21038 |
Download
Filename: mcclure 20180705.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License