Clinical Information Extraction: Lowering the Barrier

Roberts, Angus (2012) Clinical Information Extraction: Lowering the Barrier. PhD thesis, University of Sheffield.

Text (Thesis)
thesis.pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Download (819Kb)
Archive (Zipped HTML, Appendix B)
AnnotationGuidelines.zip
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Download (115Kb)

Abstract

Electronic Patient Records have opened up the possibility of re-using the data collected for clinical practice, to support both clinical practice itself, and clinical research. In order to achieve this re-use, we have to address the issue that most Electronic Patient Records make heavy use of narrative text. This thesis reports an approach to automatically extract clinically significant information from the textual component of the medical record, in order to support re-use of that record. The cost of developing such information extraction systems is currently seen to be a barrier to their deployment. We explore ways of lowering this barrier, through the separation of the linguistic, medical and engineering knowledge and skills required for development. We describe a rigorous methodology for the construction of a corpus of clinical texts semantically annotated by medical experts, and its use to automatically train a supervised machine learning-based information extraction system. We explore the re-use of existing medical knowledge in the form of terminologies, and present a way in which these terminologies can be coupled with supervised machine learning for information extraction. Finally, we consider the extent to which pre-existing software components can be used to construct a clinical IE system, and build a system that is capable of extracting clinical concepts, their properties, and the relationships between them. The resulting system shows that it is possible to achieve separation of linguistic, medical and engineering knowledge in clinical information extraction. We find that existing software frameworks are capable of some aspects of information extraction with little additional engineering work, but that they are not mature enough for the construction of a full system by the non-expert. We also find that a new cost is introduced in separating domain and linguistic knowledge, that of manual annotation by domain experts.

Item Type: Thesis (PhD)
Additional Information: Appendix B is also provided as a CD with the print version.
Keywords: Information Extraction, Natural Language Processing, Medical Text, Clinical Text, Medical Informatics, Clinical Informatics, Medical Records, Electronic Patient Record, Machine Learning, Software Engineering
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)
Identification Number/EthosID: uk.bl.ethos.566310
Depositing User: Mr Angus Roberts
Date Deposited: 15 Feb 2013 14:20
Last Modified: 27 Apr 2016 14:11
URI: http://etheses.whiterose.ac.uk/id/eprint/3254
