Clinical Information Extraction: Lowering the Barrier

Abstract

Electronic Patient Records have opened up the possibility of re-using data collected for clinical practice to support both clinical practice itself and clinical research. To achieve this re-use, we have to address the fact that most Electronic Patient Records make heavy use of narrative text. This thesis reports an approach to automatically extracting clinically significant information from the textual component of the medical record, in order to support re-use of that record. The cost of developing such information extraction systems is currently seen as a barrier to their deployment. We explore ways of lowering this barrier through the separation of the linguistic, medical and engineering knowledge and skills required for development.

We describe a rigorous methodology for the construction of a corpus of clinical texts semantically annotated by medical experts, and its use to automatically train a supervised machine learning-based information extraction system. We explore the re-use of existing medical knowledge in the form of terminologies, and present a way in which these terminologies can be coupled with supervised machine learning for information extraction. Finally, we consider the extent to which pre-existing software components can be used to construct a clinical information extraction (IE) system, and build a system that is capable of extracting clinical concepts, their properties, and the relationships between them.

The resulting system shows that it is possible to achieve separation of linguistic, medical and engineering knowledge in clinical information extraction. We find that existing software frameworks are capable of some aspects of information extraction with little additional engineering work, but that they are not mature enough for the construction of a full system by the non-expert. We also find that separating domain and linguistic knowledge introduces a new cost: manual annotation by domain experts.

Metadata

Supervisors:	Robert Gaizauskas
Publicly visible additional information:	Appendix B is also provided as a CD with the print version.
Keywords:	Information Extraction, Natural Language Processing, Medical Text, Clinical Text, Medical Informatics, Clinical Informatics, Medical Records, Electronic Patient Record, Machine Learning, Software Engineering
Awarding institution:	University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)
Identification Number/EthosID:	uk.bl.ethos.566310
Depositing User:	Mr Angus Roberts
Date Deposited:	15 Feb 2013 14:20
Last Modified:	27 Apr 2016 14:11
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:3254

Downloads

Thesis

Filename: thesis.pdf

Description: Thesis

Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License

Zipped HTML, Appendix B

Filename: AnnotationGuidelines.zip

Description: Zipped HTML, Appendix B