Roberts, Angus (2012) Clinical Information Extraction: Lowering the Barrier. PhD thesis, University of Sheffield.
Abstract
Electronic Patient Records have opened up the possibility of re-using the data collected for clinical practice to support both clinical practice itself and clinical research. To achieve this re-use, we have to address the fact that most Electronic Patient Records make heavy use of narrative text. This thesis reports an approach to automatically extracting clinically significant information from the textual component of the medical record, in order to support re-use of that record. The cost of developing such information extraction systems is currently seen as a barrier to their deployment. We explore ways of lowering this barrier through the separation of the linguistic, medical and engineering knowledge and skills required for development.
We describe a rigorous methodology for the construction of a corpus of clinical texts semantically annotated by medical experts, and its use to automatically train a supervised machine learning-based information extraction system. We explore the re-use of existing medical knowledge in the form of terminologies, and present a way in which these terminologies can be coupled with supervised machine learning for information extraction. Finally, we consider the extent to which pre-existing software components can be used to construct a clinical IE system, and build a system that is capable of extracting clinical concepts, their properties, and the relationships between them.
The resulting system shows that it is possible to achieve separation of linguistic, medical and engineering knowledge in clinical information extraction. We find that existing software frameworks are capable of some aspects of information extraction with little additional engineering work, but that they are not mature enough for the construction of a full system by the non-expert. We also find that a new cost is introduced in separating domain and linguistic knowledge, that of manual annotation by domain experts.
Metadata
Supervisors: | Gaizauskas, Robert |
---|---|
Publicly visible additional information: | Appendix B is also provided as a CD with the print version. |
Keywords: | Information Extraction, Natural Language Processing, Medical Text, Clinical Text, Medical Informatics, Clinical Informatics, Medical Records, Electronic Patient Record, Machine Learning, Software Engineering |
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
Identification Number/EthosID: | uk.bl.ethos.566310 |
Depositing User: | Mr Angus Roberts |
Date Deposited: | 15 Feb 2013 14:20 |
Last Modified: | 27 Apr 2016 14:11 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:3254 |
Downloads
Thesis
Filename: thesis.pdf
Description: Thesis
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License
Zipped HTML, Appendix B
Filename: AnnotationGuidelines.zip
Description: Zipped HTML, Appendix B
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License