Semantic tagging of medical narratives using SNOMED CT

Hina, Saman (2013) Semantic tagging of medical narratives using SNOMED CT. PhD thesis, University of Leeds.

HINA_S_COMPUTING_PhD_2013.pdf - Final eThesis - complete (pdf)
Available under License Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales.

Download (2796Kb) | Preview


In the medical domain, semantic analysis is critical to several research questions that are of interest not only to healthcare researchers but also to NLP researchers. Yet most of the data exists in the form of medical narratives. Semantic analysis of these narratives is needed to identify semantic information and classify it into semantic categories, and the results support further investigation by domain and non-domain users alike. The main objective of this research is to develop a generic semantic tagger for medical narratives using a tag set derived from SNOMED CT®, an international healthcare terminology. The key hypothesis is that, by analysing an authentic corpus of medical narratives and the language of SNOMED CT®, it is possible to identify semantic information (paraphrases of concepts, abbreviations of concepts and complex multiword concepts) in medical narratives and classify it with globally known semantic categories. The research began with an investigation into using SNOMED CT® to identify concepts in medical narratives, which resulted in the derivation of a tag set. This tag set was then used to develop three gold standard datasets. One of these datasets contained four categories of protected health information (PHI), so a separate module was developed to anonymize them. After anonymization, a generic annotation scheme was developed and evaluated for annotating the three gold standard datasets. One of the gold standard datasets was used to develop generic rule-patterns for the semantic tagger, while the other two were used to evaluate it. In addition to evaluation on the gold standard datasets, the semantic tagger was compared with three systems based on different methods and was shown to outperform them.

Item Type: Thesis (PhD)
ISBN: 978-0-85731-522-9
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Identification Number/EthosID: uk.bl.ethos.595145
Depositing User: Repository Administrator
Date Deposited: 28 Feb 2014 09:49
Last Modified: 08 Feb 2019 11:17
URI: http://etheses.whiterose.ac.uk/id/eprint/5289

