Text Analytics to Predict Time and Cause of Death from Verbal Autopsies

Danso, Samuel Odei (2015) Text Analytics to Predict Time and Cause of Death from Verbal Autopsies. PhD thesis, University of Leeds.

danso_SO_Computing_PhD_2015.pdf - Final eThesis - complete (pdf)
Available under License Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales.

This thesis describes the first Text Analytics approach to predicting Causes of Death (CoD) from Verbal Autopsies (VA). VA is an alternative technique, recommended by the World Health Organisation, for ascertaining CoD in low- and middle-income countries (LMIC). CoD information is vitally important in the provision of healthcare. CoD information can be obtained from VA via two main approaches: manual (also referred to as physician review) and automatic. The automatic approach is an active research area because it is more efficient and cost-effective than the manual approach. VA contains both closed responses and open narrative text. However, state-of-the-art automatic approaches have ignored the open narrative text, which remains a challenge and an important research issue. We hypothesise that it is feasible to predict CoD from the narratives of VA. We further contend that an automatic approach that utilises the information in both the narrative and the closed-response text of VA could improve the accuracy of CoD prediction. This research is formulated as a Text Classification problem, which employs Corpus and Computational Linguistics, Natural Language Processing and Machine Learning techniques to classify VA documents automatically according to CoD. Firstly, the research uses a VA corpus built from a sample of over 11,400 VA documents collected over a 10-year period in Ghana, West Africa. About 80 per cent of these documents have been annotated with CoD by medical experts. Secondly, we design experiments to identify the Machine Learning techniques (algorithm, feature representation scheme, and feature reduction strategy) suitable for classifying VA open narratives (VAModel1). Thirdly, we propose novel methods of extracting features to build a model that predicts CoD from VA narratives, using the annotated VA corpus as the training and testing set.
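
The text-classification formulation above can be illustrated with a minimal sketch. It assumes one illustrative combination of the three choices the abstract lists: a multinomial naive Bayes algorithm, a bag-of-words feature representation, and stop-word removal as the feature-reduction step. This is not claimed to be the configuration the thesis selected for VAModel1. The helper `tokenize`, the class `NaiveBayesTextClassifier`, the stop-word list, the CoD labels and the toy narratives are all hypothetical and are not drawn from the Ghana corpus.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; removing these tokens is a simple form of feature reduction.
STOP_WORDS = {"the", "a", "an", "and", "of", "was", "had", "for", "with",
              "she", "he", "to", "before", "days"}


def tokenize(text):
    """Lower-case a narrative and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over bag-of-words counts with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per CoD label
        self.word_counts = defaultdict(Counter)  # token counts per CoD label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of the tokens.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy narratives and labels, for illustration only.
train_texts = [
    "fever and convulsions for three days before death",
    "high fever with chills and vomiting",
    "cough with blood and weight loss for months",
    "persistent cough night sweats and weight loss",
]
train_labels = ["malaria", "malaria", "tuberculosis", "tuberculosis"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("she had fever and convulsions"))     # -> malaria
print(clf.predict("cough with blood and weight loss"))  # -> tuberculosis
```

Naive Bayes is used here only because it is short to write without external libraries; swapping in another algorithm or a weighted representation (for example TF-IDF) changes only the `fit`/`predict` internals, which is the kind of comparison the experiments described above set out to make.
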
Furthermore, we develop two additional models: one based only on closed responses (VAModel2) and a hybrid model based on both closed responses and open narratives (VAModel3). VAModel1 performs better than our baseline model, suggesting that it is feasible to predict CoD from VA open narratives. Overall, VAModel3 performed better than VAModel1 but not significantly better than VAModel2. In terms of reliability, VAModel1 obtained moderate agreement (kappa = 0.4) with the gold standard, the medical experts, whose average inter-annotator agreement was kappa = 0.64. Acceptable agreement was obtained for VAModel2 (kappa = 0.71) and VAModel3 (kappa = 0.75), suggesting that the reliability of these two models is better than that of the medical experts. A detailed analysis also suggested that combining information from the narratives and the closed responses increases performance for some CoD categories, whereas the closed responses alone provide enough information for other categories. Our research provides an alternative automatic approach to predicting CoD from VA, which is essential for LMIC. Further research into various aspects of the modelling process could improve the current performance of automatically predicting CoD from VAs.
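
The reliability figures above are kappa scores, which correct raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of items labelled identically and p_e is the chance agreement implied by each rater's label frequencies. The abstract does not say which kappa variant was used, so the sketch below assumes the standard two-rater Cohen's kappa (for example, a model compared against one expert). The function name `cohen_kappa` and the label sequences are hypothetical illustrations, not data from the thesis.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Two-rater Cohen's kappa for labels assigned to the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical CoD labels from an expert and a model for the same eight documents.
expert = ["A", "A", "B", "B", "C", "C", "A", "B"]
model = ["A", "A", "B", "C", "C", "C", "A", "A"]
print(round(cohen_kappa(expert, model), 3))  # -> 0.628
```

Here the raters agree on 6 of 8 documents (p_o = 0.75), but their label frequencies alone predict p_e = 21/64, so kappa falls to 27/43, about 0.63. This is why a model can show high raw accuracy yet only moderate kappa.
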

Item Type: Thesis (PhD)
Keywords: Text Analytics, Machine Learning, Verbal Autopsy, Corpus Linguistics, Computational Linguistics
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Identification Number/EthosID: uk.bl.ethos.684501
Depositing User: Samuel Odei Danso
Date Deposited: 04 May 2016 08:46
Last Modified: 25 Jul 2018 09:52
URI: http://etheses.whiterose.ac.uk/id/eprint/12400
