Nur, Ula Ali Mohamed (2004) Handling missing data in analyses of the UK women's cohort study. PhD thesis, University of Leeds.
Abstract
Missing values are a problem in large-scale surveys with extensive questionnaires. The analysis of the complete records may yield inferences substantially different from those that would be obtained had no data been missing.
The aim of this dissertation is to critically examine ways of handling missing data in the UK Women's Cohort Study (UKWCS). This is a large dataset of continuous, categorical and binary variables, with missing values in almost every variable.
A number of simple imputation techniques, as well as multiple imputation as developed by Rubin (1987) and multiple imputation by chained equations using Gibbs sampling (Van Buuren, 1999), were explored in a number of illustrative analyses associated with the UKWCS.
Three approaches to handling missing dietary information on alcohol consumption were compared. The comparison shows that ignoring missingness by analysing only complete cases produces biased estimates (lower means). Imputing the extreme value zero, as is customary at present, underestimates actual alcohol consumption; it also incorrectly increases the apparent precision of estimation (i.e. yields inappropriately small standard errors).
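The effect of zero imputation on the mean and on the apparent precision can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are simulated rather than taken from the UKWCS, the intake distribution (exponential, mean 10) and the 30% missingness rate are arbitrary, and values are made missing completely at random, so the complete-case mean is not biased in this toy example as it was in the thesis. The sketch only shows how filling gaps with zero pulls the mean down while shrinking the standard error.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical, simulated data: skewed positive "alcohol intake" values.
n = 2000
true_intake = rng.exponential(scale=10.0, size=n)

# Illustrative assumption: 30% of responses are missing completely at random.
missing = rng.random(n) < 0.3
observed = true_intake[~missing]


def mean_and_se(x):
    """Return the sample mean and its standard error."""
    x = np.asarray(x, dtype=float)
    return x.mean(), x.std(ddof=1) / np.sqrt(x.size)


# Approach 1: complete-case analysis (drop records with a missing value).
cc_mean, cc_se = mean_and_se(observed)

# Approach 2: replace every missing value with zero.
zero_filled = np.where(missing, 0.0, true_intake)
zero_mean, zero_se = mean_and_se(zero_filled)

print(f"complete cases:  mean={cc_mean:.2f}, se={cc_se:.3f}")
print(f"zero imputation: mean={zero_mean:.2f}, se={zero_se:.3f}")
```

The zero-filled mean is always lower than the complete-case mean when the observed values are positive. The smaller standard error comes from treating the imputed zeros as if they were real observations, which inflates the sample size without adding information.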
A published study (Pollard et al., 2001), which based its conclusion on one third of the records, was replicated after handling missing data by multiple imputation. Multiple imputation by chained equations, an iterative technique that deals with missing values when every variable is incomplete, was applied. This method greatly improved the results by utilizing most of the information in the incomplete records. It has the advantage that the algorithm intended for analysing the complete data is applied several times, without any alterations. The implications of missing data were also studied in a survival analysis investigating the link between incidence of breast cancer and a number of prognostic factors. The thesis recommends multiple imputation for handling missing data, because it exploits most of the information in the dataset and allows efficient inferences to be made from subsequent analyses.
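The remark that the complete-data analysis is applied several times without alteration can be made concrete with the standard combining rules associated with Rubin (1987): the analysis is run once on each imputed dataset, and the resulting estimates and variances are pooled. The abstract does not state which pooling formula the thesis uses, so the sketch below is an assumption, and the function name `pool_rubin` and the input numbers are hypothetical.

```python
import math


def pool_rubin(estimates, variances):
    """Pool m complete-data results with Rubin's combining rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching inputs")

    q_bar = sum(estimates) / m                                   # pooled estimate
    within = sum(variances) / m                                  # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between                       # total variance

    return q_bar, math.sqrt(total)


# Hypothetical results from three imputed datasets (made-up numbers).
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(pooled_estimate, pooled_se)
```

The between-imputation term is what carries the uncertainty due to missingness into the final standard error; single imputation, such as filling in zeros, has no such term.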
Metadata
Supervisors: | Greenwood, D. and Longford, N. |
---|---|
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Medicine and Health (Leeds) > School of Medicine (Leeds) > Leeds Institute of Health Sciences |
Identification Number/EthosID: | uk.bl.ethos.513958 |
Depositing User: | Ethos Import |
Date Deposited: | 29 Jan 2010 09:41 |
Last Modified: | 07 Mar 2014 10:27 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:317 |