Nur, Ula Ali Mohamed (2004) Handling missing data in analyses of the UK women's cohort study. PhD thesis, University of Leeds.
Missing values are a problem in large-scale surveys with extensive questionnaires. The analysis of the complete records may yield inferences substantially different from those that would be obtained had no data been missing. The aim of this dissertation is to critically examine ways of handling missing data in the UK Women's Cohort Study (UKWCS). This is a large dataset with continuous, categorical and binary variables, and with missing values in almost every variable. A number of simple imputation techniques, as well as multiple imputation developed by Rubin (1987) and multiple imputation by chained equations using Gibbs sampling (Van Buuren, 1999), were explored in a number of illustrative analyses associated with the UKWCS. Three approaches to handling missing dietary information on alcohol consumption were compared. The comparison shows that ignoring missingness by analysing only complete cases produces bias (lower means). Imputing the extreme value zero, as is customary at present, underestimates actual alcohol consumption and also incorrectly increases the apparent precision of estimation (i.e. inappropriately small standard errors). A published study (Pollard et al., 2001), which based its conclusions on one third of the records, was replicated after handling missing data by multiple imputation. Multiple imputation by chained equations, an iterative technique that deals with missing values when every variable is incomplete, was applied. This method greatly improved the results by utilizing most of the information in the incomplete records. The method has the advantage that the algorithm intended for analysing the complete data is applied several times, without any alterations. The implications of missing data were also studied in a survival analysis investigating the link between incidence of breast cancer and a number of prognostic factors.
The thesis recommends multiple imputation for handling missing data, because it exploits most of the information in the dataset and allows efficient inferences to be made from subsequent analyses.
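As an illustration of how multiple imputation reuses an unaltered complete-data analysis, the sketch below pools the results of several imputed datasets with the standard combining rules from Rubin (1987). It is not code from the thesis: the function name `pool_rubin` and the numbers are hypothetical, chosen only to show the arithmetic.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their squared standard errors.

    Hypothetical helper, not taken from the thesis.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, t


# Made-up estimates from m = 3 imputed datasets, for illustration only.
estimates = [10.0, 12.0, 11.0]
variances = [4.0, 4.0, 4.0]
pooled_estimate, total_variance = pool_rubin(estimates, variances)
print(pooled_estimate, total_variance ** 0.5)  # pooled estimate and its standard error
```

The between-imputation term is what carries the uncertainty due to missingness into the pooled standard error; filling in a single value such as zero has no such term, which is why it can make estimates look more precise than they are.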
|Item Type:||Thesis (PhD)|
|Academic Units:||The University of Leeds > Faculty of Medicine and Health (Leeds) > Institute of Health Sciences (Leeds)|
|Depositing User:||Ethos Import|
|Date Deposited:||29 Jan 2010 09:41|
|Last Modified:||08 Aug 2013 08:43|