
Comparison of Two Newly Developed Multiple Imputation Methods for MNAR Cross-Sectional Data

Date

2020-03-09

ORCID

0000-0002-1744-536X

Type

Thesis

Degree Level

Doctoral

Abstract

Missing not at random (MNAR) data is a highly complex problem because of the difficulty of jointly modeling the outcome values and the missingness pattern while taking the variability of the missing data into consideration. In recent years, Galimard et al. (2016) and Ogundimu & Collins (2017) each developed a multiple imputation (MI) method for handling MNAR data. However, the effectiveness of these methods in research has not yet been sufficiently tested. This dissertation investigates the effectiveness of the Galimard et al. and Ogundimu & Collins MI methods alongside complete case (CC) analysis and Rubin's MI when applied to two real-life datasets of different sizes (n1 = 4451, n2 = 1607) with missing data induced under MCAR, MAR, and MNAR mechanisms at 15%, 30%, and 50% missingness. In addition, the methods are applied to simulated datasets with imputation and response models more complicated than those in the original Galimard et al. and Ogundimu & Collins studies, to see how widely they can be applied across datasets with different missingness mechanisms and percentages. In the application results, Galimard et al.'s MI delivered the same results as CC for every combination of missingness mechanism and percentage. For both datasets, Ogundimu & Collins' MI performed better than the other three methods for 50% MNAR, though overall both the Galimard et al. and Ogundimu & Collins MI methods performed better on MCAR and MAR data than on MNAR data. In simulation, Galimard et al.'s MI again delivered results consistently identical to CC for every combination of missingness percentage and mechanism. Ogundimu & Collins' MI consistently delivered better results than the other three methods for 15% and 30% MNAR. However, Ogundimu & Collins' MI should be used with caution, because it did not converge for 50% missingness and converged for only approximately 100 to 400 of 1000 datasets for 15% and 30%. Future studies could apply the Galimard et al. and Ogundimu & Collins MI methods to other real-life datasets, and to simulated datasets on which the methods converge readily, to see how well they work when applied broadly in research and industry.
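
As an illustration of what inducing missing data under the three mechanisms can look like, the following is a minimal Python sketch and not the procedure used in the dissertation. It assumes a logistic missingness model with a hypothetical slope of 1 on the standardized driver variable; the function names `missing_probabilities` and `induce_missing` are invented for this example. Under MCAR every value is equally likely to be dropped, under MAR the probability depends on a fully observed covariate x, and under MNAR it depends on the outcome y itself.

```python
import math
import random


def _logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def missing_probabilities(driver, rate, slope=1.0):
    """Per-observation missingness probabilities that average to `rate`.

    Assumed model (illustrative only): logit(p_i) = a + slope * z_i, where z_i
    is the standardized driver and the intercept a is found by bisection.
    """
    n = len(driver)
    mean = sum(driver) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in driver) / n) or 1.0
    z = [(d - mean) / sd for d in driver]
    lo, hi = -20.0, 20.0
    for _ in range(60):  # bisection on the intercept
        mid = (lo + hi) / 2.0
        avg = sum(_logistic(mid + slope * zi) for zi in z) / n
        if avg < rate:
            lo = mid
        else:
            hi = mid
    return [_logistic(lo + slope * zi) for zi in z]


def induce_missing(y, x, mechanism, rate, seed=0):
    """Return a copy of y with roughly `rate` of its entries replaced by None."""
    rng = random.Random(seed)
    if mechanism == "MCAR":
        probs = [rate] * len(y)                 # independent of the data
    elif mechanism == "MAR":
        probs = missing_probabilities(x, rate)  # depends on observed x only
    elif mechanism == "MNAR":
        probs = missing_probabilities(y, rate)  # depends on y itself
    else:
        raise ValueError("mechanism must be 'MCAR', 'MAR', or 'MNAR'")
    return [None if rng.random() < p else yi for yi, p in zip(y, probs)]
```

With a positive slope, larger values of y are more likely to be dropped under MNAR, so a complete case mean of the remaining values tends to be biased downward, which is the kind of distortion the compared MI methods aim to correct.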

Keywords

Missing Not At Random, Multiple Imputation

Degree

Doctor of Philosophy (Ph.D.)

Department

School of Public Health

Program

Biostatistics
