A comparison of various software tools for dealing with missing data via imputation

CORTINAS ABRAHANTES, Jose; SOTTO, Cristina; MOLENBERGHS, Geert; Vromman, Geert; Bierinckx, Bart

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/13597

Title:	A comparison of various software tools for dealing with missing data via imputation
Authors:	CORTINAS ABRAHANTES, Jose; SOTTO, Cristina; MOLENBERGHS, Geert; Vromman, Geert; Bierinckx, Bart
Issue Date:	2011
Publisher:	TAYLOR & FRANCIS LTD
Source:	JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 81 (11), p. 1653-1675
Abstract:	In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation encompasses a broad range of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may well lie in the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight into the performance of the different routines, procedures, and packages in this respect.
Notes:	Abrahantes, JC (reprint author), European Food Safety Author EFSA, Assessment Methodol Unit, Largo Palli Natale 5-A, I-43121 Parma, Italy. [Abrahantes, Jose Cortinas; Sotto, Cristina; Molenberghs, Geert] Univ Hasselt, Interuniv Inst Biostat & Stat Bioinformat, B-3590 Diepenbeek, Belgium. [Sotto, Cristina] Univ Philippines, Sch Stat, Quezon City, Philippines. [Molenberghs, Geert] Katholieke Univ Leuven, Interuniv Inst Biostat & Stat Bioinformat, B-3000 Louvain, Belgium. [Vromman, Geert; Bierinckx, Bart] IM Associates BVBA, Sales & Mkt Effectiveness, B-3000 Louvain, Belgium. jose.cortinasabrahantes@efsa.europa.eu
Keywords:	Computer Science; Interdisciplinary Applications; Statistics & Probability; multiple imputation; missing data; missing at random; missing not at random; random forest
Document URI:	http://hdl.handle.net/1942/13597
ISSN:	0094-9655
e-ISSN:	1563-5163
DOI:	10.1080/00949655.2010.498788
ISI #:	000299726700020
Rights:	© 2011 Taylor & Francis
Category:	A1
Type:	Journal Contribution
Validations:	ecoom 2013
Appears in Collections:	Research publications

Files in This Item:

File	Description	Size	Format
a.pdf (Restricted Access)	Published version	281.88 kB	Adobe PDF

SCOPUS™ citations: 7 (checked on Oct 20, 2025)

WEB OF SCIENCE™ citations: 7 (checked on Oct 26, 2025)
