Record Details

Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning

Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)

Field Value
 
Title Replication Data for: The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning
 
Identifier https://doi.org/10.7910/DVN/UPL4TT
 
Creator Lall, Ranjit
Robinson, Thomas
 
Publisher Harvard Dataverse
 
Description Replication and simulation reproduction materials for the article "The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning."

Please see the README file for a summary of the contents and the Replication Guide for a more detailed description.

Article abstract:
Principled methods for analyzing missing values, based chiefly on multiple imputation, have become increasingly popular yet can struggle to handle the kinds of large and complex data that are also becoming common. We propose an accurate, fast, and scalable approach to multiple imputation, which we call MIDAS (Multiple Imputation with Denoising Autoencoders). MIDAS employs a class of unsupervised neural networks known as denoising autoencoders, which are designed to reduce dimensionality by corrupting and attempting to reconstruct a subset of data. We repurpose denoising autoencoders for multiple imputation by treating missing values as an additional portion of corrupted data and drawing imputations from a model trained to minimize the reconstruction error on the originally observed portion. Systematic tests on simulated as well as real social science data, together with an applied example involving a large-scale electoral survey, illustrate MIDAS's accuracy and efficiency across a range of settings. We provide open-source software for implementing MIDAS.
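The core idea in the abstract can be illustrated with a minimal NumPy sketch. Missing cells are treated as extra corruption, and reconstruction error is scored only on the originally observed cells. This is not the authors' released software. The following are all illustrative assumptions: the one-hidden-layer architecture, the hypothetical function names `train_dae_imputer` and `draw_imputations`, the dropout-style corruption rate, and the use of fresh corruption at prediction time to produce multiple draws.

```python
import numpy as np


def train_dae_imputer(X, n_hidden=8, corrupt_rate=0.5, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer denoising autoencoder to data containing NaNs.

    Missing cells are zero-filled at the input and treated as extra
    corruption; the loss is computed only on originally observed cells.
    (Illustrative sketch; architecture and hyperparameters are assumptions.)
    """
    if not 0.0 < corrupt_rate < 1.0:
        raise ValueError("corrupt_rate must be in (0, 1)")
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)               # True where a value was observed
    X0 = np.where(observed, X, 0.0)       # zero-fill missing cells at the input
    n, p = X.shape
    W1 = rng.normal(0.0, 0.1, (p, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, p))
    b2 = np.zeros(p)
    n_obs = max(int(observed.sum()), 1)
    for _ in range(epochs):
        # Randomly corrupt inputs (inverted dropout) on top of the missingness.
        keep = rng.random(X.shape) >= corrupt_rate
        Xc = X0 * keep / (1.0 - corrupt_rate)
        H = np.tanh(Xc @ W1 + b1)
        Y = H @ W2 + b2
        # Residuals count only where the value was originally observed.
        R = np.where(observed, Y - X0, 0.0)
        # Backpropagate the mean squared error over observed cells.
        dY = 2.0 * R / n_obs
        dW2 = H.T @ dY
        db2 = dY.sum(axis=0)
        dH = (dY @ W2.T) * (1.0 - H ** 2)
        dW1 = Xc.T @ dH
        db1 = dH.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def draw_imputations(X, params, m=5, corrupt_rate=0.5, seed=1):
    """Return m completed copies of X; observed cells are left unchanged."""
    W1, b1, W2, b2 = params
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)
    draws = []
    for _ in range(m):
        # Fresh corruption per draw yields varying predictions (an assumption).
        keep = rng.random(X.shape) >= corrupt_rate
        Y = np.tanh((X0 * keep / (1.0 - corrupt_rate)) @ W1 + b1) @ W2 + b2
        draws.append(np.where(observed, X, Y))
    return draws


# Toy data: three correlated columns with about 20% of cells set missing.
rng = np.random.default_rng(42)
z = rng.normal(size=(200, 1))
X = np.hstack([z, 2.0 * z, -z]) + 0.1 * rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.2] = np.nan
params = train_dae_imputer(X)
draws = draw_imputations(X, params, m=5)
```

Each element of `draws` is one completed dataset. In a multiple-imputation workflow, the analysis model would be fitted to each completed dataset and the estimates pooled, for example with Rubin's rules. The sketch assumes roughly standardized numeric inputs. A practical implementation would also need to handle categorical variables, scaling, and convergence checks.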
 
Subject Social Sciences
missing data; multiple imputation; machine learning; imputation methods
 
Contributor Lall, Ranjit