Working with Deep Generative Models and Tabular Data Imputation

Datasets with missing values are very common in industry applications. Missing data typically have a negative impact on machine learning models. With the rise of generative models in deep learning, recent studies proposed solutions to the problem of imputing missing values based on various deep generative models. Previous experiments with Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) showed promising results in this domain. Initially, these results focused on imputation in image data, e.g. filling missing patches in images. Recent proposals addressed missing values in tabular data. For these data, the case for deep generative models seems to be less clear. In the process of providing a fair comparison of proposed methods, we uncover several issues when assessing the status quo: the use of under-specified and ambiguous dataset names, the large range of parameters and hyper-parameters to tune for each method, and the use of different metrics and evaluation methods.

Disciplines :

Computer science

Author, co-author :

Camino, Ramiro Daniel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

Hammerschmidt, Christian ; Delft University of Technology

State, Radu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

External co-authors :

yes

Language :

English

Title :

Working with Deep Generative Models and Tabular Data Imputation

Publication date :

17 July 2020

Event name :

First Workshop on the Art of Learning with Missing Values (Artemiss)

Event organizer :

Hosted by the 37th International Conference on Machine Learning (ICML)

Event place :

Vienna, Austria

Event date :

from 12-07-2020 to 18-07-2020

Audience :

International

Focus Area :

Computational Sciences

Available on ORBilu :

since 20 August 2020