Eprint already available on another site (E-prints, Working papers and Research blog)
On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets
GARCIA SANTA CRUZ, Beatriz; Sölter, Jan; BOSSA, Matias Nicolas et al.
2020
 

Files


Full Text
On_the_composition_and_limitations_of_publicly_available_covid19_x_ray_imaging_datasets.pdf
Publisher postprint (757.67 kB)

All documents in ORBilu are protected by a user license.


Details



Keywords :
COVID-19; Machine Learning; Computer Vision
Abstract :
[en]  Machine learning based methods for diagnosis and progression prediction of COVID-19 from imaging data have gained significant attention in the last months, in particular through the use of deep learning models. In this context, hundreds of models were proposed, with the majority of them trained on public datasets. Data scarcity, mismatch between training and target population, group imbalance, and lack of documentation are important sources of bias, hindering the applicability of these models to real-world clinical practice. Considering that datasets are an essential part of model building and evaluation, a deeper understanding of the current landscape is needed. This paper presents an overview of the currently publicly available COVID-19 chest X-ray datasets. Each dataset is briefly described, and potential strengths, limitations, and interactions between datasets are identified. In particular, some key properties of current datasets that could be potential sources of bias, impairing models trained on them, are pointed out. These descriptions are useful for model building on those datasets, to choose the best dataset according to the model goal, to take into account the specific limitations to avoid reporting overconfident benchmark results, and to discuss their impact on the generalisation capabilities in a specific clinical setting.
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
GARCIA SANTA CRUZ, Beatriz ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
Sölter, Jan ;  University of Luxembourg
BOSSA, Matias Nicolas ;  University of Luxembourg
HUSCH, Andreas  ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
Language :
English
Title :
On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets
Publication date :
26 August 2020
Version :
1
Number of pages :
12
Focus Area :
Systems Biomedicine
Computational Sciences
Funders :
COVID-19/2020-1/14702831/AICovIX/Husch
Available on ORBilu :
since 28 August 2020

Statistics


Number of views
436 (16 by Unilu)
Number of downloads
198 (4 by Unilu)
