References of "Sölter, Jan 50040411"
Full Text
Peer Reviewed
Public Covid-19 X-ray datasets and their impact on model bias - a systematic review of a significant problem
Garcia Santa Cruz, Beatriz UL; Bossa, Matias Nicolas UL; Sölter, Jan UL et al

in Medical Image Analysis (2021), 74

Computer-aided diagnosis and stratification of COVID-19 based on chest X-ray suffers from weak bias assessment and limited quality control. Undetected bias induced by inappropriate use of datasets and improper consideration of confounders prevents the translation of prediction models into clinical practice. By adapting established tools for model evaluation to the task of evaluating datasets, this study provides a systematic appraisal of publicly available COVID-19 chest X-ray datasets, determining their potential use and evaluating potential sources of bias. Only 9 out of more than a hundred identified datasets met at least the criteria for proper assessment of the risk of bias and could be analysed in detail. Remarkably, most of the datasets utilised in 201 papers published in peer-reviewed journals are not among these 9 datasets, thus leading to models with a high risk of bias. This raises concerns about the suitability of such models for clinical use. This systematic review highlights the limited description of datasets employed for modelling and helps researchers select the most suitable datasets for their task.

Peer Reviewed
Model bias and its impact on computer-aided diagnosis: A data-centric approach
Garcia Santa Cruz, Beatriz UL; Bossa, Matias Nicolas UL; Sölter, Jan UL et al

Poster (2021, August)

Machine learning and data-driven solutions open exciting opportunities in many disciplines, including healthcare. The recent transition of this technology into real clinical settings brings new challenges. These challenges derive from several factors, including dataset origin, composition and description, which hamper the fair and secure application of such solutions. Considering the potential impact of incorrect predictions in applied-ML healthcare research is urgent. Undetected bias induced by inappropriate use of datasets and improper consideration of confounders prevents the translation of prediction models into clinical practice. Therefore, in this work, available systematic tools for assessing the risk of bias in models are employed as a first step to explore robust solutions for better dataset choice, dataset merging and design of the training and validation steps in the ML development pipeline.

Full Text
Rapid Artificial Intelligence Solutions in a Pandemic - The COVID-19-20 Lung CT Lesion Segmentation Challenge.
Roth, Holger; Xu, Ziyue; Diez, Carlos Tor et al

E-print/Working paper (2021)

Artificial intelligence (AI) methods for the automatic detection and quantification of COVID-19 lesions in chest computed tomography (CT) might play an important role in the monitoring and management of the disease. We organized an international challenge and competition for the development and comparison of AI algorithms for this task, which we supported with public data and state-of-the-art benchmark methods. Board-certified radiologists annotated 295 public images from two sources (A and B) for algorithm training (n=199, source A), validation (n=50, source A) and testing (n=23, source A; n=23, source B). There were 1,096 registered teams, of which 225 and 98 completed the validation and testing phases, respectively. The challenge showed that AI models could be rapidly designed by diverse teams with the potential to measure disease or facilitate timely and patient-specific interventions. This paper provides an overview and the major outcomes of the COVID-19 Lung CT Lesion Segmentation Challenge - 2020.

Full Text
On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets
Garcia Santa Cruz, Beatriz UL; Sölter, Jan UL; Bossa, Matias Nicolas UL et al

E-print/Working paper (2020)

Machine learning based methods for diagnosis and progression prediction of COVID-19 from imaging data have gained significant attention in the last months, in particular through the use of deep learning models. In this context, hundreds of models were proposed, the majority of them trained on public datasets. Data scarcity, mismatch between training and target population, group imbalance, and lack of documentation are important sources of bias, hindering the applicability of these models to real-world clinical practice. Considering that datasets are an essential part of model building and evaluation, a deeper understanding of the current landscape is needed. This paper presents an overview of the currently publicly available COVID-19 chest X-ray datasets. Each dataset is briefly described, and potential strengths, limitations and interactions between datasets are identified. In particular, some key properties of current datasets that could be potential sources of bias, impairing models trained on them, are pointed out. These descriptions are useful for building models on those datasets, for choosing the best dataset according to the model goal, for taking the specific limitations into account to avoid reporting overconfident benchmark results, and for discussing their impact on generalisation capabilities in a specific clinical setting.
