Garcia Santa Cruz, Beatriz. In Scientific Reports (2022).
The study of complex diseases relies on large amounts of data to build models toward precision medicine. Such data acquisition is feasible in the context of high-throughput screening, in which the quality of the results relies on the accuracy of the image analysis. Although state-of-the-art solutions for image segmentation employ deep learning approaches, the high cost of manually generating ground-truth labels for model training hampers day-to-day application in experimental laboratories. Alternatively, traditional computer-vision-based solutions do not need expensive labels for their implementation. Our work combines both approaches by training a deep learning network on weak training labels automatically generated with conventional computer vision methods. Our network surpasses the conventional segmentation quality by generalising beyond noisy labels, providing a 25% increase in mean intersection over union while simultaneously reducing development and inference times. Our solution was embedded into an easy-to-use graphical user interface that allows researchers to assess the predictions and correct potential inaccuracies with minimal human input. To demonstrate the feasibility of training a deep learning solution on a large dataset of noisy labels automatically generated by a conventional pipeline, we compared our solution against the common approach of training a model on a small dataset manually curated by several experts. Our work suggests that humans perform better at context interpretation, such as error assessment, while computers excel at pixel-by-pixel fine segmentation.
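The mean intersection-over-union (mIoU) improvement quoted above is a standard segmentation metric; a minimal sketch of how it can be computed per class from flat label masks (the toy masks and class count are illustrative, not from the paper):

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes, skipping classes
    absent from both masks (pred/target are flat label sequences)."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # only score classes present in at least one mask
            ious.append(inter / union)
    return sum(ious) / len(ious)

# toy 4-pixel masks with two classes
print(mean_iou([0, 1, 1, 1], [0, 1, 0, 1], num_classes=2))  # → 0.5833...
```

Skipping classes absent from both masks keeps rare classes from distorting the average, which matters when comparing noisy weak labels against network predictions.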
Such pipelines are illustrated with a case study on image segmentation for autophagy events. This work aims for better translation of new technologies to real-world settings in microscopy image analysis.

Garcia Santa Cruz, Beatriz. In Vol. 3 (2022): Proceedings of the Northern Lights Deep Learning Workshop 2022 (2022, April 18).
The use of Convolutional Neural Networks (CNN) in medical imaging has often outperformed previous solutions and even specialists, becoming a promising technology for computer-aided diagnosis (CAD) systems. However, recent works suggested that CNNs may generalise poorly to new data, for instance data generated in different hospitals. Uncontrolled confounders have been proposed as a common reason. In this paper, we experimentally demonstrate the impact of confounded data in unknown scenarios. We assessed the effect of four confounding configurations: total, strong, light and balanced. We found the confounding effect to be especially prominent in total-confounder scenarios, while the effect in light and strong confounding scenarios may depend on dataset robustness. Our findings indicate that the confounding effect is independent of the architecture employed. These findings might explain why models can report good metrics during the development stage yet fail to translate to real-world settings. We highlight the need for thorough consideration of these commonly unattended aspects in order to develop safer CNN-based CAD systems.

Garcia Santa Cruz, Beatriz. In Bildverarbeitung für die Medizin 2022, Informatik aktuell, Springer Vieweg, Wiesbaden.
(2022, April 05).

Ghamizi, Salah. E-print/Working paper (2022).
Clinicians use chest radiography (CXR) to diagnose common pathologies. Automated classification of these diseases can expedite the analysis workflow, scale to growing numbers of patients and reduce healthcare costs. While research has produced classification models that perform well on a given dataset, the same models lack generalisation on different datasets. This reduces confidence that these models can be reliably deployed across various clinical settings. We propose an approach based on multitask learning to improve model generalisation. We demonstrate that learning a (main) pathology together with an auxiliary pathology can significantly impact generalisation performance (between -10% and +15% AUC-ROC). A careful choice of auxiliary pathology even yields performance competitive with state-of-the-art models that rely on fine-tuning or ensemble learning, using between 6% and 34% of the training data those models required. We further provide a method to determine the best auxiliary task without access to the target dataset. Ultimately, our work makes a big step towards the creation of CXR diagnosis models applicable in the real world, through the evidence that multitask learning can drastically improve generalisation.

Garcia Santa Cruz, Beatriz. In Medical Image Analysis (2021), 74.
Computer-aided diagnosis and stratification of COVID-19 based on chest X-ray suffers from weak bias assessment and limited quality control. Undetected bias induced by inappropriate use of datasets, and improper consideration of confounders, prevents the translation of prediction models into clinical practice. By adapting established tools for model evaluation to the task of evaluating datasets, this study provides a systematic appraisal of publicly available COVID-19 chest X-ray datasets, determining their potential use and evaluating potential sources of bias. Only 9 out of more than a hundred identified datasets met at least the criteria for proper assessment of the risk of bias and could be analysed in detail. Remarkably, most of the datasets utilised in 201 papers published in peer-reviewed journals are not among these 9 datasets, thus leading to models with a high risk of bias. This raises concerns about the suitability of such models for clinical use. This systematic review highlights the limited description of datasets employed for modelling and aids researchers in selecting the most suitable datasets for their task.

Garcia Santa Cruz, Beatriz. Scientific Conference (2021, November 16).
In this paper, we discuss the importance of considering causal relations in the development of machine learning solutions to prevent factors hampering the robustness and generalisation capacity of the models, such as induced biases. This issue often arises when the algorithm decision is affected by confounding factors. In this work, we argue that the integration of causal relationships can identify potential confounders.
We call for standardised meta-information practices as a crucial step for proper machine learning solution development, validation, and data sharing. Such practices include detailing the dataset generation process, aiming for automatic integration of causal relationships.

Garcia Santa Cruz, Beatriz. Poster (2021, August).
Machine learning and data-driven solutions open exciting opportunities in many disciplines, including healthcare. The recent transition of this technology into real clinical settings brings new challenges. Such problems derive from several factors, including dataset origin, composition and description, hampering fair and secure application. Considering the potential impact of incorrect predictions in applied-ML healthcare research is urgent. Undetected bias induced by inappropriate use of datasets and improper consideration of confounders prevents the translation of prediction models into clinical practice. Therefore, in this work, available systematic tools for assessing the risk of bias in models are employed as a first step towards robust solutions for better dataset choice, dataset merging, and the design of the training and validation steps of the ML development pipeline.

Garcia Santa Cruz, Beatriz. In Movement Disorders (2020, September 12).
Objective: Automate the detection of the 'swallow-tail' appearance of substantia nigra dopaminergic neurons in MRI for more robust tests in Parkinson's disease (PD) diagnosis.
Background: Differential diagnosis of PD is challenging even in specialised centres. The use of imaging techniques can be beneficial for the diagnosis. Although DaTSCAN has been proven to be clinically useful, it is not widely available and carries radiation risk and high cost. Therefore, MRI scans for PD diagnosis offer several advantages over DaTSCAN [1]. Recent literature shows strong evidence of high diagnostic accuracy using the 'swallow-tail' shape of the dorsolateral substantia nigra in 3T SWI [2]. Nevertheless, the majority of such studies rely on the subjective opinion of experts and on manual methods of analysis to assess the accuracy of these features. Alternatively, we propose a fully automated solution to evaluate the absence or presence of this feature for computer-aided diagnosis (CAD) of PD.
Method: A retrospective study of 27 PD and 18 non-PD subjects was conducted, including standard high-resolution 3D MRI T1 and SWI sequences (additionally, T2 scans were used to increase the registration references). Firstly, spatial registration and normalisation of the images were performed. Then, the ROI was extracted using atlas references. Finally, a supervised machine learning model was built using 5-fold-within-5-fold nested cross-validation.
Results: Preliminary results show significant sensitivity (0.92) and ROC AUC (0.82), allowing for automated classification of patients based on the swallow-tail biomarker from MRI.
Conclusion: Detection of nigrosome degeneration (the swallow-tail biomarker) in accessible brain imaging techniques can be automated with significant accuracy, allowing for computer-aided PD diagnosis.
References: [1] Schwarz, S. T., Xing, Y., Naidu, S., Birchall, J., Skelly, R., Perkins, A., ... & Gowland, P. (2017).
Protocol of a single group prospective observational study on the diagnostic value of 3T susceptibility weighted MRI of nigrosome-1 in patients with parkinsonian symptoms: the N3iPD study (nigrosomal iron imaging in Parkinson's disease). BMJ Open, 7(12), e016904. [2] Schwarz, S. T., Afzal, M., Morgan, P. S., Bajaj, N., Gowland, P. A., & Auer, D. P. (2014). The 'swallow tail' appearance of the healthy nigrosome: a new accurate test of Parkinson's disease: a case-control and retrospective cross-sectional MRI study at 3T. PLoS ONE, 9(4).

Garcia Santa Cruz, Beatriz. E-print/Working paper (2020).
Machine learning based methods for diagnosis and progression prediction of COVID-19 from imaging data have gained significant attention in recent months, in particular through the use of deep learning models. In this context, hundreds of models were proposed, the majority of them trained on public datasets. Data scarcity, mismatch between training and target populations, group imbalance, and lack of documentation are important sources of bias, hindering the applicability of these models to real-world clinical practice. Considering that datasets are an essential part of model building and evaluation, a deeper understanding of the current landscape is needed. This paper presents an overview of the currently publicly available COVID-19 chest X-ray datasets. Each dataset is briefly described, and potential strengths, limitations and interactions between datasets are identified. In particular, some key properties of current datasets that could be potential sources of bias, impairing models trained on them, are pointed out.
These descriptions are useful for model building on those datasets, for choosing the best dataset according to the model goal, for taking specific limitations into account to avoid reporting overconfident benchmark results, and for discussing their impact on the generalisation capabilities in a specific clinical setting.

Garcia Santa Cruz, Beatriz. Poster (2019, November 29).
Automation of biological image analysis is essential to boost biomedical research. The study of complex diseases such as neurodegenerative diseases calls for large amounts of data to build models towards precision medicine. Such data acquisition is feasible in the context of high-throughput screening, in which the quality of the results relies on the accuracy of image analysis. Although the state-of-the-art solutions for image segmentation employ deep learning approaches, the high cost of manual data curation hampers real use in current biomedical research laboratories. Here, we propose a pipeline that employs deep learning not only to conduct accurate segmentation but also to assist with the creation of high-quality datasets in a less time-consuming way for the experts. Weakly-labelled datasets are becoming a common alternative as a starting point for developing real-world solutions. Traditional approaches based on classical multimedia signal processing were employed to generate a pipeline specifically optimised for high-throughput screening images of iPSCs fused with the Rosella biosensor. This pipeline produced good segmentation results, but with several inaccuracies. We employed the weakly-labelled masks produced by this pipeline to train a multiclass semantic segmentation CNN based on the U-Net architecture.
Since a strong class imbalance was detected between the classes, we employed a class-sensitive cost function: the Dice coefficient. Next, we evaluated the accuracy of the weakly-labelled data and of the trained network segmentation using double-blind tests conducted by experts in cell biology with experience in this type of image, as well as traditional metrics comparing against segmentations manually curated by cell biology experts. In all evaluations, the prediction of the neural network surpassed the segmentation quality of the weakly-labelled data. Another big handicap that complicates the use of deep learning solutions in wet-lab environments is the lack of user-friendly tools for non-computational experts such as biologists. To complete our solution, we integrated the trained network into a GUI built in the MATLAB environment, with no programming requirements for the user. This integration allows conducting semantic segmentation of microscopy images in a few seconds. In addition, thanks to the patch-based approach, it can be employed on images of different sizes. Finally, human experts can correct potential inaccuracies of the prediction in a simple interactive way, and the corrections can be easily stored and used to re-train the network to improve its accuracy. In conclusion, our solution addresses two important bottlenecks in translating leading-edge computer vision technologies to biomedical research: on one hand, the effortless creation of high-quality datasets under expert supervision, taking advantage of the proven ability of our CNN solution to generalise beyond weakly-labelled inaccuracies; on the other hand, the ease of use provided by the GUI integration of our solution to both segment images and interact with the predicted output. Overall, this approach looks promising for fast adaptability to new scenarios.
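The class-sensitive cost function mentioned in this abstract, the Dice coefficient, scores overlap relative to the size of each class, so it is far less dominated by a large background class than plain pixel accuracy. A minimal sketch of the soft Dice score for a single class, in plain Python rather than the authors' actual training code (the example values are illustrative):

```python
def dice_score(pred, target, eps=1e-7):
    """Soft Dice coefficient for one class.

    pred: predicted foreground probabilities per pixel (floats in [0, 1])
    target: binary ground-truth labels per pixel (0 or 1)
    Returns 1.0 for perfect overlap; 1 - dice_score(...) is a usable loss.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps guards against division by zero when both masks are empty
    return (2 * inter + eps) / (total + eps)

# a confident, mostly correct prediction on a 4-pixel image
print(dice_score([0.9, 0.8, 0.1, 0.0], [1, 1, 0, 0]))  # → ~0.895
```

For multiclass segmentation, the per-class Dice scores are typically averaged, which is what makes the objective sensitive to small classes.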
Garcia Santa Cruz, Beatriz. Poster (2019, October 10).
Automation of biological image analysis is essential to boost biomedical research. The study of complex diseases such as neurodegenerative diseases calls for large amounts of data to build models towards precision medicine. Such data acquisition is feasible in the context of high-throughput high-content screening (HTHCS), in which the quality of the results relies on the accuracy of image analysis. Deep learning (DL) yields great performance in image analysis tasks, especially with large amounts of data such as those produced in HTHCS contexts. This strength of DL on HTHCS data is also its biggest weakness, since DL solutions are highly sensitive to bad-quality datasets. Hence, accurate quality control (QC) for microscopy HTHCS becomes an essential step towards reliable pipelines for HTHCS analysis. Usually, artifacts found on these platforms are the consequence of out-of-focus acquisition and undesirable density variations. Accurate outlier detection is essential both for the training of generic ML solutions (i.e. segmentation or classification) and for the QC of the input data such solutions will predict on. Moreover, during the QC of the input dataset, we aim not only to discard unsuitable images but also to report to the user on the quality of the dataset, giving the user the choice to keep or discard the bad images. To build the QC solution, we employed fluorescent microscopy images of the Rosella biosensor generated on the HTHCS platform, with a total of 15 focal planes ranging from -6z to +7z steps around the two optimal planes. We evaluated 27 known focus measure operators and concluded that they have low sensitivity in noisy conditions.
We propose a CNN solution which predicts the focus error based on the distance to the optimal plane, outperforming the evaluated focus operators. This QC allows for better results in cell segmentation models based on the U-Net architecture, as well as promising improvements in image classification tasks.

Garcia Santa Cruz, Beatriz. Bachelor/Master dissertation (2019).

Noronha, Alberto. In Nucleic Acids Research (2018).
A multitude of factors contribute to complex diseases and can be measured with 'omics' methods. Databases facilitate data interpretation for underlying mechanisms. Here, we describe the Virtual Metabolic Human (VMH, www.vmh.life) database, encapsulating current knowledge of human metabolism within five interlinked resources: 'Human metabolism', 'Gut microbiome', 'Disease', 'Nutrition', and 'ReconMaps'. The VMH captures 5180 unique metabolites, 17 730 unique reactions, 3695 human genes, 255 Mendelian diseases, 818 microbes, 632 685 microbial genes and 8790 food items. The VMH's unique features are (i) the hosting of the metabolic reconstructions of human and gut microbes amenable to metabolic modelling; (ii) seven human metabolic maps for data visualisation; (iii) a nutrition designer; (iv) a user-friendly webpage and application programming interface to access its content; (v) a user feedback option for community engagement; and (vi) the connection of its entities to 57 other web resources. The VMH represents a novel, interdisciplinary database for data interpretation and hypothesis generation for the biomedical community.