Article (Scientific journals)
GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets
KRATOCHVIL, Miroslav; Hunewald, Oliver; HEIRENDT, Laurent et al.
2020. In GigaScience, 9 (11)
Peer reviewed (verified by ORBi); Dataset
 

Documents


Full text
giaa127.pdf
Publisher postprint (14.56 MB)
Annexes
giaa127_supplemental_file.pdf
(2.38 MB)

All documents in ORBilu are protected by a user license.




Details



Abstract:
Background: The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. Analyzing that much high-dimensional data places heavy demands on both the hardware and the software of high-performance computing resources. Current software tools often do not scale to datasets of this size; users are thus forced to downsample the data to manageable sizes, in turn losing accuracy and the ability to detect many underlying complex phenomena.
Results: We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study.
Conclusions: GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example use on the data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.
Research center:
ULHPC - University of Luxembourg: High Performance Computing
- Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group)
Disciplines:
Life sciences: Multidisciplinary, general & others
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author:
KRATOCHVIL, Miroslav; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core; Institute of Organic Chemistry and Biochemistry of the CAS, Prague
Hunewald, Oliver; Luxembourg Institute of Health - LIH
HEIRENDT, Laurent; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Verissimo, Vasco; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
Vondrášek, Jiří; Institute of Organic Chemistry and Biochemistry of the CAS, Prague
SATAGOPAM, Venkata; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
SCHNEIDER, Reinhard; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
TREFOIS, Christophe; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Ollert, Markus; Luxembourg Institute of Health - LIH
These authors contributed equally to the publication.
External co-authors:
yes
Document language:
English
Title:
GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets
Publication date:
18 November 2020
Journal title:
GigaScience
ISSN:
2047-217X
Publisher:
Oxford University Press, United Kingdom
Volume:
9
Issue:
11
Peer reviewed:
Peer reviewed (verified by ORBi)
Focus Area:
Computational Sciences
Funders:
ELIXIR CZ LM2018131 (MEYS)
FNR AFR-RIKEN bilateral program (TregBar 2015/11228353)
FNR PRIDE Doctoral Training Unit program (PRIDE/11012546/NEXTIMMUNE)
Institute of Organic Chemistry and Biochemistry of the CAS (RVO: 61388963)
ELIXIR Staff Exchange programme 2020
Dataset:
Available on ORBilu:
since 18 February 2021

Statistics

Number of views:
250 (including 28 from Unilu)
Number of downloads:
153 (including 4 from Unilu)

Scopus® citations:
10
Scopus® citations, excluding self-citations:
6
OpenCitations:
7
OpenAlex citations:
14
WoS citations:
11
