Reference : GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets
Scientific journals : Article
Life sciences : Multidisciplinary, general & others
Engineering, computing & technology : Multidisciplinary, general & others
Computational Sciences
http://hdl.handle.net/10993/46296
English
Kratochvil, Miroslav* [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core; Institute of Organic Chemistry and Biochemistry of the CAS, Prague]
Hunewald, Oliver* [Luxembourg Institute of Health - LIH]
Heirendt, Laurent [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
Verissimo, Vasco [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)]
Vondrášek, Jiří [Institute of Organic Chemistry and Biochemistry of the CAS, Prague]
Satagopam, Venkata [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
Schneider, Reinhard [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
Trefois, Christophe [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
Ollert, Markus [Luxembourg Institute of Health - LIH]
* These authors have contributed equally to this work.
18-Nov-2020
GigaScience
Oxford University Press
9
11
Yes
International
2047-217X
United Kingdom
[en] Background: The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. Analyzing that much high-dimensional data places heavy demands on both the hardware and the software of high-performance computing resources. Current software tools often do not scale to datasets of this size; users are thus forced to downsample the data to manageable sizes, in turn losing accuracy and the ability to detect many underlying complex phenomena.
Results: We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study.
Conclusions: GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example use on data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.
ELIXIR CZ LM2018131 (MEYS) ; FNR AFR-RIKEN bilateral program (TregBar 2015/11228353) ; FNR PRIDE Doctoral Training Unit program (PRIDE/11012546/NEXTIMMUNE) ; Institute of Organic Chemistry and Biochemistry of the CAS (RVO: 61388963) ; ELIXIR Staff Exchange programme 2020
Researchers ; Professionals
10.1093/gigascience/giaa127
https://academic.oup.com/gigascience/article/9/11/giaa127/5987271

File(s) associated to this reference

Fulltext file(s):

File: giaa127.pdf; Version: Publisher postprint; Size: 14.22 MB; Access: Open access

Additional material(s):

File: giaa127_supplemental_file.pdf; Size: 2.32 MB; Access: Open access

