Hierarchical clustering is a common tool for simplification, exploration, and analysis of datasets in many areas of research.
For data originating in flow cytometry, a specific variant of agglomerative clustering based on Mahalanobis-average linkage has been shown to produce better results than the common linkages.
However, the high complexity of computing the distance limits the applicability of the algorithm to datasets obtained from current equipment.
We propose an optimized, GPU-accelerated open-source implementation of Mahalanobis-average hierarchical clustering that improves the algorithm's performance by over two orders of magnitude, allowing it to scale to large datasets.
We provide a detailed analysis of the optimizations and the collected experimental results, which are also portable to other hierarchical clustering algorithms, and demonstrate the use of the implementation on realistic high-dimensional datasets.
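To make the cost of the distance computation concrete, the following is a minimal CPU-side sketch in Python with NumPy, not the GPU implementation described above. The function names are hypothetical, and the linkage formula is an assumption: it takes the dissimilarity of two clusters to be the mean of the two centroid-to-cluster Mahalanobis distances, which may differ from the exact definition used in the paper.

```python
import numpy as np


def mahalanobis_to_cluster(point, cluster):
    """Mahalanobis distance from `point` to the distribution of the rows of `cluster`."""
    centroid = cluster.mean(axis=0)
    cov = np.cov(cluster, rowvar=False)  # sample covariance matrix (d x d, d >= 2)
    diff = point - centroid
    # Solve cov @ x = diff rather than forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def mahalanobis_average(a, b):
    """Illustrative cluster dissimilarity (hypothetical helper).

    ASSUMPTION: averages the distance from the centroid of `a` to cluster `b`
    and from the centroid of `b` to cluster `a`; the paper's exact linkage
    definition may differ.
    """
    d_ab = mahalanobis_to_cluster(a.mean(axis=0), b)
    d_ba = mahalanobis_to_cluster(b.mean(axis=0), a)
    return 0.5 * (d_ab + d_ba)


# Two small 2-D clusters with the same shape, shifted along the first axis.
a = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
b = a + np.array([4.0, 0.0])
dissimilarity = mahalanobis_average(a, b)
```

Every evaluation needs a covariance matrix and a linear solve per cluster, which gives a sense of why the distance is expensive when recomputed across many merge steps. The sketch does not handle singular covariance matrices, which occur for clusters with too few points relative to the dimension.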
Disciplines :
Computer science
Author, co-author :
Šmelko, Adam; Charles University in Prague > Department of Software Engineering
KRATOCHVIL, Miroslav ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Kruliš, Martin; Charles University in Prague > Department of Software Engineering
Sieger, Tomáš; Czech Technical University in Prague > Department of Cybernetics
Czech Science Foundation (GAČR) project 19-22071Y; ELIXIR CZ LM2018131 (MEYS); Charles University grant SVV-260451; Czech Health Research Council (AZV) [NV18-08-00385]