; ; Kratochvil, Miroslav, in Algorithms and Architectures for Parallel Processing (2023, January)

Programmers of high-performance applications face many challenging aspects of contemporary hardware architectures. One of the critical aspects is the efficiency of memory operations, which is affected not only by hardware parameters such as memory throughput or cache latency, but also by data-access patterns that influence how well the hardware is utilized, for example through reuse of cached data or coalesced data transactions. The performance of an algorithm can therefore be highly impacted by the layout of its data structures or by the order of data processing, which may translate into a more or less optimal sequence of memory operations. These effects are even more pronounced on highly parallel platforms, such as GPUs, which often employ specific execution models (lock-step) or memory models (shared memory). In this work, we propose a modern, astute approach for managing and implementing memory layouts with first-class structures that is very efficient and straightforward. This approach was implemented in Noarr, a GPU-ready, portable C++ library that utilizes generic programming, functional design, and compile-time computations to allow the programmer to specify and compose data-structure layouts declaratively while minimizing the indexing and coding overhead. We describe the main principles on code examples and present a performance evaluation that verifies our claims regarding its efficiency.
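To make the layout-as-first-class-object idea concrete, here is a minimal conceptual C++ sketch. It is not Noarr's actual API; it only illustrates how a traversal kernel can be parameterized by a layout object so that the indexing arithmetic stays out of the kernel, which Noarr extends with declarative, compile-time layout composition.

```cpp
// Conceptual illustration only (not Noarr's API): a memory layout as a
// first-class value that generic code is parameterized by.
#include <cstddef>
#include <iostream>
#include <vector>

// Row-major layout policy: element (r, c) lives at r * cols + c.
struct row_major {
    std::size_t rows, cols;
    constexpr std::size_t operator()(std::size_t r, std::size_t c) const {
        return r * cols + c;
    }
};

// Column-major layout policy: element (r, c) lives at c * rows + r.
struct col_major {
    std::size_t rows, cols;
    constexpr std::size_t operator()(std::size_t r, std::size_t c) const {
        return c * rows + r;
    }
};

// The kernel is written once; the layout is swapped by changing the
// template argument, so no indexing arithmetic leaks into the traversal.
template <typename Layout>
double sum_all(const std::vector<double>& data, Layout layout,
               std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += data[layout(r, c)];
    return total;
}

int main() {
    constexpr std::size_t rows = 3, cols = 4;
    std::vector<double> data(rows * cols, 1.0);

    // The same kernel runs over either layout; only the layout object changes.
    std::cout << sum_all(data, row_major{rows, cols}, rows, cols) << '\n';
    std::cout << sum_all(data, col_major{rows, cols}, rows, cols) << '\n';
}
```

Because the layout is an ordinary value whose type is known at compile time, the compiler can fully inline the index computation, so switching layouts never requires touching the kernel code.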
; ; Kratochvil, Miroslav, in Metabolic Engineering (2022)

Metabolic models are typically characterized by a large number of parameters. Traditionally, metabolic control analysis is applied to differential equation-based models to investigate the sensitivity of predictions to parameters. A corresponding theory for constraint-based models is lacking due to their formulation as optimization problems. Here, we show that optimal solutions of optimization problems can be efficiently differentiated using constrained optimization duality and implicit differentiation. We use this to calculate the sensitivities of predicted reaction fluxes and enzyme concentrations to turnover numbers in an enzyme-constrained metabolic model of Escherichia coli. The sensitivities quantitatively identify rate-limiting enzymes and are mathematically precise, unlike the finite-difference-based approaches currently used for sensitivity analysis. Further, efficient differentiation of constraint-based models unlocks the ability to use gradient information for parameter estimation. We demonstrate this by improving, genome-wide, the state-of-the-art turnover number estimates for E. coli. Finally, we show that this technique can be generalized to arbitrarily complex models. By differentiating the optimal solution of a model incorporating both thermodynamic and kinetic rate equations, the effect of metabolite concentrations on biomass growth can be elucidated. We benchmark these metabolite sensitivities against a large experimental gene knockdown study and find good alignment between the predicted sensitivities and in vivo metabolome changes. In sum, we demonstrate several applications of differentiating optimal solutions of constraint-based metabolic models and show how this connects to classic metabolic control analysis.
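The core mechanism referred to here can be summarized by a generic textbook formulation of implicit differentiation of optimality conditions; this is a sketch for orientation, not the paper's own derivation:

```latex
% Generic sketch: differentiating a parametric optimum via its KKT system.
% For  min_x f(x; \theta)  subject to  h(x; \theta) = 0,  g(x; \theta) <= 0,
% collect the primal and dual variables in  z = (x, \lambda, \mu)  and write
% the KKT optimality conditions as  F(z; \theta) = 0.  When the Jacobian
% \partial F / \partial z  is invertible at the optimum  z*(\theta),  the
% implicit function theorem yields the sensitivities of the solution:
\[
  \frac{\mathrm{d} z^{\star}}{\mathrm{d} \theta}
  = -\left(\left.\frac{\partial F}{\partial z}\right|_{z^{\star}(\theta)}\right)^{-1}
     \left.\frac{\partial F}{\partial \theta}\right|_{z^{\star}(\theta)} .
\]
```

In the setting of the abstract, the parameters θ would correspond to quantities such as turnover numbers, and z* would contain the predicted fluxes and enzyme concentrations whose sensitivities are reported.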
Vega Moreno, Carlos Gonzalo, in Bioinformatics and Biomedical Engineering (2022)

The ever-increasing use of artificial intelligence (AI) methods in biomedical sciences calls for closer interdisciplinary collaborations that transfer domain knowledge from life scientists to computer science researchers and vice versa. We highlight two general areas where the use of AI-based solutions designed for clinical and laboratory settings has proven problematic. These are used to demonstrate common sources of translational challenges, which often stem from differences in data interpretation between the clinical and the research view, and from unmatched expectations and requirements on result quality metrics. We outline how explicit, interpretable inference reporting might be used as a guide to overcome such translational challenges. We conclude with several recommendations for safer translation of machine learning solutions into real-world settings.

; Kratochvil, Miroslav, in Lecture Notes in Computer Science (2021, August), 12820

Hierarchical clustering is a common tool for simplification, exploration, and analysis of datasets in many areas of research. For data originating in flow cytometry, a specific variant of agglomerative clustering based on Mahalanobis-average linkage has been shown to produce better results than the common linkages. However, the high complexity of computing the distance limits the applicability of the algorithm to datasets obtained from current equipment. We propose an optimized, GPU-accelerated, open-source implementation of Mahalanobis-average hierarchical clustering that improves the algorithm's performance by over two orders of magnitude, allowing it to scale to large datasets. We provide a detailed analysis of the optimizations, which are also portable to other hierarchical clustering algorithms, present the collected experimental results, and demonstrate the use on realistic high-dimensional datasets.

Kratochvil, Miroslav, in Bioinformatics (2021)

COBREXA.jl is a Julia package for scalable, high-performance constraint-based reconstruction and analysis of very large-scale biological models. Its primary purpose is to facilitate the integration of modern high-performance computing environments with the processing and analysis of large-scale metabolic models of challenging complexity. We report the architecture of the package and demonstrate how the design promotes analysis scalability on several use cases with multi-organism community models. Availability: https://doi.org/10.17881/ZKCR-BT30. Supplementary data are available at Bioinformatics online.
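For readers unfamiliar with constraint-based analysis, the canonical problem such packages solve is flux balance analysis; the following is the standard generic formulation, not COBREXA.jl-specific notation:

```latex
% Flux balance analysis (FBA), the canonical constraint-based model:
%   S -- stoichiometric matrix of the metabolic network,
%   v -- vector of reaction fluxes,
%   c -- objective coefficients (typically selecting the biomass reaction),
% with bounds encoding reaction reversibility and capacity constraints.
\[
  \max_{v}\; c^{\top} v
  \quad\text{subject to}\quad
  S v = 0,
  \qquad
  v_{\min} \le v \le v_{\max}.
\]
```

Genome-scale and multi-organism community models enlarge S and v considerably, which is where the scalability concerns addressed by the package arise.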
Kratochvil, Miroslav, in GigaScience (2020), 9(11)

Background: The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. The analysis of such amounts of high-dimensional data is demanding on both the hardware and the software of high-performance computational resources. Current software tools often do not scale to datasets of this size; users are thus forced to downsample the data to tractable sizes, in turn losing accuracy and the ability to detect many underlying complex phenomena.

Results: We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level, high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study.

Conclusions: GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example use on data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.
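One way to see how self-organizing-map (SOM) training can scale horizontally is the batch SOM update rule; this is the generic textbook form on which distributed SOM implementations typically rely, not GigaSOM.jl's specific notation:

```latex
% Standard batch SOM update (generic textbook form):
%   b(x_i)        -- best-matching unit of data point x_i,
%   h_{b(x_i), j} -- neighborhood weight between units b(x_i) and j,
%   w_j           -- codebook (prototype) vector of unit j.
\[
  w_j \;\leftarrow\;
  \frac{\sum_{i} h_{b(x_i),\, j}\; x_i}{\sum_{i} h_{b(x_i),\, j}} .
\]
% Both the numerator and the denominator are plain sums over data points,
% so they decompose over disjoint data partitions: each worker computes
% partial sums on its share of the data and the partial results are merged.
% This decomposability is what makes distributed, horizontally scalable
% SOM training possible.
```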