References of "GigaScience"
Peer Reviewed
Mantis: flexible and consensus-driven genome annotation
Queirós, Pedro; Delogu, Francesco UL; Hickl, Oskar UL et al

in GigaScience (2021), 10(6)

The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. Through the use of these datasets we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, challenges remain in terms of speed, flexibility, and reproducibility. In the big data era, it is also increasingly important to cease limiting our findings to a single reference, coalescing knowledge from different data sources, and thus overcoming some limitations in overly relying on computationally generated data from single sources. We implemented a protein annotation tool, Mantis, which uses database identifiers intersection and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing for the customization of reference data and execution parameters, and is reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which significantly improved annotation performance compared to sequence-wide annotation. The parallelized implementation of Mantis results in short runtimes while also outputting high coverage and high-quality protein function annotations. Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis.

Peer Reviewed
GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets
Kratochvil, Miroslav UL; Hunewald, Oliver; Heirendt, Laurent UL et al

in GigaScience (2020), 9(11)

Background: The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. The analysis of that amount of high-dimensional data becomes demanding in both hardware and software of high-performance computational resources. Current software tools often do not scale to datasets of such size; users are thus forced to downsample the data to bearable sizes, in turn losing accuracy and the ability to detect many underlying complex phenomena. Results: We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study. Conclusions: GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example use on the data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.

Peer Reviewed
DAISY: A Data Information System for accountability under the General Data Protection Regulation
Becker, Regina UL; Alper, Pinar UL; Groues, Valentin UL et al

in GigaScience (2019), 8(12)

The new European legislation on data protection, namely, the General Data Protection Regulation (GDPR), has introduced comprehensive requirements for documenting the processing of personal data as well as informing the data subjects of its use. GDPR’s accountability principle requires institutions, projects, and data hubs to document their data processing activities and demonstrate compliance with the GDPR. In response to this requirement, we see the emergence of commercial data-mapping tools, and institutions creating GDPR data registers with such tools. One shortcoming of this approach is the genericity of the tools, whose process-based model does not capture the project-based, collaborative nature of data processing in biomedical research. We have developed a software tool to allow research institutions to comply with the GDPR accountability requirement and map the sometimes very complex data flows in biomedical research. By analysing the transparency and record-keeping obligations of each GDPR principle, we observe that our tool effectively meets the accountability requirement. The GDPR is bringing data protection to center stage in research data management, necessitating dedicated tools, personnel, and processes. Our tool, DAISY, is tailored specifically for biomedical research and can help institutions in tackling the documentation challenge brought about by the GDPR. DAISY is made available as a free and open source tool on GitHub. DAISY is actively being used at the Luxembourg Centre for Systems Biomedicine and the ELIXIR-Luxembourg data hub.

Peer Reviewed
BSA4Yeast: Web-based quantitative trait locus linkage analysis and bulk segregant analysis of yeast sequencing data
Zhang, Zhi; Jung, Paul; Groues, Valentin UL et al

in GigaScience (2019), 8(6), 060

Quantitative Trait Loci (QTL) mapping using bulk segregants is an effective approach for identifying genetic variants associated with phenotypes of interest in model organisms. By exploiting next-generation sequencing technology, the QTL mapping accuracy can be improved significantly, providing a valuable means to annotate new genetic variants. However, setting up a comprehensive analysis framework for this purpose is a time-consuming and error-prone task, posing many challenges for scientists with limited experience in this domain. Findings: Here, we present BSA4Yeast, a comprehensive web application for QTL mapping via bulk segregant analysis of yeast sequencing data. The software provides automated and efficiency-optimized data processing, up-to-date functional annotations, and an interactive web interface to explore identified QTLs. Conclusion: BSA4Yeast enables researchers to efficiently identify plausible candidate genes in QTL regions in order to validate their genetic variations experimentally as causative for a phenotype of interest. BSA4Yeast is freely available at https://bsa4yeast.lcsb.uni.lu.

Peer Reviewed
Fractalis: A scalable open-source service for platform-independent interactive visual analysis of biomedical data
Herzinger, Sascha UL; Groues, Valentin UL; Gu, Wei UL et al

in GigaScience (2018)

Background: Translational research platforms share the aim to promote a deeper understanding of stored data by providing visualization and analysis tools for data exploration and hypothesis generation. However, such tools are usually platform-bound and are not easily reusable by other systems. Furthermore, they rarely address access restriction issues when direct data transfer is not permitted. In this article we present an analytical service that works in tandem with a visualization library to address these problems. Findings: Using a combination of existing technologies and a platform-specific data abstraction layer, we developed a service that is capable of providing existing web-based data warehouses and repositories with platform-independent visual analytical capabilities. The design of this service also allows for federated data analysis by eliminating the need to move the data directly to the researcher. Instead, all operations are based on statistics and interactive charts without direct access to the dataset. Conclusion: The software presented in this article has the potential to help translational researchers achieve a better understanding of a given dataset and quickly generate new hypotheses. Furthermore, it provides a framework that can be used to share and reuse explorative analysis tools within the community.
