Gu, Wei, in BMC Bioinformatics (2019), 20(1), 164

For large international research consortia, such as those funded by the European Union's Horizon 2020 programme or the Innovative Medicines Initiative, good data coordination practices and tools are essential for the successful collection, organization and analysis of the resulting data. Research consortia are attempting ever more ambitious science to better understand disease, by leveraging technologies such as whole genome sequencing, proteomics, patient-derived biological models and computer-based systems biology simulations.

Ostaszewski, Marek, in BMC Bioinformatics (2018), 19(1), 308

BACKGROUND: Biomedical knowledge grows in complexity and becomes encoded in network-based repositories, which include focused, expert-drawn diagrams, networks of evidence-based associations, and established ontologies. Combining these structured information sources is an important computational challenge, as large graphs are difficult to analyze visually. RESULTS: We investigate knowledge discovery in manually curated and annotated molecular interaction diagrams. To evaluate similarity of content we use: i) Euclidean distance in expert-drawn diagrams, ii) shortest-path distance in the underlying network, and iii) ontology-based distance. We employ clustering with these metrics used separately and in pairwise combinations.
We propose a novel bi-level optimization approach, together with an evolutionary algorithm, for an informative combination of distance metrics. We compare the enrichment of the obtained clusters between the solutions and with expert knowledge, and calculate the number of Gene Ontology and Disease Ontology terms discovered by different solutions as a measure of cluster quality. Our results show that combining distance metrics can improve clustering accuracy, based on the comparison with expert-provided clusters. Moreover, the performance of specific combinations of distance functions depends on the clustering depth (number of clusters). By employing the bi-level optimization approach we evaluated the relative importance of the distance functions and found that the order in which they are combined indeed affects clustering performance. Next, enrichment analysis of the clustering results showed that both hierarchical and bi-level clustering schemes discovered more Gene Ontology and Disease Ontology terms than expert-provided clusters for the same knowledge repository. Moreover, bi-level clustering found more enriched terms than the best hierarchical clustering solution for three distinct distance metric combinations in three different instances of disease maps. CONCLUSIONS: In this work we examined the impact of different distance functions on the clustering of a visual biomedical knowledge repository. We found that combining distance functions may be beneficial for clustering and may improve the exploration of such repositories. We proposed bi-level optimization to evaluate the importance of the order in which the distance functions are combined. Both the combination and the order of these functions affected clustering quality and knowledge recognition in the considered benchmarks. We propose that multiple dimensions can be utilized simultaneously for visual knowledge exploration.
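As a rough sketch of the metric-combination idea described above, two distance matrices over the same diagram elements can be fused into a single dissimilarity before clustering. The min-max normalization, the fixed weight and all names below are illustrative assumptions, not the paper's actual formulation:

```python
# Sketch: combine two distance metrics into one dissimilarity matrix.
# The min-max normalization and the fixed weight are illustrative
# assumptions, not the formulation used in the paper.

def normalize(matrix):
    """Scale all entries of a distance matrix into [0, 1]."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in matrix]

def combine(m1, m2, weight=0.5):
    """Weighted sum of two normalized distance matrices."""
    n1, n2 = normalize(m1), normalize(m2)
    return [[weight * a + (1 - weight) * b
             for a, b in zip(r1, r2)] for r1, r2 in zip(n1, n2)]

# Toy example: a layout (Euclidean-style) distance vs. a network distance
layout = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
network = [[0, 2, 2], [2, 0, 1], [2, 1, 0]]
fused = combine(layout, network, weight=0.5)
```

The fused matrix can then be handed to any standard clustering routine; a bi-level scheme, as in the paper, would additionally search over the weights and the order of combination.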
Albrecht, Marco, in BMC Bioinformatics (2017), 18(1), 33

Background: The analysis of microarray time series promises a deeper insight into the dynamics of the cellular response following stimulation. A common observation in this type of data is that some genes respond with quick, transient dynamics, while other genes change their expression slowly over time. Existing methods for detecting significant expression dynamics often fail when the expression dynamics show large heterogeneity, and they often cannot cope with irregular and sparse measurements. Results: The method proposed here is specifically designed for the analysis of perturbation responses. It combines different scores to capture fast and transient dynamics as well as slow expression changes, and performs well in the presence of low replicate numbers and irregular sampling times. The results are given as tables, with links to figures showing the expression dynamics of the respective transcript. These make it possible to quickly assess the relevance of a detection, to identify possible false positives, and to discriminate early and late changes in gene expression. An extension of the method allows the analysis of the expression dynamics of functional groups of genes, providing a quick overview of the cellular response. The performance of the package was tested on microarray data derived from lung cancer cells stimulated with epidermal growth factor (EGF). Conclusion: We describe a new, efficient method for the analysis of sparse and heterogeneous time course data with high detection sensitivity and transparency.
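The contrast between transient and slow dynamics that the method targets can be illustrated with two toy scores on irregularly sampled time points. These are illustrative stand-ins, not the scores actually implemented in the TTCA package:

```python
# Sketch: two simple scores for irregularly sampled expression time
# courses: a "transient" score (peak deviation from baseline) and a
# "sustained" score (overall trend). Illustrative stand-ins only,
# not the actual TTCA scores.

def transient_score(times, values):
    """Largest absolute deviation from the first (baseline) value."""
    baseline = values[0]
    return max(abs(v - baseline) for v in values)

def sustained_score(times, values):
    """Slope of a least-squares line fit, capturing slow drift."""
    n = len(times)
    mt = sum(times) / n
    mv = sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times) or 1.0
    return num / den

# Irregular sampling: a transient spike vs. a slow monotone increase
t = [0.0, 0.5, 1.0, 4.0, 8.0]
spike = [1.0, 5.0, 1.2, 1.0, 1.1]
drift = [1.0, 1.1, 1.3, 2.0, 3.0]
```

A spiky profile scores high on the first measure and low on the second, and vice versa for a slow drift, which is why combining several scores helps with heterogeneous dynamics.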
The method is implemented as the R package TTCA (transcript time course analysis) and can be installed from the Comprehensive R Archive Network (CRAN). The source code is provided with Additional file 1.

et al., in BMC Bioinformatics (2017), 18(1), 233

BACKGROUND: Recent advances in high-throughput sequencing allow for much deeper exploitation of natural and engineered microbial communities, and for unravelling the so-called "microbial dark matter" (microbes that have so far evaded cultivation). Metagenomic analyses result in a large number of genomic fragments (contigs) that need to be grouped (binned) in order to reconstruct draft microbial genomes. While several contig binning algorithms have been developed in the past two years, they often lack consensus. Furthermore, these software tools typically lack a provision for the visualization of data and bin characteristics. RESULTS: We present ICoVeR, the Interactive Contig-bin Verification and Refinement tool, which allows the visualization of genome bins. More specifically, ICoVeR allows curation of bin assignments based on multiple binning algorithms. Its visualization window is composed of two connected and interactive main views: a parallel coordinates view and a dimensionality reduction plot. To demonstrate ICoVeR's utility, we used it to refine disparate genome bins automatically generated using MetaBAT, CONCOCT and MyCC for an anaerobic digestion metagenomic (AD microbiome) dataset. Out of 31 refined genome bins, 23 were characterized by higher completeness and lower contamination in comparison to their respective, automatically generated, genome bins.
Additionally, to benchmark ICoVeR against a previously validated dataset, we used Sharon's dataset representing an infant gut metagenome. CONCLUSIONS: ICoVeR is an open-source software package that allows curation of disparate genome bins generated with automatic binning algorithms. It is freely available under the GPLv3 license at https://git.list.lu/eScience/ICoVeR . The data management and analytical functions of ICoVeR are implemented in R, so the software can easily be installed on any system for which R is available. An installation and usage guide, together with example files ready to be visualized, is provided via the project wiki. An ICoVeR instance preloaded with the AD microbiome and Sharon's datasets can be accessed via the website.

Rauschenberger, Armin, in BMC Bioinformatics (2016), 17

Background: Testing for association between RNA-Seq and other genomic data is challenging due to the high variability of the former and the high dimensionality of the latter. Results: Using the negative binomial distribution and a random-effects model, we develop an omnibus test that overcomes both difficulties. It may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size. Conclusions: The proposed test can detect genetic and epigenetic alterations that affect gene expression, and can examine complex regulatory mechanisms of gene expression. The R package globalSeq is available from Bioconductor.
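The flavour of such an omnibus test can be conveyed with a toy permutation version: one global statistic summarises the association between a count response and all covariates at once, and its significance is assessed by permuting the response. The statistic and the permutation scheme below are simplifications of my own choosing; globalSeq's actual test is built on the negative binomial distribution with random effects:

```python
# Sketch: a toy permutation-based "global" test of association between
# one count response and a set of covariates. The statistic (sum of
# squared response-covariate covariances) and the exhaustive
# permutation scheme are illustrative simplifications, not the
# negative binomial random-effects test implemented in globalSeq.
import itertools

def global_stat(response, covariates):
    """Sum of squared covariances between response and each covariate."""
    n = len(response)
    mr = sum(response) / n
    total = 0.0
    for cov in covariates:
        mc = sum(cov) / n
        c = sum((r - mr) * (x - mc) for r, x in zip(response, cov)) / n
        total += c * c
    return total

def permutation_pvalue(response, covariates):
    """Exact permutation p-value over all orderings of the response."""
    observed = global_stat(response, covariates)
    perms = list(itertools.permutations(response))
    hits = sum(1 for p in perms
               if global_stat(list(p), covariates) >= observed)
    return hits / len(perms)

# Small example: the response tracks the first covariate.
# (A real p > n setting would have many covariates; two suffice here.)
y = [1, 3, 9, 27]
X = [[0, 1, 2, 3], [5, 5, 5, 5]]
p_value = permutation_pvalue(y, X)
```

With four observations there are only 24 orderings, so the exact permutation distribution can be enumerated; real applications would sample permutations instead.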
Noronha, Alberto, in BMC Bioinformatics (2014), 15

Background: Over the last years, several methods for the phenotype simulation of microorganisms under specified genetic and environmental conditions have been proposed in the context of Metabolic Engineering (ME). These methods have provided insight into the functioning of microbial metabolism and played a key role in the design of genetic modifications that can lead to strains of industrial interest. On the other hand, in the context of Systems Biology research, biological network visualization has reinforced its role as a core tool for understanding biological processes. However, it has scarcely been used to foster ME-related methods, in spite of its acknowledged potential. Results: In this work, an open-source software tool that aims to fill the gap between ME and metabolic network visualization is proposed, in the form of a plugin to the OptFlux ME platform. The framework is based on an abstract layer in which the network is represented as a bipartite graph containing minimal information about the underlying entities and their desired relative placement. The framework provides input/output support for networks specified in standard formats, such as XGMML, SBGN or SBML, providing a connection to genome-scale metabolic models. A user interface makes it possible to edit, manipulate and query nodes in the network, and provides tools to visualize diverse effects, including visual filters and aspect changes (e.g. colors, shapes and sizes). These tools are particularly interesting for ME, since they allow overlaying phenotype simulation results or elementary flux modes on the networks.
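The bipartite abstraction described above can be sketched minimally as follows. The class and field names ("kind", "label") are illustrative choices, not the plugin's actual data model:

```python
# Sketch: a minimal bipartite metabolic network, with reaction and
# metabolite nodes kept apart and edges linking the two sides only.
# Node fields are illustrative, not the plugin's actual data model.

class BipartiteNetwork:
    def __init__(self):
        self.nodes = {}       # node id -> {"kind": ..., "label": ...}
        self.edges = set()    # (reaction id, metabolite id) pairs

    def add_node(self, node_id, kind, label):
        assert kind in ("reaction", "metabolite")
        self.nodes[node_id] = {"kind": kind, "label": label}

    def add_edge(self, reaction_id, metabolite_id):
        assert self.nodes[reaction_id]["kind"] == "reaction"
        assert self.nodes[metabolite_id]["kind"] == "metabolite"
        self.edges.add((reaction_id, metabolite_id))

    def neighbours(self, node_id):
        return {b if a == node_id else a
                for a, b in self.edges if node_id in (a, b)}

net = BipartiteNetwork()
net.add_node("R1", "reaction", "hexokinase")
net.add_node("glc", "metabolite", "glucose")
net.add_node("g6p", "metabolite", "glucose 6-phosphate")
net.add_edge("R1", "glc")
net.add_edge("R1", "g6p")
```

Keeping the two node kinds disjoint is what makes overlays (e.g. colouring reactions by simulated flux while leaving metabolites untouched) straightforward.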
Conclusions: The framework and its source code are freely available, together with documentation and other resources, and are illustrated with well-documented case studies.

Killcoyne, Sarah, in BMC Bioinformatics (2014), 15(1)

Fleming, Ronan MT, in BMC Bioinformatics (2013), 14(1), 240

Background: Biological processes such as metabolism, signaling, and macromolecular synthesis can be modeled as large networks of biochemical reactions. Large and comprehensive networks, like integrated networks that represent metabolism and macromolecular synthesis, are inherently multiscale because reaction rates can vary over many orders of magnitude. They require special methods for accurate analysis, because naive use of standard optimization systems can produce inaccurate or erroneously infeasible results. Results: We describe techniques that enable off-the-shelf optimization software to compute accurate solutions to the poorly scaled optimization problems arising from flux balance analysis of multiscale biochemical reaction networks. We implement lifting techniques for flux balance analysis within the openCOBRA toolbox and demonstrate our techniques using the first integrated reconstruction of metabolism and macromolecular synthesis for E. coli. Conclusion: Our techniques enable accurate flux balance analysis of multiscale networks using off-the-shelf optimization software. Although we describe lifting techniques in the context of flux balance analysis, our methods can be used to handle a variety of optimization problems arising from the analysis of multiscale network reconstructions.
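The lifting idea can be illustrated with a toy example: a single badly scaled coefficient, as in y = 1e6 * x, is replaced by a chain of moderately scaled constraints linked through auxiliary variables (z = 1e3 * x, then y = 1e3 * z), so no individual constraint mixes wildly different magnitudes. The splitting factor of 1e3 and the function names are illustrative assumptions, not the paper's exact procedure:

```python
# Sketch of "lifting" a poorly scaled constraint: split one large
# coefficient into a chain of factors, each bounded by max_scale,
# to be applied through auxiliary variables. The factor 1e3 and the
# names are illustrative choices, not the paper's exact procedure.

def lift_coefficient(coeff, max_scale=1e3):
    """Split one large coefficient into a chain of factors <= max_scale."""
    factors = []
    remaining = coeff
    while remaining > max_scale:
        factors.append(max_scale)
        remaining /= max_scale
    factors.append(remaining)
    return factors

def apply_chain(x, factors):
    """Evaluate the chained constraints; equals coeff * x overall."""
    value = x
    for f in factors:
        value *= f
    return value

factors = lift_coefficient(1e6)
```

Each link in the chain stays within the solver's comfortable numerical range, while the overall product still equals the original coefficient.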
Killcoyne, Sarah, in BMC Bioinformatics (2012), 13

BACKGROUND: For shotgun mass spectrometry based proteomics, the most computationally expensive step is matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing; therefore, solutions for improving our ability to perform these searches are needed. RESULTS: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. CONCLUSION: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.

et al., in BMC Bioinformatics (2012), 13, 45

Elucidating the genotype-phenotype connection is one of the big challenges of modern molecular biology.
To fully understand this connection, it is necessary to consider the underlying networks and the time factor. In this context of data deluge and heterogeneous information, visualization plays an essential role in interpreting complex and dynamic topologies. Thus, software that is able to bring the network, phenotypic and temporal information together is needed. Arena3D has previously been introduced as a tool that facilitates link discovery between processes. It uses a layered display to separate different levels of information while emphasizing the connections between them. We present novel developments of the tool for the visualization and analysis of dynamic genotype-phenotype landscapes.

Hussong, René, in BMC Bioinformatics (2012), 13, 291

Background: The robust identification of isotope patterns originating from peptides being analyzed through mass spectrometry (MS) is often significantly hampered by noise artifacts and the interference of overlapping patterns arising, e.g., from post-translational modifications. As the classification of the recorded data points into either 'noise' or 'signal' lies at the very root of essentially every proteomic application, the quality of the automated processing of mass spectra can significantly influence the way the data are interpreted within a given biological context. Results: We propose non-negative least squares / non-negative least absolute deviation regression to fit a raw spectrum by templates imitating isotope patterns. In a carefully designed validation scheme, we show that the method exhibits excellent performance in pattern picking. It is demonstrated that the method is able to disentangle complicated overlaps of patterns.
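Template fitting under non-negativity constraints can be sketched with a small coordinate-descent solver. This is a toy stand-in for the paper's non-negative least-squares / least-absolute-deviation regression; the two "isotope patterns" and the iteration count are invented for illustration:

```python
# Sketch: non-negative least-squares fitting of a spectrum by isotope
# pattern "templates", via simple coordinate descent. A toy stand-in
# for the NNLS/NNLAD regression in the paper; templates are invented.

def nnls_coordinate_descent(templates, spectrum, iterations=500):
    """Find nonnegative amplitudes b minimizing ||sum b_j t_j - y||^2."""
    b = [0.0] * len(templates)
    residual = list(spectrum)
    for _ in range(iterations):
        for j, t in enumerate(templates):
            norm = sum(v * v for v in t)
            step = sum(r * v for r, v in zip(residual, t)) / norm
            new_bj = max(0.0, b[j] + step)      # clip at zero: nonnegativity
            delta = new_bj - b[j]
            residual = [r - delta * v for r, v in zip(residual, t)]
            b[j] = new_bj
    return b

# Two overlapping toy "isotope patterns" and a spectrum mixing them
t1 = [3.0, 2.0, 1.0, 0.0, 0.0]
t2 = [0.0, 0.0, 3.0, 2.0, 1.0]
y = [6.0, 4.0, 5.0, 2.0, 1.0]   # constructed as 2*t1 + 1*t2
amplitudes = nnls_coordinate_descent([t1, t2], y)
```

Even though the two templates overlap at the third position, the solver disentangles the mixture and recovers the amplitudes used to build the spectrum.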
Conclusions: We find that regularization is not necessary to prevent overfitting and that thresholding is an effective and user-friendly way to perform feature selection. The proposed method avoids problems inherent in regularization-based approaches, comes with a set of well-interpretable parameters whose default configuration is shown to generalize well without the need for fine-tuning, and is applicable to spectra from different platforms. The R package IPPD implements the method and is available from the Bioconductor platform (http://bioconductor.fhcrc.org/help/bioc-views/devel/bioc/html/IPPD.html).

Krause, Roland, in BMC Bioinformatics (2012), 13 Suppl 10

BACKGROUND: Biological networks provide fundamental insights into the functional characterization of genes and their products, the characterization of DNA-protein interactions, the identification of regulatory mechanisms, and other biological tasks. Due to the experimental and biological complexity, their computational exploitation faces many algorithmic challenges. RESULTS: We introduce novel weighted quasi-biclique problems to identify functional modules in biological networks represented as bipartite graphs. In contrast to previous quasi-biclique problems, we account for biological interaction levels by using edge-weighted quasi-bicliques. While we prove that our problems are NP-hard, we also describe IP formulations to compute exact solutions for moderately sized networks. CONCLUSIONS: We verify the effectiveness of our IP solutions using both simulation and empirical data.
The simulation shows high quasi-biclique recall rates, and the empirical data corroborate the ability of our weighted quasi-bicliques to extract features and recover missing interactions from biological networks.

Barbosa Da Silva, Adriano, in BMC Bioinformatics (2011), 12

BACKGROUND: Biological function is greatly dependent on the interactions of proteins with other proteins and genes. Abstracts from the biomedical literature stored in NCBI's PubMed database can be used to derive interactions between genes and proteins by identifying co-occurrences of their terms. Often, the number of interactions obtained through such an approach is large and may mix processes occurring in different contexts. Current tools do not allow studying these data with a focus on the concepts relevant to a user, for example interactions related to a disease or to a biological mechanism such as protein aggregation. RESULTS: To help the concept-oriented exploration of such data we developed PESCADOR, a web tool that extracts a network of interactions from a set of PubMed abstracts given by a user, and allows filtering of the interaction network according to user-defined concepts. We illustrate its use in exploring protein aggregation in neurodegenerative disease and in the expansion of pathways associated with colon cancer.
CONCLUSIONS: PESCADOR is a platform-independent web resource, available at: http://cbdm.mdc-berlin.de/tools/pescador/

Barbosa Da Silva, Adriano, in BMC Bioinformatics (2010), 11

BACKGROUND: Biological knowledge is represented in the scientific literature, which often describes the function of genes/proteins (bioentities) in terms of their interactions (biointeractions). Such bioentities are often related to biological concepts of interest that are specific to a given research field. Therefore, studying the current literature about a selected topic deposited in public databases facilitates the generation of novel hypotheses associating a set of bioentities with a common context. RESULTS: We created a text mining system (LAITOR: Literature Assistant for Identification of Terms co-Occurrences and Relationships) that analyses co-occurrences of bioentities, biointeractions and other biological terms in MEDLINE abstracts. The method accounts for the position of the co-occurring terms within sentences or abstracts. The system detected abstracts mentioning protein-protein interactions in a standard test (BioCreative II IAS test data) with a precision of 0.82-0.89 and a recall of 0.48-0.70. We illustrate the application of LAITOR to the detection of plant response genes in a dataset of 1000 abstracts relevant to the topic. CONCLUSIONS: Text mining tools combining the extraction of interacting bioentities and biological concepts with network displays can be helpful in developing reasonable hypotheses in different scientific backgrounds.
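The sentence-level co-occurrence principle underlying tools of this kind can be sketched as follows. The tokenization and the tiny entity dictionary are deliberately naive illustrations, not the actual LAITOR or PESCADOR pipeline:

```python
# Sketch: building an interaction network from sentence-level
# co-occurrence of known bioentity names. The naive tokenization and
# the tiny hand-made entity dictionary are illustrative assumptions,
# not the actual LAITOR/PESCADOR text mining pipeline.
from itertools import combinations

ENTITIES = {"tp53", "mdm2", "egfr"}   # toy dictionary of bioentity names

def cooccurrence_edges(abstract):
    """Count entity pairs mentioned together in the same sentence."""
    edges = {}
    for sentence in abstract.lower().split("."):
        found = sorted({w.strip(",;()") for w in sentence.split()} & ENTITIES)
        for pair in combinations(found, 2):
            edges[pair] = edges.get(pair, 0) + 1
    return edges

text = ("TP53 interacts with MDM2. "
        "MDM2 regulates TP53 stability. "
        "EGFR signalling is independent here.")
edges = cooccurrence_edges(text)
```

Edge weights (here, plain counts) then feed the network display, and concept filters amount to restricting the dictionary or the sentences considered.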
Roomp, Kirsten, in BMC Bioinformatics (2010), 11(90), 1-2

BACKGROUND: Experimental screening of large sets of peptides with respect to their MHC binding capabilities is still very demanding, due to the large number of possible peptide sequences and the extensive polymorphism of the MHC proteins. Therefore, there is significant interest in the development of computational methods for predicting the binding capability of peptides to MHC molecules as a first step towards selecting peptides for actual screening. RESULTS: We have examined the performance of four diverse MHC class I prediction methods on comparatively large HLA-A and HLA-B allele peptide binding datasets extracted from the Immune Epitope Database and Analysis Resource (IEDB). The chosen methods span a representative cross-section of the available methodology for MHC binding predictions. Until the development of the IEDB, such an analysis was not possible, as the available peptide sequence datasets were small and spread out over many separate efforts. We tested three datasets, which differ in the IC50 cutoff criteria used to select binders and non-binders. The best performance was achieved when predictions were performed on the dataset consisting only of strong binders (IC50 less than 10 nM) and clear non-binders (IC50 greater than 10,000 nM). In addition, robust predictions were only achieved for alleles represented with a sufficiently large (greater than 200) and balanced set of binders and non-binders. CONCLUSIONS: All four methods show good to excellent performance on the comprehensive datasets, with the artificial neural network based method outperforming the other methods.
However, all methods show pronounced difficulties in correctly categorizing intermediate binders.

Glaab, Enrico, in BMC Bioinformatics (2010), 11(1), 597

Thiele, Ines, in BMC Bioinformatics (2010), 11

BACKGROUND: Flux variability analysis is often used to determine the robustness of metabolic models under various simulation conditions. However, its use has been somewhat limited by the long computation time compared to other constraint-based modeling methods. RESULTS: We present an open-source implementation of flux variability analysis called fastFVA. This efficient implementation makes large-scale flux variability analysis feasible and tractable, allowing more complex biological questions regarding network flexibility and robustness to be addressed. CONCLUSIONS: Networks involving thousands of biochemical reactions can be analyzed within seconds, greatly expanding the utility of flux variability analysis in systems biology.

Nguyen, Thanh-Phuong, in BMC Bioinformatics (2009), 10(1), 370

Glaab, Enrico, in BMC Bioinformatics (2009), 10(1), 358

Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases.
To take advantage of the multitude of available datasets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process, and linking different analysis modules together under a single interface, would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web tools, multiple algorithms and datasets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.

Schneider, Reinhard, in BMC Bioinformatics (2009), 10
Background: During recent years, high-throughput experimental methods have been developed that generate large datasets of protein-protein interactions (PPIs). However, due to the experimental methodologies, these datasets contain errors, mainly in terms of false positives, which reduce the quality of any derived information. Typically, these datasets can be modeled as graphs in which vertices represent proteins and edges the pairwise PPIs, making it easy to apply automated clustering methods to detect protein complexes or other biologically significant functional groupings. Methods: In this paper, a clustering tool called GIBA (named after the first characters of its developers' nicknames) is presented. GIBA implements a two-step procedure for a given dataset of protein-protein interaction data: first, a clustering algorithm is applied to the interaction data, which is then followed by a filtering step to generate the final candidate list of predicted complexes. Results: The efficiency of GIBA is demonstrated through the analysis of six different yeast protein interaction datasets in comparison to four other available algorithms. We compared the results of the different methods using five different performance measurement metrics. Moreover, we checked how the parameters of the methods that constitute the filter affect the final results. Conclusion: GIBA is an effective and easy-to-use tool for the detection of protein complexes from experimentally measured protein-protein interaction networks. The results show that GIBA has superior prediction accuracy compared to previously published methods.
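A two-step cluster-then-filter scheme of the kind GIBA describes can be sketched as follows; connected components and a minimum-size rule are naive stand-ins for GIBA's actual clustering algorithms and filters:

```python
# Sketch of a two-step cluster-then-filter pipeline in the spirit of
# GIBA: group proteins of a PPI graph (here naively, by connected
# components), then filter candidate complexes by a minimum-size rule.
# Both steps are illustrative stand-ins for GIBA's actual methods.

def connected_components(edges):
    """Group nodes of an undirected PPI graph into components."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        components.append(group)
    return components

def filter_candidates(components, min_size=3):
    """Keep only candidate complexes of a minimum size."""
    return [c for c in components if len(c) >= min_size]

ppi = [("A", "B"), ("B", "C"), ("D", "E")]
candidates = filter_candidates(connected_components(ppi))
```

Separating the two steps is what allows the filter parameters (size, density, and so on) to be tuned independently of the clustering algorithm, which is the design choice the paper's parameter study examines.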