Results 1-15 of 15.

Peer Reviewed
A guide for building biological pathways along with two case studies: hair and breast development.
Trindade, Daniel; Orsine, Lissur A.; Barbosa Da Silva, Adriano UL et al

in Methods (San Diego, Calif.) (2015), 74

Genomic information is being underlined in the format of biological pathways. Building these biological pathways is an ongoing demand and benefits from methods for extracting information from biomedical literature with the aid of text-mining tools. Here we hopefully guide you in the attempt of building a customized pathway or chart representation of a system. Our manual is based on a group of software designed to look at biointeractions in a set of abstracts retrieved from PubMed. However, they aim to support the work of someone with biological background, who does not need to be an expert on the subject and will play the role of manual curator while designing the representation of the system, the pathway. We therefore illustrate with two challenging case studies: hair and breast development. They were chosen for focusing on recent acquisitions of human evolution. We produced sub-pathways for each study, representing different phases of development. Differently from most charts present in current databases, we present detailed descriptions, which will additionally guide PESCADOR users along the process. The implementation as a web interface makes PESCADOR a unique tool for guiding the user along the biointeractions, which will constitute a novel pathway.

Peer Reviewed
uORFdb--a comprehensive literature database on eukaryotic uORF biology.
Wethmar, Klaus; Barbosa Da Silva, Adriano UL; Andrade-Navarro, Miguel A. et al

in Nucleic acids research (2014), 42(1), 60-7

Approximately half of all human transcripts contain at least one upstream translational initiation site that precedes the main coding sequence (CDS) and gives rise to an upstream open reading frame (uORF). We generated uORFdb, publicly available at http://cbdm.mdc-berlin.de/tools/uorfdb, to serve as a comprehensive literature database on eukaryotic uORF biology. Upstream ORFs affect downstream translation by interfering with the unrestrained progression of ribosomes across the transcript leader sequence. Although the first uORF-related translational activity was observed >30 years ago, and an increasing number of studies link defective uORF-mediated translational control to the development of human diseases, the features that determine uORF-mediated regulation of downstream translation are not well understood. The uORFdb was manually curated from all uORF-related literature listed at the PubMed database. It categorizes individual publications by a variety of denominators including taxon, gene and type of study. Furthermore, the database can be filtered for multiple structural and functional uORF-related properties to allow convenient and targeted access to the complex field of eukaryotic uORF biology.

Peer Reviewed
Genie: literature-based gene prioritization at multi genomic scale.
Fontaine, Jean-Fred; Priller, Florian; Barbosa Da Silva, Adriano UL et al

in Nucleic acids research (2011), 39(Web Server issue), 455-61

Biomedical literature is traditionally used as a way to inform scientists of the relevance of genes in relation to a research topic. However many genes, especially from poorly studied organisms, are not discussed in the literature. Moreover, a manual and comprehensive summarization of the literature attached to the genes of an organism is in general impossible due to the high number of genes and abstracts involved. We introduce the novel Genie algorithm that overcomes these problems by evaluating the literature attached to all genes in a genome and to their orthologs according to a selected topic. Genie showed high precision (up to 100%) and the best performance in comparison to other algorithms in most of the benchmarks, especially when high sensitivity was required. Moreover, the prioritization of zebrafish genes involved in heart development, using human and mouse orthologs, showed high enrichment in differentially expressed genes from microarray experiments. The Genie web server supports hundreds of species, millions of genes and offers novel functionalities. Common run times below a minute, even when analyzing the human genome with hundreds of thousands of literature records, allows the use of Genie in routine lab work. Availability: http://cbdm.mdc-berlin.de/tools/genie/.

Peer Reviewed
PESCADOR, a web-based tool to assist text-mining of biointeractions extracted from PubMed queries.
Barbosa Da Silva, Adriano UL; Fontaine, Jean-Fred; Donnard, Elisa R. et al

in BMC bioinformatics (2011), 12

BACKGROUND: Biological function is greatly dependent on the interactions of proteins with other proteins and genes. Abstracts from the biomedical literature stored in the NCBI's PubMed database can be used for the derivation of interactions between genes and proteins by identifying the co-occurrences of their terms. Often, the amount of interactions obtained through such an approach is large and may mix processes occurring in different contexts. Current tools do not allow studying these data with a focus on concepts of relevance to a user, for example, interactions related to a disease or to a biological mechanism such as protein aggregation. RESULTS: To help the concept-oriented exploration of such data we developed PESCADOR, a web tool that extracts a network of interactions from a set of PubMed abstracts given by a user, and allows filtering the interaction network according to user-defined concepts. We illustrate its use in exploring protein aggregation in neurodegenerative disease and in the expansion of pathways associated to colon cancer. CONCLUSIONS: PESCADOR is a platform independent web resource available at: http://cbdm.mdc-berlin.de/tools/pescador/

Peer Reviewed
Preimplantation development regulatory pathway construction through a text-mining approach.
Donnard, Elisa; Barbosa Da Silva, Adriano UL; Guedes, Rafael L. M. et al

in BMC genomics (2011), 12 Suppl 4

BACKGROUND: The integration of sequencing and gene interaction data and subsequent generation of pathways and networks contained in databases such as KEGG Pathway is essential for the comprehension of complex biological processes. We noticed the absence of a chart or pathway describing the well-studied preimplantation development stages; furthermore, not all genes involved in the process have entries in KEGG Orthology, important information for knowledge application with relation to other organisms. RESULTS: In this work we sought to develop the regulatory pathway for the preimplantation development stage using text-mining tools such as Medline Ranker and PESCADOR to reveal biointeractions among the genes involved in this process. The genes present in the resulting pathway were also used as seeds for software developed by our group called SeedServer to create clusters of homologous genes. These homologues allowed the determination of the last common ancestor for each gene and revealed that the preimplantation development pathway consists of a conserved ancient core of genes with the addition of modern elements. CONCLUSIONS: The generation of regulatory pathways through text-mining tools allows the integration of data generated by several studies for a more complete visualization of complex biological processes. Using the genes in this pathway as "seeds" for the generation of clusters of homologues, the pathway can be visualized for other organisms. The clustering of homologous genes together with determination of the ancestry leads to a better understanding of the evolution of such process.

Peer Reviewed
A reference guide for tree analysis and visualization.
Pavlopoulos, Georgios A.; Soldatos, Theodoros G.; Barbosa Da Silva, Adriano UL et al

in BioData mining (2010), 3(1), 1

The quantities of data obtained by the new high-throughput technologies, such as microarrays or ChIP-Chip arrays, and the large-scale OMICS-approaches, such as genomics, proteomics and transcriptomics, are becoming vast. Sequencing technologies become cheaper and easier to use and, thus, large-scale evolutionary studies towards the origins of life for all species and their evolution becomes more and more challenging. Databases holding information about how data are related and how they are hierarchically organized expand rapidly. Clustering analysis is becoming more and more difficult to be applied on very large amounts of data since the results of these algorithms cannot be efficiently visualized. Most of the available visualization tools that are able to represent such hierarchies, project data in 2D and are lacking often the necessary user friendliness and interactivity. For example, the current phylogenetic tree visualization tools are not able to display easy to understand large scale trees with more than a few thousand nodes. In this study, we review tools that are currently available for the visualization of biological trees and analysis, mainly developed during the last decade. We describe the uniform and standard computer readable formats to represent tree hierarchies and we comment on the functionality and the limitations of these tools. We also discuss on how these tools can be developed further and should become integrated with various data sources. Here we focus on freely available software that offers to the users various tree-representation methodologies for biological data analysis.

Peer Reviewed
Martini: using literature keywords to compare gene sets.
Soldatos, Theodoros G.; O'Donoghue, Sean I.; Satagopam, Venkata UL et al

in Nucleic acids research (2010), 38(1), 26-38

Life scientists are often interested to compare two gene sets to gain insight into differences between two distinct, but related, phenotypes or conditions. Several tools have been developed for comparing gene sets, most of which find Gene Ontology (GO) terms that are significantly over-represented in one gene set. However, such tools often return GO terms that are too generic or too few to be informative. Here, we present Martini, an easy-to-use tool for comparing gene sets. Martini is based, not on GO, but on keywords extracted from Medline abstracts; Martini also supports a much wider range of species than comparable tools. To evaluate Martini we created a benchmark based on the human cell cycle, and we tested several comparable tools (CoPub, FatiGO, Marmite and ProfCom). Martini had the best benchmark performance, delivering a more detailed and accurate description of function. Martini also gave best or equal performance with three other datasets (related to Arabidopsis, melanoma and ovarian cancer), suggesting that Martini represents an advance in the automated comparison of gene sets. In agreement with previous studies, our results further suggest that literature-derived keywords are a richer source of gene-function information than GO annotations. Martini is freely available at http://martini.embl.de.

Peer Reviewed
LAITOR--Literature Assistant for Identification of Terms co-Occurrences and Relationships.
Barbosa Da Silva, Adriano UL; Soldatos, Theodoros G.; Magalhaes, Ivan L. F. et al

in BMC bioinformatics (2010), 11

BACKGROUND: Biological knowledge is represented in scientific literature that often describes the function of genes/proteins (bioentities) in terms of their interactions (biointeractions). Such bioentities are often related to biological concepts of interest that are specific of a determined research field. Therefore, the study of the current literature about a selected topic deposited in public databases, facilitates the generation of novel hypotheses associating a set of bioentities to a common context. RESULTS: We created a text mining system (LAITOR: Literature Assistant for Identification of Terms co-Occurrences and Relationships) that analyses co-occurrences of bioentities, biointeractions, and other biological terms in MEDLINE abstracts. The method accounts for the position of the co-occurring terms within sentences or abstracts. The system detected abstracts mentioning protein-protein interactions in a standard test (BioCreative II IAS test data) with a precision of 0.82-0.89 and a recall of 0.48-0.70. We illustrate the application of LAITOR to the detection of plant response genes in a dataset of 1000 abstracts relevant to the topic. CONCLUSIONS: Text mining tools combining the extraction of interacting bioentities and biological concepts with network displays can be helpful in developing reasonable hypotheses in different scientific backgrounds.

Peer Reviewed
MedlineRanker: flexible ranking of biomedical literature.
Fontaine, Jean-Fred; Barbosa Da Silva, Adriano UL; Schaefer, Martin et al

in Nucleic acids research (2009), 37(Web Server issue), 141-6

The biomedical literature is represented by millions of abstracts available in the Medline database. These abstracts can be queried with the PubMed interface, which provides a keyword-based Boolean search engine. This approach shows limitations in the retrieval of abstracts related to very specific topics, as it is difficult for a non-expert user to find all of the most relevant keywords related to a biomedical topic. Additionally, when searching for more general topics, the same approach may return hundreds of unranked references. To address these issues, text mining tools have been developed to help scientists focus on relevant abstracts. We have implemented the MedlineRanker webserver, which allows a flexible ranking of Medline for a topic of interest without expert knowledge. Given some abstracts related to a topic, the program deduces automatically the most discriminative words in comparison to a random selection. These words are used to score other abstracts, including those from not yet annotated recent publications, which can be then ranked by relevance. We show that our tool can be highly accurate and that it is able to process millions of abstracts in a practical amount of time. MedlineRanker is free for use and is available at http://cbdm.mdc-berlin.de/tools/medlineranker.

Peer Reviewed
Clustering of cognate proteins among distinct proteomes derived from multiple links to a single seed sequence.
Barbosa Da Silva, Adriano UL; Satagopam, Venkata UL; Schneider, Reinhard UL et al

in BMC bioinformatics (2008), 9

BACKGROUND: Modern proteomes evolved by modification of pre-existing ones. It is extremely important to comparative biology that related proteins be identified as members of the same cognate group, since a characterized putative homolog could be used to find clues about the function of uncharacterized proteins from the same group. Typically, databases of related proteins focus on those from completely-sequenced genomes. Unfortunately, relatively few organisms have had their genomes fully sequenced; accordingly, many proteins are ignored by the currently available databases of cognate proteins, despite the high amount of important genes that are functionally described only for these incomplete proteomes. RESULTS: We have developed a method to cluster cognate proteins from multiple organisms beginning with only one sequence, through connectivity saturation with that Seed sequence. We show that the generated clusters are in agreement with some other approaches based on full genome comparison. CONCLUSION: The method produced results that are as reliable as those produced by conventional clustering approaches. Generating clusters based only on individual proteins of interest is less time consuming than generating clusters for whole proteomes.

Peer Reviewed
A procedure to recruit members to enlarge protein family databases--the building of UECOG (UniRef-Enriched COG Database) as a model.
Fernandes, G. R.; Barbosa Da Silva, Adriano UL; Prosdocimi, F. et al

in Genetics and molecular research : GMR (2008), 7(3), 910-24

A procedure to recruit members to enlarge protein family databases is described here. The procedure makes use of UniRef50 clusters produced by UniProt. Current family entries are used to recruit additional members based on the UniRef50 clusters to which they belong. Only those additional UniRef50 members that are not fragments and whose length is within a restricted range relative to the original entry are recruited. The enriched dataset is then limited to contain only genomes from selected clades. We used the COG database - used for genome annotation and for studies of phylogenetics and gene evolution - as a model. To validate the method, a UniRef-Enriched COG0151 (UECOG) was tested with distinct procedures to compare recruited members with the recruiters: PSI-BLAST, secondary structure overlap (SOV), Seed Linkage, COGnitor, shared domain content, and neighbor-joining single-linkage, and observed that the former four agree in their validations. Presently, the UniRef50-based recruitment procedure enriches the COG database for Archaea, Bacteria and its subgroups Actinobacteria, Firmicutes, Proteobacteria, and other bacteria by 2.2-, 8.0-, 7.0-, 8.8-, 8.7-, and 4.2-fold, respectively, in terms of sequences, and also considerably increased the number of species.

Peer Reviewed
Development of SRS.php, a Simple Object Access Protocol-based library for data acquisition from integrated biological databases.
Barbosa Da Silva, Adriano UL; Pafilis, E.; Ortega, J. M. et al

in Genetics and molecular research : GMR (2007), 6(4), 1142-50

Data integration has become an important task for biological database providers. The current model for data exchange among different sources simplifies the manner that distinct information is accessed by users. The evolution of data representation from HTML to XML enabled programs, instead of humans, to interact with biological databases. We present here SRS.php, a PHP library that can interact with the data integration Sequence Retrieval System (SRS). The library has been written using SOAP definitions, and permits the programmatic communication through webservices with the SRS. The interactions are possible by invoking the methods described in WSDL by exchanging XML messages. The current functions available in the library have been built to access specific data stored in any of the 90 different databases (such as UNIPROT, KEGG and GO) using the same query syntax format. The inclusion of the described functions in the source of scripts written in PHP enables them as webservice clients to the SRS server. The functions permit one to query the whole content of any SRS database, to list specific records in these databases, to get specific fields from the records, and to link any record among any pair of linked databases. The case study presented exemplifies the library usage to retrieve information regarding registries of a Plant Defense Mechanisms database. The Plant Defense Mechanisms database is currently being developed, and the proposal of SRS.php library usage is to enable the data acquisition for the further warehousing tasks related to its setup and maintenance.

Peer Reviewed
Abundance and diversity of resistance genes in the sugarcane transcriptome revealed by in silico analysis.
Wanderley-Nogueira, A. C.; Soares-Cavalcanti, N. M.; Morais, D. A. L. et al

in Genetics and molecular research : GMR (2007), 6(4), 866-89

Resistance genes (R-genes) are responsible for the first interaction of the plant with pathogens being responsible for the activation (or not) of the defense response. Despite their importance and abundance, no tools for their automatic annotation are available yet. The present study analyzed R-genes in the sugarcane expressed sequence tags database which includes 26 libraries of different tissues and development stages comprising 237,954 expressed sequence tags. A new annotation routine was used in order to avoid redundancies and overestimation of R-gene number, common mistakes in previous evaluations. After in silico screening, 280 R-genes were identified, with 196 bearing the complete domains expected. Regarding the alignments, most of the sugarcane's clusters yielded best matches with proteins from Oryza sativa, probably due to the prevalence of sequences of this monocot in data banks. All R-gene classes were found except the subclass LRR-NBS-TIR (leucine-rich repeats, nucleotide-binding site, including Toll interleukin-1 receptors), with prevalence of the kinase (Pto-like) class. R-genes were expressed in all libraries, but flowers, transition root to shoot, and roots were the most representative, suggesting that in sugarcane the expression of R-genes in non-induced conditions prevails in these tissues. In leaves, only low level of expression was found for some gene classes, while others were completely absent. A high allelic diversity was found in all classes of R-genes, sometimes showing best alignments with dicotyledons, despite the great number of genes from rice, maize and other grasses deposited in data banks. The results and future possibilities regarding R-genes in sugarcane research and breeding are further discussed.

Peer Reviewed
PROTOGIM: a novel tool to search motifs and domains in hypothetical proteins of protozoan genomes.
Cestari, Igor S.; Haver, Nicolaas J.; Barbosa Da Silva, Adriano UL et al

in Parasitology research (2006), 98(4), 375-7

Whole sequencing of protozoan trypanosomatid genomes revealed the presence of several predicted unknown genes coding for hypothetical proteins. Pairwise, alignment-based, computational methods available online are unable to identify the function of these sequences. To detect clues to identify the function of hypothetical proteins, a user-friendly, bioinformatic tool named PROTOzoan Gene Identification Motifs (PROTOGIM, available on http://www.biowebdb.org/protogim) was developed, which allows the user to search functional patterns of hypothetical proteins through the screening of regular expression in the sequences. The analysis of 1,194 trypanosomatid hypothetical proteins through PROTOGIM resulted in an identification of motifs and domains in 98% of the cases, demonstrating the reliability and accuracy of the employed method. The added value of this tool is the possibility to modify or insert new regular expressions to perform an analysis against either one or several sequences at the same time. An in silico strategy along with biochemical and molecular characterizations creates new possibilities to find the functions of hypothetical proteins at the postgenome era.

Peer Reviewed
Moving pieces in a taxonomic puzzle: venom 2D-LC/MS and data clustering analyses to infer phylogenetic relationships in some scorpions from the Buthidae family (Scorpiones).
Nascimento, Danielle G.; Rates, Breno; Santos, Daniel M. et al

in Toxicon : official journal of the International Society on Toxinology (2006), 47(6), 628-39

The Buthidae is the most clinically important scorpion family, with over 500 species distributed worldwide. Taxonomical positions and phylogenetic relationships concerning the representative genera and species of this family have been mostly inferred based upon comparisons between morphological characters. Yet, some authors have performed such inferences by comparing some structural properties of a few selected molecules found in the venoms from these scorpions. Here, we propose a novel methodology pipeline designed to address these issues. We have analyzed the whole venoms from some species that exemplify peculiar cases in the Buthidae family (Tityus stigmurus, Tityus serrulatus, Tityus bahiensis, Leiurus quinquestriatus quinquestriatus and Leiurus quinquestriatus hebraeus), by means of a proteomic approach using a 2D-LC/MS technique. The molecules found in these venoms were clustered according to their physicochemical properties (molecular mass and hydrophobicity), by using the machine learning-based Weka software. The clusters assessment, along with the number of molecules found in a given cluster for each scorpion, which assigns for the venom and structural family complexities, respectively, was used to generate a phenetic correlation tree for positioning these species. Our results were in accordance with the classical taxonomy viewpoint, which places T. serrulatus and T. stigmurus as very close species, T. bahiensis as a less related species in the Tityus genus and L. q. quinquestriatus and L. q. hebraeus with small differences within the same species (L. quinquestriatus). Therefore, we believe that this is a well-suited method to determine venom complexities that reflect the scorpions' evolutionary history, which can be crucial to reconstruct their phylogeny through the molecular evolution of their venoms.
