![]() Del Sol Mesa, Antonio ![]() in Nucleic Acids Research (2022) Detailed reference viewed: 41 (8 UL)![]() ; ; et al in Nucleic Acids Research (2021) Since 1992 PredictProtein (https://predictprotein.org) is a one-stop online resource for protein sequence analysis with its main site hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and ... [more ▼] Since 1992 PredictProtein (https://predictprotein.org) is a one-stop online resource for protein sequence analysis with its main site hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA binding). PredictProtein's infrastructure has moved to the LCSB increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold (apparently without lowering performance of prediction methods); user interface elements improved usability, and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for the prediction of proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings. [less ▲] Detailed reference viewed: 67 (1 UL)![]() ; Hoksza, David ![]() in Nucleic Acids Research (2020) Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the ... [more ▼] Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like ‘Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?’, or ‘Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?’ are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features have been aggregated in MISCAST from multiple databases, and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community. [less ▲] Detailed reference viewed: 76 (2 UL)![]() ; ; et al in Nucleic acids research (2020) The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human ... [more ▼] The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease. The HPO is now a worldwide standard for phenotype exchange. The HPO has grown steadily since its inception due to considerable contributions from clinical experts and researchers from a diverse range of disciplines. Here, we present recent major extensions of the HPO for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas. For example, the seizure subontology now reflects the International League Against Epilepsy (ILAE) guidelines and these enhancements have already shown clinical validity. We present new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease. These efforts will benefit software such as Exomiser by improving the accuracy and scope of cross-species phenotype matching. The computational modeling strategy used by the HPO to define disease entities and phenotypic features and distinguish between them is explained in detail.We also report on recent efforts to translate the HPO into indigenous languages. Finally, we summarize recent advances in the use of HPO in electronic health record systems. [less ▲] Detailed reference viewed: 86 (0 UL)![]() ; ; et al in Nucleic Acids Research (2019) Clinical genetic testing has exponentially expanded in recent years, leading to an overwhelming amount of patient variants with high variability in pathogenicity and heterogeneous phenotypes. A large part ... [more ▼] Clinical genetic testing has exponentially expanded in recent years, leading to an overwhelming amount of patient variants with high variability in pathogenicity and heterogeneous phenotypes. A large part of the variant level data are comprehensively aggregated in public databases such as ClinVar. However, the ability to explore this rich resource and answer general questions such as “How many genes inside ClinVar are associated with a specific disease? or “In which part of the protein are patient variants located?” is limited and requires advanced bioinformatics processing. Here, we present Simple ClinVar (http://simple-clinvar.broadinstitute.org/) a web- server application that is able to provide variant, gene, and disease level summary statistics based on the entire ClinVar database in a dynamic and user-friendly web-interface. Overall, our web application is able to interactively answer basic questions regarding genetic variation and its known relationships to disease. By typing a disease term of interest, t he user can identify in seconds the genes and phenotypes most frequently reported to ClinVar. Subsets of variants can then be further explored, filtered, or mapped and visualized in the corresponding protein sequences. Our website will follow ClinVar monthly releases and provide easy access to rich ClinVar resources to a broader audience including basic and clinical scientists. [less ▲] Detailed reference viewed: 129 (2 UL)![]() del Sol Mesa, Antonio ![]() in Nucleic Acids Research (2019) Detailed reference viewed: 193 (13 UL)![]() Okawa, Satoshi ![]() ![]() in Nucleic Acids Research (2019) Detailed reference viewed: 202 (33 UL)![]() ; ; et al in Nucleic Acids Research (2019) The Protein Data Bank in Europe-Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted ... [more ▼] The Protein Data Bank in Europe-Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted structural and functional annotations of macromolecular structure data, contained in the Protein Data Bank (PDB). The goal of PDBe-KB is two-fold: (i) to increase the visibility and reduce the fragmentation of annotations contributed by specialist data resources, and to make these data more findable, accessible, interoperable and reusable (FAIR) and (ii) to place macromolecular structure data in their biological context, thus facilitating their use by the broader scientific community in fundamental and applied research. Here, we describe the guidelines of this collaborative effort, the current status of contributed data, and the PDBe-KB infrastructure, which includes the data exchange format, the deposition system for added value annotations, the distributable database containing the assembled data, and programmatic access endpoints. We also describe a series of novel web-pages—the PDBe-KB aggregated views of structure data—which combine information on macromolecular structures from many PDB entries. We have recently released the first set of pages in this series, which provide an overview of available structural and functional information for a protein of interest, referenced by a UniProtKB accession. [less ▲] Detailed reference viewed: 67 (2 UL)![]() Zaffaroni, Gaia ![]() ![]() in Nucleic Acids Research (2019) Detailed reference viewed: 157 (31 UL)![]() ; ; et al in Nucleic acids research (2019), 47(W1), 345-349 PrankWeb is an online resource providing an interface to P2Rank, a state-of-the-art method for ligand binding site prediction. P2Rank is a template-free machine learning method based on the prediction of ... [more ▼] PrankWeb is an online resource providing an interface to P2Rank, a state-of-the-art method for ligand binding site prediction. P2Rank is a template-free machine learning method based on the prediction of local chemical neighborhood ligandability centered on points placed on a solvent-accessible protein surface. Points with a high ligandability score are then clustered to form the resulting ligand binding sites. In addition, PrankWeb provides a web interface enabling users to easily carry out the prediction and visually inspect the predicted binding sites via an integrated sequence-structure view. Moreover, PrankWeb can determine sequence conservation for the input molecule and use this in both the prediction and result visualization steps. Alongside its online visualization options, PrankWeb also offers the possibility of exporting the results as a PyMOL script for offline visualization. The web frontend communicates with the server side via a REST API. In high-throughput scenarios, therefore, users can utilize the server API directly, bypassing the need for a web-based frontend or installation of the P2Rank application. PrankWeb is available at http://prankweb.cz/, while the web application source code and the P2Rank method can be accessed at https://github.com/jendelel/PrankWebApp and https://github.com/rdk/p2rank, respectively. [less ▲] Detailed reference viewed: 89 (0 UL)![]() Gerard, Déborah ![]() ![]() in Nucleic Acids Research (2018) Temporal data on gene expression and context-specific open chromatin states can improve identification of key transcription factors (TFs) and the gene regulatory networks (GRNs) controlling cellular ... [more ▼] Temporal data on gene expression and context-specific open chromatin states can improve identification of key transcription factors (TFs) and the gene regulatory networks (GRNs) controlling cellular differentiation. However, their integration remains challenging. Here, we delineate a general approach for data-driven and unbiased identification of key TFs and dynamic GRNs, called EPIC-DREM. We generated time-series transcriptomic and epigenomic profiles during differentiation of mouse multipotent bone marrow stromal cell line (ST2) toward adipocytes and osteoblasts. Using our novel approach we constructed time-resolved GRNs for both lineages and identifed the shared TFs involved in both differentiation processes. To take an alternative approach to prioritize the identified shared regulators, we mapped dynamic super-enhancers in both lineages and associated them to target genes with correlated expression profiles. The combination of the two approaches identified aryl hydrocarbon receptor (AHR) and Glis family zinc finger 1 (GLIS1) as mesenchymal key TFs controlled by dynamic cell type-specific super-enhancers that become repressed in both lineages. AHR and GLIS1 control differentiation-induced genes and their overexpression can inhibit the lineage commitment of the multipotent bone marrow-derived ST2 cells. [less ▲] Detailed reference viewed: 177 (43 UL)![]() Noronha, Alberto ![]() ![]() ![]() in Nucleic Acids Research (2018) A multitude of factors contribute to complex diseases and can be measured with ‘omics’ methods. Databases facilitate data interpretation for underlying mechanisms. Here, we describe the Virtual Metabolic ... [more ▼] A multitude of factors contribute to complex diseases and can be measured with ‘omics’ methods. Databases facilitate data interpretation for underlying mechanisms. Here, we describe the Virtual Metabolic Human (VMH, www.vmh.life) database encapsulating current knowledge of human metabolism within five interlinked resources ‘Human metabolism’, ‘Gut microbiome’, ‘Disease’, ‘Nutrition’, and ‘ReconMaps’. The VMH captures 5180 unique metabolites, 17 730 unique reactions, 3695 human genes, 255 Mendelian diseases, 818 microbes, 632 685 microbial genes and 8790 food items. The VMH’s unique features are (i) the hosting of the metabolic reconstructions of human and gut microbes amenable for metabolic modeling; (ii) seven human metabolic maps for data visualization; (iii) a nutrition designer; (iv) a user-friendly webpage and application-programming interface to access its content; (v) user feedback option for community engagement and (vi) the connection of its entities to 57 other web resources. The VMH represents a novel, interdisciplinary database for data interpretation and hypothesis generation to the biomedical community. [less ▲] Detailed reference viewed: 305 (30 UL)![]() ; ; Satagopam, Venkata ![]() in Nucleic Acids Research (2017), 45(20), 11495-11514 The post-genomic era has provided researchers with a deluge of protein sequences. However, a significant fraction of the proteins encoded by sequenced genomes remains without an identified function. Here ... [more ▼] The post-genomic era has provided researchers with a deluge of protein sequences. However, a significant fraction of the proteins encoded by sequenced genomes remains without an identified function. Here, we aim at determining how many enzymes of uncertain or unknown function are still present in the Saccharomyces cerevisiae and human proteomes. Using information available in the Swiss-Prot, BRENDA and KEGG databases in combination with a Hidden Markov Model-based method, we estimate that >600 yeast and 2000 human proteins (>30% of their proteins of unknown function) are enzymes whose precise function(s) remain(s) to be determined. This illustrates the impressive scale of the ‘unknown enzyme problem’. We extensively review classical biochemical as well as more recent systematic experimental and computational approaches that can be used to support enzyme function discovery research. Finally, we discuss the possible roles of the elusive catalysts in light of recent developments in the fields of enzymology and metabolism as well as the significance of the unknown enzyme problem in the context of metabolic modeling, metabolic engineering and rare disease research. [less ▲] Detailed reference viewed: 301 (14 UL)![]() ; ; et al in Nucleic Acids Research (2017) Changes in mature microRNA (miRNA) levels that occur downstream of signaling cascades play an important role during human development and disease. However, the regulation of primary microRNA (pri-miRNA ... [more ▼] Changes in mature microRNA (miRNA) levels that occur downstream of signaling cascades play an important role during human development and disease. However, the regulation of primary microRNA (pri-miRNA) genes remains to be dissected in detail. To address this, we followed a data-driven approach and developed a transcript identification, validation and quantification pipeline for characterizing the regulatory domains of pri-miRNAs. Integration of 92 nascent transcriptomes and multilevel data from cells arising from ecto-, endo- and mesoderm lineages reveals cell type-specific expression patterns, allows fine-resolution mapping of transcription start sites (TSS) and identification of candidate regulatory regions. We show that inter- and intragenic pri-miRNA transcripts span vast genomic regions and active TSS locations differ across cell types, exemplified by the mir-29a∼29b-1, mir-100∼let-7a-2∼125b-1 and miR-221∼222 clusters. Considering the presence of multiple TSS as an important regulatory feature at miRNA loci, we developed a strategy to quantify differential TSS usage. We demonstrate that the TSS activities associate with cell type-specific super-enhancers, differential stimulus responsiveness and higher-order chromatin structure. These results pave the way for building detailed regulatory maps of miRNA loci. [less ▲] Detailed reference viewed: 178 (6 UL)![]() ; ; et al in Nucleic Acids Research (2016), 44(4), 1760-1775 Transcription factor binding specificity is crucial for proper target gene regulation. Motif discovery algorithms identify the main features of the binding patterns, but the accuracy on the lower affinity ... [more ▼] Transcription factor binding specificity is crucial for proper target gene regulation. Motif discovery algorithms identify the main features of the binding patterns, but the accuracy on the lower affinity sites is often poor. Nuclear factor E2-related factor 2 (NRF2) is a ubiquitous redox-activated transcription factor having a key protective role against endogenous and exogenous oxidant and electrophile stress. Herein, we decipher the effects of sequence variation on the DNA binding sequence of NRF2, in order to identify both genome-wide bind- ing sites for NRF2 and disease-associated regulatory SNPs (rSNPs) with drastic effects on NRF2 binding. Interactions between NRF2 and DNA were studied using molecular modelling, and NRF2 chromatin immunoprecipitation-sequence datasets together with protein binding microarray measurements were utilized to study binding sequence variation in detail. The binding model thus generated was used to identify genome-wide binding sites for NRF2, and genomic binding sites with rSNPs that have strong effects on NRF2 binding and reside on active regulatory elements in human cells. As a proof of concept, miR-126–3p and -5p were identified as NRF2 target microRNAs, and a rSNP (rs113067944) residing on NRF2 target gene (Ferritin, light polypeptide, FTL) promoter was experimentally verified to decrease NRF2 binding and result in decreased transcriptional activity. [less ▲] Detailed reference viewed: 113 (2 UL)![]() ; ; et al in Nucleic Acids Research (2016) Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components ... [more ▼] Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology. [less ▲] Detailed reference viewed: 222 (1 UL)![]() Galhardo, Mafalda Sofia ![]() ![]() in Nucleic Acids Research (2015), 43(18), 8839-8855 We previously showed that disease-linked metabolic genes are often under combinatorial regulation. Using the genome-wide ChIP-Seq binding profiles for 93 transcription factors in nine different cell lines ... [more ▼] We previously showed that disease-linked metabolic genes are often under combinatorial regulation. Using the genome-wide ChIP-Seq binding profiles for 93 transcription factors in nine different cell lines, we show that genes under high regulatory load are significantly enriched for disease-association across cell types. We find that transcription factor load correlates with the enhancer load of the genes and thereby allows the identification of genes under high regulatory load by epigenomic mapping of active enhancers. Identification of the high enhancer load genes across 139 samples from 96 different cell and tissue types reveals a consistent enrichment for disease-associated genes in a cell type-selective manner. The underlying genes are not limited to super-enhancer genes and show several types of disease-association evidence beyond genetic variation (such as biomarkers). Interestingly, the high regulatory load genes are involved in more KEGG pathways than expected by chance, exhibit increased betweenness centrality in the interaction network of liver disease genes, and carry longer 3'UTRs with more microRNA (miRNA) binding sites than genes on average, suggesting a role as hubs integrating signals within regulatory networks. In summary, epigenetic mapping of active enhancers presents a promising and unbiased approach for identification of novel disease genes in a cell type-selective manner. [less ▲] Detailed reference viewed: 254 (40 UL)![]() Killcoyne, Sarah ![]() ![]() in Nucleic Acids Research (2015) Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly ... [more ▼] Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly variant and complex tumor genomes. To address this challenge we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By using these models to align unmapped reads, we show that our method can help to identify large-scale variations that have been previously missed. [less ▲] Detailed reference viewed: 212 (57 UL)![]() del Sol Mesa, Antonio ![]() in Nucleic Acids Research (2015) Detailed reference viewed: 186 (16 UL)![]() Nicklas, Sarah ![]() ![]() ![]() in Nucleic Acids Research (2015) Detailed reference viewed: 261 (30 UL) |
||