References of "Hoksza, David 50026933"
Full Text
Peer Reviewed
Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants
Iqbal, Sumaiya; Perez-Palma, Eduardo; Jespersen, Jakob B. et al

in Proceedings of the National Academy of Sciences of the United States of America (2020)

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations’ positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants’ pathogenicity in terms of the perturbed molecular mechanisms.

Full Text
Peer Reviewed
MISCAST: MIssense variant to protein StruCture Analysis web SuiTe
Iqbal, Sumaiya; Hoksza, David UL; Pérez-Palma, Eduardo et al

in Nucleic Acids Research (2020)

Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like ‘Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?’, or ‘Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?’ are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.

Full Text
Insights into protein structural, physicochemical, and functional consequences of missense variants in 1,330 disease-associated human genes 693259
Iqbal, Sumaiya; Jespersen, Jakob B.; Perez-Palma, Eduardo et al

E-print/Working paper (2019)

Inference of the structural and functional consequences of amino acid-altering missense variants is challenging and not yet scalable. Clinical and research applications of the colossal number of identified missense variants are thus limited. Here we describe the aggregation and analysis of large-scale genomic variation and structural biology data for 1,330 disease-associated genes. Comparing the burden of 40 structural, physicochemical, and functional protein features of altered amino acids with 3-dimensional coordinates, we found 18 and 14 features that are associated with pathogenic and population missense variants, respectively. Separate analyses of variants from 24 protein functional classes revealed novel function-dependent vulnerable features. We then devised a quantitative spectrum, identifying variants with higher pathogenic variant-associated features. Finally, we developed a web resource (MISCAST; http://miscast.broadinstitute.org/) for interactive analysis of variants on linear and tertiary protein structures. The biological impact of missense variants available through the webtool will assist researchers in hypothesizing variant pathogenicity and disease trajectories.

Full Text
Peer Reviewed
PDBe-KB: a community-driven resource for structural and functional annotations
Varadi, M.; Berrisford, J.; Deshpande, M. et al

in Nucleic Acids Research (2019)

The Protein Data Bank in Europe-Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted structural and functional annotations of macromolecular structure data, contained in the Protein Data Bank (PDB). The goal of PDBe-KB is two-fold: (i) to increase the visibility and reduce the fragmentation of annotations contributed by specialist data resources, and to make these data more findable, accessible, interoperable and reusable (FAIR) and (ii) to place macromolecular structure data in their biological context, thus facilitating their use by the broader scientific community in fundamental and applied research. Here, we describe the guidelines of this collaborative effort, the current status of contributed data, and the PDBe-KB infrastructure, which includes the data exchange format, the deposition system for added value annotations, the distributable database containing the assembled data, and programmatic access endpoints. We also describe a series of novel web-pages (the PDBe-KB aggregated views of structure data), which combine information on macromolecular structures from many PDB entries. We have recently released the first set of pages in this series, which provide an overview of available structural and functional information for a protein of interest, referenced by a UniProtKB accession.

Full Text
Peer Reviewed
PrankWeb: a web server for ligand binding site prediction and visualization
Jendele, Lukas; Krivak, Radoslav; Skoda, Petr et al

in Nucleic Acids Research (2019), 47(W1), 345-349

PrankWeb is an online resource providing an interface to P2Rank, a state-of-the-art method for ligand binding site prediction. P2Rank is a template-free machine learning method based on the prediction of local chemical neighborhood ligandability centered on points placed on a solvent-accessible protein surface. Points with a high ligandability score are then clustered to form the resulting ligand binding sites. In addition, PrankWeb provides a web interface enabling users to easily carry out the prediction and visually inspect the predicted binding sites via an integrated sequence-structure view. Moreover, PrankWeb can determine sequence conservation for the input molecule and use this in both the prediction and result visualization steps. Alongside its online visualization options, PrankWeb also offers the possibility of exporting the results as a PyMOL script for offline visualization. The web frontend communicates with the server side via a REST API. In high-throughput scenarios, therefore, users can utilize the server API directly, bypassing the need for a web-based frontend or installation of the P2Rank application. PrankWeb is available at http://prankweb.cz/, while the web application source code and the P2Rank method can be accessed at https://github.com/jendelel/PrankWebApp and https://github.com/rdk/p2rank, respectively.

Full Text
Peer Reviewed
Closing the gap between formats for storing layout information in systems biology
Hoksza, David UL; Gawron, Piotr UL; Ostaszewski, Marek UL et al

in Briefings in Bioinformatics (2019)

The understanding of complex biological networks often relies on both a dedicated layout and a topology. Currently, there are three major competing layout-aware systems biology formats, but there are no software tools or software libraries supporting all of them. This complicates the management of molecular network layouts and hinders their reuse and extension. In this paper, we present a high-level overview of the layout formats in systems biology, focusing on their commonalities and differences, review their support in existing software tools, libraries and repositories and finally introduce a new conversion module within the MINERVA platform. The module is available via a REST API and offers, besides the ability to convert between layout-aware systems biology formats, the possibility to export layouts into several graphical formats. The module enables conversion of very large networks with thousands of elements, such as disease maps or metabolic reconstructions, rendering it widely applicable in systems biology.

Full Text
Peer Reviewed
Machine Learning to Support the Presentation of Complex Pathway Graphs
Nielsen, Sune Steinbjorn UL; Ostaszewski, Marek UL; McGee, Fintan et al

in IEEE/ACM Transactions on Computational Biology and Bioinformatics (2019)

Visualization of biological mechanisms by means of pathway graphs is necessary to better understand the often complex underlying system. Manual layout of such pathways or maps of knowledge is a difficult and time-consuming process. Node duplication is a technique that makes layouts with improved readability possible by reducing edge crossings and shortening edge lengths in drawn diagrams. In this article we propose an approach using Machine Learning (ML) to facilitate parts of this task by training a Support Vector Machine (SVM) with actions taken during manual biocuration. Our training input is a series of incremental snapshots of a diagram describing mechanisms of a disease, progressively curated by a human expert employing node duplication in the process. As a test of the trained SVM models, they are applied to a single large instance and 25 medium-sized instances of hand-curated biological pathways. Finally, in a user validation study, we compare the model predictions to the outcome of a node duplication questionnaire answered by users of biological pathways with varying experience. We successfully predicted nodes for duplication and emulated human choices, demonstrating that our approach can effectively learn human-like node duplication preferences to support curation of pathway diagrams in various contexts.

Full Text
Peer Reviewed
rPredictorDB: a predictive database of individual secondary structures of RNAs and their formatted plots
Jelinek, Jan; Hoksza, David UL; Hajic, Jan et al

in Database: The Journal of Biological Databases and Curation (2019), 2019

Secondary data structure of RNA molecules provides insights into the identity and function of RNAs. With RNAs readily sequenced, the question of their structural characterization is increasingly important. However, RNA structure is difficult to acquire. Its experimental identification is extremely technically demanding, while computational prediction is not accurate enough, especially for large structures of long sequences. We address this difficult situation with rPredictorDB, a predictive database of RNA secondary structures that aims to form a middle ground between experimentally identified structures in PDB and predicted consensus secondary structures in Rfam. The database contains individual secondary structures predicted using a tool for template-based prediction of RNA secondary structure for the homologs of the RNA families with at least one homolog with experimentally solved structure. Experimentally identified structures are used as the structural templates and thus the prediction has higher reliability than de novo predictions in Rfam. The sequences are downloaded from public resources. So far rPredictorDB covers 7365 RNAs with their secondary structures. Plots of the secondary structures use the Traveler package for readable display of RNAs with long sequences and complex structures, such as ribosomal RNAs. The RNAs in the output of rPredictorDB are extensively annotated and can be viewed, browsed, searched and downloaded according to taxonomic, sequence and structure data. Additionally, structure of user-provided sequences can be predicted using the templates stored in rPredictorDB.

Full Text
Peer Reviewed
MINERVA API and plugins: opening molecular network analysis and visualization to the community
Hoksza, David UL; Gawron, Piotr UL; Ostaszewski, Marek UL et al

in Bioinformatics (2019)

SUMMARY: The complexity of molecular networks makes them difficult to navigate and interpret, creating a need for specialized software. MINERVA is a web platform for visualization, exploration and management of molecular networks. Here, we introduce an extension to the MINERVA architecture that greatly facilitates the access and use of the stored molecular network data. It allows users to incorporate such data in analytical pipelines via a programmatic access interface, and to extend the platform's visual exploration and analytics functionality via a plugin architecture. This is possible for any molecular network hosted by the MINERVA platform encoded in well-recognized systems biology formats. To showcase the possibilities of the plugin architecture, we have developed several plugins extending the MINERVA core functionalities. In the article, we demonstrate the plugins for interactive tree traversal of molecular networks, for enrichment analysis and for mapping and visualization of known disease variants or known adverse drug reactions to molecules in the network. AVAILABILITY AND IMPLEMENTATION: Plugins developed and maintained by the MINERVA team are available under the AGPL v3 license at https://git-r3lab.uni.lu/minerva/plugins/. The MINERVA API and plugin documentation is available at https://minerva-web.lcsb.uni.lu.

Full Text
Peer Reviewed
MolArt: a molecular structure annotation and visualization tool
Hoksza, David UL; Gawron, Piotr UL; Ostaszewski, Marek UL et al

in Bioinformatics (2018)

SUMMARY: MolArt fills the gap between sequence and structure visualization by providing a light-weight, interactive environment enabling exploration of sequence annotations in the context of available experimental or predicted protein structures. Provided a UniProt ID, MolArt downloads and displays sequence annotations, sequence-structure mapping and relevant structures. The sequence and structure views are interlinked, enabling sequence annotations to be color-overlaid on the mapped structures, thus providing an enhanced understanding and interpretation of the available molecular data. AVAILABILITY AND IMPLEMENTATION: MolArt is released under the Apache 2 license and is available at https://github.com/davidhoksza/MolArt. The project web page https://davidhoksza.github.io/MolArt/ features examples and applications of the tool.
