References of "Hoksza, David 50026933"
Full Text
Insights into protein structural, physicochemical, and functional consequences of missense variants in 1,330 disease-associated human genes 693259
Iqbal, Sumaiya; Jespersen, Jakob B.; Perez-Palma, Eduardo et al

E-print/Working paper (2019)

Inference of the structural and functional consequences of amino acid-altering missense variants is challenging and not yet scalable. Clinical and research applications of the colossal number of identified missense variants is thus limited. Here we describe the aggregation and analysis of large-scale genomic variation and structural biology data for 1,330 disease-associated genes. Comparing the burden of 40 structural, physicochemical, and functional protein features of altered amino acids with 3-dimensional coordinates, we found 18 and 14 features that are associated with pathogenic and population missense variants, respectively. Separate analyses of variants from 24 protein functional classes revealed novel function-dependent vulnerable features. We then devised a quantitative spectrum, identifying variants with higher pathogenic variant-associated features. Finally, we developed a web resource (MISCAST; http://miscast.broadinstitute.org/) for interactive analysis of variants on linear and tertiary protein structures. The biological impact of missense variants available through the webtool will assist researchers in hypothesizing variant pathogenicity and disease trajectories.

Full Text
Peer Reviewed
PDBe-KB: a community-driven resource for structural and functional annotations
Varadi, M.; Berrisford, J.; Deshpande, M. et al

in Nucleic Acids Research (2019)

The Protein Data Bank in Europe-Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted structural and functional annotations of macromolecular structure data, contained in the Protein Data Bank (PDB). The goal of PDBe-KB is two-fold: (i) to increase the visibility and reduce the fragmentation of annotations contributed by specialist data resources, and to make these data more findable, accessible, interoperable and reusable (FAIR) and (ii) to place macromolecular structure data in their biological context, thus facilitating their use by the broader scientific community in fundamental and applied research. Here, we describe the guidelines of this collaborative effort, the current status of contributed data, and the PDBe-KB infrastructure, which includes the data exchange format, the deposition system for added value annotations, the distributable database containing the assembled data, and programmatic access endpoints. We also describe a series of novel web-pages—the PDBe-KB aggregated views of structure data—which combine information on macromolecular structures from many PDB entries. We have recently released the first set of pages in this series, which provide an overview of available structural and functional information for a protein of interest, referenced by a UniProtKB accession.

Full Text
Peer Reviewed
rPredictorDB: a predictive database of individual secondary structures of RNAs and their formatted plots.
Jelinek, Jan; Hoksza, David UL; Hajic, Jan et al

in Database: the journal of biological databases and curation (2019), 2019

Secondary data structure of RNA molecules provides insights into the identity and function of RNAs. With RNAs readily sequenced, the question of their structural characterization is increasingly important. However, RNA structure is difficult to acquire. Its experimental identification is extremely technically demanding, while computational prediction is not accurate enough, especially for large structures of long sequences. We address this difficult situation with rPredictorDB, a predictive database of RNA secondary structures that aims to form a middle ground between experimentally identified structures in PDB and predicted consensus secondary structures in Rfam. The database contains individual secondary structures predicted using a tool for template-based prediction of RNA secondary structure for the homologs of the RNA families with at least one homolog with experimentally solved structure. Experimentally identified structures are used as the structural templates and thus the prediction has higher reliability than de novo predictions in Rfam. The sequences are downloaded from public resources. So far rPredictorDB covers 7365 RNAs with their secondary structures. Plots of the secondary structures use the Traveler package for readable display of RNAs with long sequences and complex structures, such as ribosomal RNAs. The RNAs in the output of rPredictorDB are extensively annotated and can be viewed, browsed, searched and downloaded according to taxonomic, sequence and structure data. Additionally, structure of user-provided sequences can be predicted using the templates stored in rPredictorDB.

Full Text
Peer Reviewed
Machine Learning to Support the Presentation of Complex Pathway Graphs.
Nielsen, Sune, S UL; Ostaszewski, Marek UL; McGee, Fintan et al

in IEEE/ACM transactions on computational biology and bioinformatics (2019)

Visualization of biological mechanisms by means of pathway graphs is necessary to better understand the often complex underlying system. Manual layout of such pathways or maps of knowledge is a difficult and time consuming process. Node duplication is a technique that makes layouts with improved readability possible by reducing edge crossings and shortening edge lengths in drawn diagrams. In this article we propose an approach using Machine Learning (ML) to facilitate parts of this task by training a Support Vector Machine (SVM) with actions taken during manual biocuration. Our training input is a series of incremental snapshots of a diagram describing mechanisms of a disease, progressively curated by a human expert employing node duplication in the process. As a test of the trained SVM models, they are applied to a single large instance and 25 medium-sized instances of hand-curated biological pathways. Finally, in a user validation study, we compare the model predictions to the outcome of a node duplication questionnaire answered by users of biological pathways with varying experience. We successfully predicted nodes for duplication and emulated human choices, demonstrating that our approach can effectively learn human-like node duplication preferences to support curation of pathway diagrams in various contexts.

Full Text
Peer Reviewed
PrankWeb: a web server for ligand binding site prediction and visualization.
Jendele, Lukas; Krivak, Radoslav; Skoda, Petr et al

in Nucleic Acids Research (2019), 47(W1), 345-349

PrankWeb is an online resource providing an interface to P2Rank, a state-of-the-art method for ligand binding site prediction. P2Rank is a template-free machine learning method based on the prediction of local chemical neighborhood ligandability centered on points placed on a solvent-accessible protein surface. Points with a high ligandability score are then clustered to form the resulting ligand binding sites. In addition, PrankWeb provides a web interface enabling users to easily carry out the prediction and visually inspect the predicted binding sites via an integrated sequence-structure view. Moreover, PrankWeb can determine sequence conservation for the input molecule and use this in both the prediction and result visualization steps. Alongside its online visualization options, PrankWeb also offers the possibility of exporting the results as a PyMOL script for offline visualization. The web frontend communicates with the server side via a REST API. In high-throughput scenarios, therefore, users can utilize the server API directly, bypassing the need for a web-based frontend or installation of the P2Rank application. PrankWeb is available at http://prankweb.cz/, while the web application source code and the P2Rank method can be accessed at https://github.com/jendelel/PrankWebApp and https://github.com/rdk/p2rank, respectively.

Full Text
Peer Reviewed
MolArt: a molecular structure annotation and visualization tool
Hoksza, David UL; Gawron, Piotr UL; Ostaszewski, Marek UL et al

in Bioinformatics (2018)

Summary: MolArt fills the gap between sequence and structure visualization by providing a light-weight, interactive environment enabling exploration of sequence annotations in the context of available experimental or predicted protein structures. Provided a UniProt ID, MolArt downloads and displays sequence annotations, sequence-structure mapping and relevant structures. The sequence and structure views are interlinked, enabling sequence annotations being color overlaid over the mapped structures, thus providing an enhanced understanding and interpretation of the available molecular data. Availability and implementation: MolArt is released under the Apache 2 license and is available at https://github.com/davidhoksza/MolArt. The project web page https://davidhoksza.github.io/MolArt/ features examples and applications of the tool.
