et al. in Proceedings of the National Academy of Sciences of the United States of America (2020)
Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations’ positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research.
Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants’ pathogenicity in terms of the perturbed molecular mechanisms.

Hoksza, David in Nucleic Acids Research (2020)
Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like ‘Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?’, or ‘Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?’ are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.
et al. E-print/Working paper (2019)
Inference of the structural and functional consequences of amino acid-altering missense variants is challenging and not yet scalable. Clinical and research applications of the colossal number of identified missense variants are thus limited. Here we describe the aggregation and analysis of large-scale genomic variation and structural biology data for 1,330 disease-associated genes. Comparing the burden of 40 structural, physicochemical, and functional protein features of altered amino acids with 3-dimensional coordinates, we found 18 and 14 features that are associated with pathogenic and population missense variants, respectively. Separate analyses of variants from 24 protein functional classes revealed novel function-dependent vulnerable features. We then devised a quantitative spectrum, identifying variants with more pathogenic-variant-associated features. Finally, we developed a web resource (MISCAST; http://miscast.broadinstitute.org/) for interactive analysis of variants on linear and tertiary protein structures. The biological impact of missense variants available through the webtool will assist researchers in hypothesizing variant pathogenicity and disease trajectories.

et al. in Biophysical Journal (2019, February 15), 116(3), 420-421
Elucidating molecular consequences of amino-acid-altering missense variants at scale is challenging.
In this work, we explored whether features derived from three-dimensional (3D) protein structures can characterize patient missense variants across different protein classes with similar molecular-level activities. The identified disease-associated features can advance our understanding of how a single amino acid substitution can lead to the etiology of monogenic disorders. For 1,330 disease-associated genes (1,077 of 1,330, >80%, implicated in Mendelian disorders), we collected missense variants from the general population (gnomAD database, N=164,915) and patients (ClinVar and HGMD databases, N=32,923). We mapped the variant positions in silico onto >14k human protein 3D structures. We annotated the protein positions of variants with 40 structural, physicochemical, and functional features. We then grouped the genes into 24 protein classes based on their molecular functions and performed statistical association analyses with the features of population and patient variants. We identified 18 (out of 40) features that are associated with patient variants in general. Specifically, patient variants are less exposed to solvent (p<1.0e-100), enriched on β-sheets (p<2.37e-39), frequently mutate aromatic residues (p<1.0e-100), occur in ligand binding sites (p<1.0e-100), and are spatially close to phosphorylation sites (p<1.0e-100). We also observed differential protein-class-specific features. For three protein classes (signaling molecules, proteases, and hydrolases), patient variants significantly perturb disulfide bonds (p<1.0e-100). Only in immunity proteins are patient variants enriched in flexible coils (p<1.65e-06). Kinases and cell junction proteins exhibit enrichment of patient variants around SUMOylation (p<1.0e-100) and methylation sites (p<9.29e-11), respectively. In summary, we studied shared and unique features associated with patient variants on protein structure across 24 protein classes, providing novel mechanistic insights.
We generated an online resource that contains per-amino-acid feature annotation tracks for 1,330 genes, summarizes the patient-variant-associated features at the residue level, and can guide variant interpretation.
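
The abstracts above describe comparing how often a protein feature occurs at patient-variant positions versus population-variant positions. They do not state which statistical test was used, so the following is only a minimal sketch of one common way to run such a comparison: an odds ratio and a two-sided Fisher's exact test on a 2x2 table. The function names and the counts in the usage lines are hypothetical and are not taken from the studies.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table (hypothetical layout, not from the source):
    a = patient variants with the feature,    b = patient variants without it,
    c = population variants with the feature, d = population variants without it.
    """
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the same 2x2 table."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x feature-positive patient variants.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are at most as likely as the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical toy counts for a single feature (illustration only).
toy_or = odds_ratio(30, 70, 10, 90)
toy_p = fisher_exact_two_sided(30, 70, 10, 90)
print(f"odds ratio = {toy_or:.2f}, p = {toy_p:.2e}")
```

An odds ratio above 1 would indicate that the feature is more frequent among patient variants in this toy table. Repeating such a test across 40 features and 24 protein classes would also call for multiple-testing correction, which this sketch leaves out.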