et al., Brain: a Journal of Neurology (2023), 146(2), 519-533

Neurodevelopmental disorders (NDDs), including severe pediatric epilepsy, autism, and intellectual disabilities, are heterogeneous conditions in which clinical genetic testing can often identify a pathogenic variant. For many of them, genetic therapies will be tested in clinical trials this year or in the coming years. In contrast to first-generation symptomatic treatments, the new disease-modifying precision medicines require a genetic test-informed diagnosis before a patient can be enrolled in a clinical trial. However, even in 2022, most identified genetic variants in NDD genes are ‘Variants of Uncertain Significance’. To safely enroll patients in precision medicine clinical trials, it is important to increase our knowledge about which regions in NDD-associated proteins can ‘tolerate’ missense variants and which ones are ‘essential’ and will cause an NDD when mutated. In addition, knowledge about functionally indispensable regions in the three-dimensional (3D) structural context of proteins can provide insights into the molecular mechanisms of disease variants. We developed a novel consensus approach that overlays evolutionary and population-based genomic scores to identify 3D essential sites (Essential3D) on protein structures. After extensive benchmarking of AlphaFold-predicted and experimentally solved protein structures, we generated the largest expert-curated protein structure set to date for 242 NDDs and identified 14,377 Essential3D sites across 189 gene-disorder-associated proteins. We demonstrate that the consensus annotation of Essential3D sites improves prioritization of disease mutations over single annotations.
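To illustrate why a consensus of independent scores can be stricter than a single annotation, here is a minimal per-residue sketch. It assumes two hypothetical scores per residue (an evolutionary conservation score and a population missense-depletion score, both rescaled to 0-1) and hypothetical cutoffs of 0.8; it does not model the 3D structural context that the published Essential3D method uses, and the names `ResidueScores`, `single_annotation`, and `consensus_annotation` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ResidueScores:
    """Hypothetical per-residue scores, both rescaled so that 1.0 = most constrained."""
    position: int
    evolutionary: float  # assumed: conservation across homologs, 0-1
    population: float    # assumed: depletion of population missense variants, 0-1


def single_annotation(residues, cutoff=0.8):
    """Flag residues using the evolutionary score alone."""
    return [r.position for r in residues if r.evolutionary >= cutoff]


def consensus_annotation(residues, evo_cutoff=0.8, pop_cutoff=0.8):
    """Flag residues only where both independent scores pass their (hypothetical) cutoffs."""
    return [
        r.position
        for r in residues
        if r.evolutionary >= evo_cutoff and r.population >= pop_cutoff
    ]


residues = [
    ResidueScores(10, 0.95, 0.90),  # constrained by both lines of evidence
    ResidueScores(11, 0.95, 0.20),  # conserved, but tolerant in the population
    ResidueScores(12, 0.30, 0.92),  # depleted in the population, poorly conserved
    ResidueScores(13, 0.85, 0.81),  # passes both cutoffs
]

print(single_annotation(residues))     # [10, 11, 13]
print(consensus_annotation(residues))  # [10, 13]
```

Requiring agreement drops residue 11, which looks essential by conservation alone but tolerates missense variation in the population; this is the intuition behind preferring a consensus over any single score.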
The identified Essential3D sites were enriched for functional features such as intermembrane regions or active sites, and they revealed key intermolecular interactions in protein complexes that were otherwise not annotated. Using the largest exome sequencing studies of autism, developmental disorders, and epilepsies to date, including >360,000 NDD patients and population controls, we found that missense variants at Essential3D sites are 8-fold enriched in patients. In summary, we developed a comprehensive protein structure set for 242 neurodevelopmental disorders and identified 14,377 Essential3D sites in the associated proteins. All data are available at https://es-ndd.broadinstitute.org for interactive visual inspection to enhance variant interpretation and the development of mechanistic hypotheses for 242 NDD genes. The provided resources will enhance clinical variant interpretation and in silico drug target development for NDD-associated genes and their encoded proteins.

et al., Science Translational Medicine (2020), 12(556), 6848

Malfunctions of voltage-gated sodium and calcium channels (encoded by SCNxA and CACNA1x family genes, respectively) have been associated with severe neurologic, psychiatric, cardiac, and other diseases. Altered channel activity is frequently grouped into gain or loss of ion channel function (GOF or LOF, respectively), which often corresponds not only to clinical disease manifestations but also to differences in drug response. Experimental studies of channel function are therefore important but laborious, and they usually focus on only a few variants at a time.
On the basis of known gene-disease mechanisms of 19 different diseases, we inferred LOF (n = 518) and GOF (n = 309) likely pathogenic variants from the disease phenotypes of variant carriers. By training a machine learning model on sequence- and structure-based features, we predicted LOF or GOF effects [area under the receiver operating characteristic curve (AUC) = 0.85] of likely pathogenic missense variants. Our LOF versus GOF prediction corresponded to molecular LOF versus GOF effects for 87 functionally tested variants in SCN1/2/8A and CACNA1I (AUC = 0.73) and was validated in exome-wide data from 21,703 cases and 128,957 controls. We showed regional clustering of inferred LOF and GOF nucleotide variants, respectively, across the alignment of the entire gene family, suggesting shared pathomechanisms in the SCNxA/CACNA1x family genes.

…; Hoksza, David. Nucleic Acids Research (2020)

Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires dedicated investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like ‘Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?’ or ‘Is a variant changing an amino acid located in the protein core or in a 3D cluster of known pathogenic mutations?’ are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/).
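The second question above (whether a variant sits in a 3D cluster of known pathogenic mutations) can be approximated with a simple distance check. The sketch below is a generic illustration, not MISCAST's implementation: it assumes hypothetical C-alpha coordinates in angstroms, a hypothetical set of known pathogenic residues, and an assumed 8 Å neighborhood cutoff.

```python
import math

# Hypothetical C-alpha coordinates in angstroms, keyed by residue number.
ca_coords = {
    100: (0.0, 0.0, 0.0),
    105: (3.8, 0.0, 0.0),
    140: (5.0, 4.0, 1.0),
    200: (30.0, 12.0, -8.0),
}
# Hypothetical set of residues carrying known pathogenic missense variants.
known_pathogenic = {105, 140}


def pathogenic_neighbors(residue, coords, pathogenic, cutoff=8.0):
    """Return known pathogenic residues (other than `residue`) within `cutoff` angstroms."""
    center = coords[residue]
    return sorted(
        p
        for p in pathogenic
        if p != residue and p in coords and math.dist(center, coords[p]) <= cutoff
    )


print(pathogenic_neighbors(100, ca_coords, known_pathogenic))  # [105, 140]
print(pathogenic_neighbors(200, ca_coords, known_pathogenic))  # []
```

Residue 100 is far apart from 140 in sequence but close in space, which is exactly the kind of relationship that a linear sequence view would miss. `math.dist` requires Python 3.8 or later.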
MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and is displayed on structures alongside the variants, giving users the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by the wider biological community.

et al., E-print/Working paper (2019)

Malfunctions of voltage-gated sodium and calcium channels (SCN and CACNA1 genes) have been associated with severe neurologic, psychiatric, cardiac and other diseases. Altered channel activity is frequently grouped into gain or loss of ion channel function (GOF or LOF, respectively), which corresponds not only to clinical disease manifestations but also to differences in drug response. Experimental studies of channel function are therefore important but laborious, and they usually focus on only a few variants at a time. Based on known gene-disease mechanisms, we infer LOF (518 variants) and GOF (309 variants) likely pathogenic variants from the disease phenotypes of variant carriers. We show regional clustering of inferred GOF and LOF variants, respectively, across the alignment of the entire gene family, suggesting shared pathomechanisms in the SCN/CACNA1 genes. By training a machine learning model on sequence- and structure-based features, we predict LOF- or GOF-associated disease phenotypes (AUC = 0.85) of likely pathogenic missense variants.
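The performance figures quoted for the LOF versus GOF classifier are areas under the receiver operating characteristic curve. As a reminder of what such a number measures, the sketch below computes it from its Mann-Whitney interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The labels and scores are hypothetical, and treating GOF as the positive class is an assumption for illustration; this is not the authors' model or evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney formulation.

    labels: 1 for the positive class (hypothetically GOF), 0 for the negative (LOF).
    scores: predicted probability of the positive class.
    A tie between a positive and a negative counts as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs are ranked correctly
```

The pairwise loop is quadratic and only meant for clarity; for real datasets, scikit-learn's `roc_auc_score` computes the same quantity efficiently. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.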
We then successfully validate the GOF versus LOF prediction on 87 functionally tested variants in SCN1/2/8A and CACNA1I (AUC = 0.73) and in exome-wide data from >100,000 cases and controls. Ultimately, functional prediction of missense variants in clinically relevant genes will facilitate precision medicine in clinical practice.

et al., Biophysical Journal (2018, February 02), 114(3, Suppl. 1), 664

The functional interpretation of genetic variation in disease-associated genes is far outpaced by data generation. Existing algorithms for predicting variant consequences do not adequately distinguish pathogenic variants from benign rare variants. This gap in statistical and bioinformatic analysis, accompanied by an ever-increasing number of identified variants in biomedical research and clinical applications, has become a major challenge. Established methods to predict the functional effect of genetic variation use the degree of amino acid conservation across species in linear protein sequence alignments. More recent methods include the spatial distribution pattern of known patient and control variants. Here, we propose to combine linear conservation scores and spatial-constraint-based scores into a novel score that incorporates three-dimensional structural properties of amino acid residues, such as solvent-accessible surface area, degree of flexibility, secondary structure propensity, and binding tendency, to quantify the effect of amino acid substitutions. For this study, we develop a framework for large-scale mapping of established linear sequence-based paralog and ortholog conservation scores onto the tertiary structures of human proteins.
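The core of such a mapping step is transferring per-position sequence scores onto structure residue numbers, after which scores can be combined in 3D. The sketch below assumes a precomputed sequence-to-structure position map (for example, derived from an alignment), hypothetical coordinates in angstroms, and an assumed 8 Å averaging window; the spatial average is an illustrative choice, not the score proposed in the abstract, and all function names are hypothetical.

```python
import math


def map_scores_to_structure(seq_scores, seq_to_struct):
    """Transfer per-position sequence scores to structure residue numbers.

    seq_scores: {sequence position: score}
    seq_to_struct: {sequence position: structure residue number}, assumed to come
    from an alignment; positions not resolved in the structure are skipped.
    """
    return {
        seq_to_struct[pos]: score
        for pos, score in seq_scores.items()
        if pos in seq_to_struct
    }


def spatial_average(struct_scores, coords, cutoff=8.0):
    """Average each scored residue with all scored residues within `cutoff` angstroms."""
    smoothed = {}
    for res, center in coords.items():
        if res not in struct_scores:
            continue
        window = [
            struct_scores[other]
            for other, xyz in coords.items()
            if other in struct_scores and math.dist(center, xyz) <= cutoff
        ]
        smoothed[res] = sum(window) / len(window)
    return smoothed


seq_scores = {1: 0.9, 2: 0.5, 3: 0.1, 4: 0.3}
seq_to_struct = {1: 101, 2: 102, 4: 104}  # position 3 is not resolved in the structure
coords = {101: (0.0, 0.0, 0.0), 102: (3.8, 0.0, 0.0), 104: (20.0, 0.0, 0.0)}

struct_scores = map_scores_to_structure(seq_scores, seq_to_struct)
print(struct_scores)  # {101: 0.9, 102: 0.5, 104: 0.3}
print(spatial_average(struct_scores, coords))
```

Because the same map-then-aggregate pattern works for any residue-level annotation, it applies equally to solved structures and homology models, as long as a position map and coordinates are available.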
This framework can be used to map the spatial distribution of mutations on solved protein structures as well as on homology models. As a proof of concept, using a homology model of the human Nav1.2 voltage-gated sodium channel, we observe spatial clustering, in distinct domains, of mutations associated with Autism Spectrum Disorder (>20 variants) and Epilepsy (>100 variants) that exert opposing effects on channel function. We are currently characterizing all variants (>300k individuals) found in ClinVar, the largest disease variant database, as well as variants identified in >140k individuals from the general population. The variant mapping framework and our structure-informed score will be useful in identifying structural motifs of proteins associated with disease risk.