; ; et al in Brain: a Journal of Neurology (2023), 146(2), 519-533
Neurodevelopmental disorders (NDDs), including severe pediatric epilepsy, autism, and intellectual disabilities, are heterogeneous conditions in which clinical genetic testing can often identify a pathogenic variant. For many of them, genetic therapies will be tested in clinical trials this year or in the coming years. In contrast to first-generation symptomatic treatments, the new disease-modifying precision medicines require a genetic test-informed diagnosis before a patient can be enrolled in a clinical trial. However, even in 2022, most identified genetic variants in NDD genes are ‘Variants of Uncertain Significance’. To safely enroll patients in precision medicine clinical trials, it is important to increase our knowledge about which regions in NDD-associated proteins can ‘tolerate’ missense variants and which ones are ‘essential’ and will cause an NDD when mutated. In addition, knowledge about functionally indispensable regions in the three-dimensional (3D) structure context of proteins can also provide insights into the molecular mechanisms of disease variants. We developed a novel consensus approach that overlays evolutionary and population-based genomic scores to identify 3D essential sites (Essential3D) on protein structures. After extensive benchmarking of AlphaFold-predicted and experimentally solved protein structures, we generated the currently largest expert-curated protein structure set for 242 NDDs and identified 14,377 Essential3D sites across 189 gene-disorder-associated proteins. We demonstrate that the consensus annotation of Essential3D sites improves prioritization of disease mutations over single annotations.
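The abstract does not spell out how the evolutionary and population-based scores are combined. As a minimal sketch of the general idea of a consensus annotation, assuming two per-residue scores on a 0 to 1 scale (the field names `conservation` and `constraint` and the 0.8 cutoffs are hypothetical, not the published Essential3D definition), a residue is flagged only when both independent scores agree:

```python
from dataclasses import dataclass


@dataclass
class Residue:
    position: int
    conservation: float  # hypothetical: higher = more evolutionarily conserved (0-1)
    constraint: float    # hypothetical: higher = fewer missense variants in the population (0-1)


def consensus_essential_sites(residues: list[Residue],
                              cons_cutoff: float = 0.8,
                              constr_cutoff: float = 0.8) -> list[int]:
    """Positions where BOTH scores pass their (assumed) cutoffs."""
    return [r.position for r in residues
            if r.conservation >= cons_cutoff and r.constraint >= constr_cutoff]


residues = [
    Residue(10, 0.95, 0.90),  # passes both scores -> consensus site
    Residue(11, 0.95, 0.20),  # conserved, but tolerated in the population
    Residue(12, 0.30, 0.85),  # constrained, but poorly conserved
]
print(consensus_essential_sites(residues))  # [10]
```

Requiring agreement between independent annotations trades some sensitivity for specificity, which is one intuition for why a consensus can prioritize disease mutations better than either score alone.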
The identified Essential3D sites were enriched for functional features such as intermembrane regions or active sites, and highlighted key intermolecular interactions in protein complexes that were otherwise not annotated. Using the currently largest autism, developmental disorders, and epilepsies exome sequencing studies, including >360,000 NDD patients and population controls, we found that missense variants at Essential3D sites are 8-fold enriched in patients. In summary, we developed a comprehensive protein structure set for 242 neurodevelopmental disorders and identified 14,377 Essential3D sites in these. All data are available at https://es-ndd.broadinstitute.org for interactive visual inspection to enhance variant interpretation and the development of mechanistic hypotheses for 242 NDD genes. The provided resources will enhance clinical variant interpretation and in silico drug target development for NDD-associated genes and encoded proteins.

; ; et al in Proceedings of the National Academy of Sciences of the United States of America (2020)
Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures.
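Comparisons of pathogenic and population variants at a given class of 3D positions are commonly summarized with a 2×2 table. The sketch below assumes hypothetical counts and a single binary feature (for example, "residue is buried") and computes an odds ratio plus a one-sided Fisher's exact p-value with the Python standard library only; it illustrates a generic burden comparison, not necessarily the statistical test used in the study.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: pathogenic variants, population variants.
    Columns: at feature positions, elsewhere. Assumes b and c are non-zero.
    """
    return (a * d) / (b * c)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test, P(X >= a) under the hypergeometric null:
    the chance of at least `a` pathogenic variants at feature positions if the
    feature were unrelated to pathogenicity."""
    n_pathogenic = a + b
    n_feature = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_pathogenic)
    upper = min(n_pathogenic, n_feature)
    tail = sum(
        comb(n_feature, k) * comb(n_total - n_feature, n_pathogenic - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 30/100 pathogenic vs 10/100 population variants are buried.
print(round(odds_ratio(30, 70, 10, 90), 2))  # 3.86
print(fisher_greater(30, 70, 10, 90))
```

A one-sided test matches the question of enrichment in patients; repeating it over many features calls for some correction for multiple testing.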
By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations’ positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: we observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants’ pathogenicity in terms of the perturbed molecular mechanisms.

; ; et al in Science Translational Medicine (2020), 12(556), 6848
Malfunctions of voltage-gated sodium and calcium channels (encoded by SCNxA and CACNA1x family genes, respectively) have been associated with severe neurologic, psychiatric, cardiac, and other diseases.
Altered channel activity is frequently grouped into gain or loss of ion channel function (GOF or LOF, respectively), which often corresponds not only to clinical disease manifestations but also to differences in drug response. Experimental studies of channel function are therefore important, but laborious, and usually focus on only a few variants at a time. On the basis of known gene-disease mechanisms of 19 different diseases, we inferred LOF (n = 518) and GOF (n = 309) likely pathogenic variants from the disease phenotypes of variant carriers. By training a machine learning model on sequence- and structure-based features, we predicted LOF or GOF effects [area under the receiver operating characteristic curve (ROC) = 0.85] of likely pathogenic missense variants. Our LOF versus GOF prediction corresponded to molecular LOF versus GOF effects for 87 functionally tested variants in SCN1/2/8A and CACNA1I (ROC = 0.73) and was validated in exome-wide data from 21,703 cases and 128,957 controls. We showed respective regional clustering of inferred LOF and GOF nucleotide variants across the alignment of the entire gene family, suggesting shared pathomechanisms in the SCNxA/CACNA1x family genes.

; Hoksza, David in Nucleic Acids Research (2020)
Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires investigating amino acid substitutions in the context of protein structure and function.
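Investigating a substitution in its structural context can include asking whether the variant lies close in 3D to known pathogenic sites. This is a minimal sketch, not MISCAST's implementation: it assumes C-alpha coordinates have already been parsed from a PDB or AlphaFold model into a dictionary keyed by residue number, and the 8 Å cutoff, residue numbers, and coordinates are all hypothetical.

```python
import math

Coord = tuple[float, float, float]


def nearby_pathogenic(query: int, ca_coords: dict[int, Coord],
                      known_pathogenic: list[int], cutoff: float = 8.0) -> list[int]:
    """Known pathogenic residue positions whose C-alpha atom lies within
    `cutoff` Angstroms of the query residue's C-alpha atom."""
    origin = ca_coords[query]
    return sorted(
        pos for pos in known_pathogenic
        if pos != query and math.dist(origin, ca_coords[pos]) <= cutoff
    )


# Hypothetical C-alpha coordinates (Angstroms) keyed by residue number.
coords = {
    100: (0.0, 0.0, 0.0),   # query variant position
    150: (3.0, 4.0, 0.0),   # 5 A away
    220: (10.0, 0.0, 0.0),  # 10 A away
    300: (0.0, 0.0, 6.0),   # 6 A away
}
print(nearby_pathogenic(100, coords, [150, 220, 300]))  # [150, 300]
```

Residues far apart in sequence (here 100 and 300) can still be neighbours in the folded structure, which is why such checks are done on 3D coordinates rather than on sequence distance.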
Answers to questions like ‘Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?’ or ‘Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?’ are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and is displayed on structures alongside the variants, to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by the wider biological community.

; May, Patrick in Genome Medicine (2020), 12(28)
Background: Classifying the pathogenicity of missense variants represents a major challenge in clinical practice during the diagnosis of rare and genetically heterogeneous neurodevelopmental disorders (NDDs). While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes belong to gene families. The use of gene family information for disease gene discovery and variant interpretation has not yet been investigated on a genome-wide scale. We empirically evaluate whether paralog conserved or non-conserved sites in human gene families are important in NDDs.
Methods: Gene family information was collected from Ensembl. Paralog conserved sites were defined based on paralog sequence alignments. Data from 10,068 NDD patients and 2,078 controls were statistically evaluated for de novo variant burden in gene families. Results: We demonstrate that disease-associated missense variants are enriched at paralog conserved sites across all disease groups and inheritance models tested. We developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families including 98 de novo variant-carrying genes in NDD patients, of which 28 represent novel candidate genes for NDD that are brain-expressed and under evolutionary constraint. Conclusion: This study represents the first method to incorporate gene family information into a statistical framework to interpret variant data for NDDs and to discover new NDD-associated genes.

; May, Patrick in European Journal of Human Genetics (2019)
It is challenging to estimate genetic variant burden across different subtypes of epilepsy. Herein, we used a comparative approach to assess the diagnostic yield and genotype-phenotype correlations in the four most common brain lesions in patients with drug-resistant focal epilepsy. Targeted sequencing analysis was performed for a panel of 161 genes with a mean coverage of >400x. Lesional tissue was histopathologically reviewed and dissected from hippocampal sclerosis (n=15), ganglioglioma (n=16), dysembryoplastic neuroepithelial tumors (n=8), and focal cortical dysplasia type II (n=15). Peripheral blood (n=12) or surgical tissue samples histopathologically classified as lesion-free (n=42) were available for comparison.
Variants were classified as pathogenic or likely pathogenic according to American College of Medical Genetics and Genomics guidelines. Overall, we identified pathogenic and likely pathogenic variants in 25.9% of patients with a mean coverage of 383x. The highest rates of pathogenic/likely pathogenic variants were observed in patients with ganglioglioma (43.75%; all somatic) and dysembryoplastic neuroepithelial tumors (37.5%; all somatic), followed by 20% of cases with focal cortical dysplasia type II (13.33% somatic, 6.67% germline). Genes carrying pathogenic/likely pathogenic variants were disorder-specific, and BRAF V600E was the only recurrent pathogenic variant. This study represents a reference for diagnostic yield across the four most common lesion entities in patients with drug-resistant focal epilepsy. The observed large variability in variant burden by epileptic lesion type calls for whole exome sequencing of histopathologically well-characterized tissue in a diagnostic setting and in research to discover novel disease-associated genes.

; ; et al in American Journal of Human Genetics (2019)
Sequencing-based studies have identified novel risk genes associated with severe epilepsies and revealed an excess of rare deleterious variation in less-severe forms of epilepsy. To identify the shared and distinct ultra-rare genetic risk factors for different types of epilepsies, we performed a whole-exome sequencing (WES) analysis of 9,170 epilepsy-affected individuals and 8,436 controls of European ancestry. We focused on three phenotypic groups: severe developmental and epileptic encephalopathies (DEEs), genetic generalized epilepsy (GGE), and non-acquired focal epilepsy (NAFE).
We observed that, compared to controls, individuals with any type of epilepsy carried an excess of ultra-rare, deleterious variants in constrained genes and in genes previously associated with epilepsy; we saw the strongest enrichment in individuals with DEEs and the weakest in individuals with NAFE. Moreover, we found that inhibitory GABAA receptor genes were enriched for missense variants across all three classes of epilepsy, whereas no enrichment was seen in excitatory receptor genes. The larger gene groups for the GABAergic pathway or cation channels also showed a significant mutational burden in DEEs and GGE. Although no single gene surpassed exome-wide significance among individuals with GGE or NAFE, highly constrained genes and genes encoding ion channels were among the lead associations; such genes included CACNA1G, EEF1A2, and GABRG2 for GGE and LGI1, TRIM3, and GABRG2 for NAFE. Our study, the largest epilepsy WES study to date, confirms a convergence in the genetics of severe and less-severe epilepsies associated with ultra-rare coding variation, and it highlights a ubiquitous role for GABAergic inhibition in epilepsy etiology.

; ; et al E-print/Working paper (2019)
Inference of the structural and functional consequences of amino acid-altering missense variants is challenging and not yet scalable. Clinical and research applications of the colossal number of identified missense variants are thus limited. Here we describe the aggregation and analysis of large-scale genomic variation and structural biology data for 1,330 disease-associated genes.
Comparing the burden of 40 structural, physicochemical, and functional protein features of altered amino acids with 3-dimensional coordinates, we found 18 and 14 features that are associated with pathogenic and population missense variants, respectively. Separate analyses of variants from 24 protein functional classes revealed novel function-dependent vulnerable features. We then devised a quantitative spectrum identifying variants with a higher burden of pathogenic variant-associated features. Finally, we developed a web resource (MISCAST; http://miscast.broadinstitute.org/) for interactive analysis of variants on linear and tertiary protein structures. The annotations of the biological impact of missense variants available through the web tool will assist researchers in generating hypotheses about variant pathogenicity and disease trajectories.

; ; et al in Bioinformatics (2019)
The correct classification of missense variants as benign or pathogenic remains challenging. Pathogenic variants are expected to have higher deleteriousness prediction scores than benign variants in the same gene. However, most existing variant annotation tools do not reference the score range of benign population variants at the gene level. Here, we present a web application, Variant Score Ranker, which enables users to rapidly annotate variants and perform gene-specific variant score ranking at the population level. We also provide an intuitive example of how gene- and population-calibrated variant ranking scores can improve epilepsy variant prioritization.
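Referencing a variant's prediction score against the population variants of the same gene can be illustrated with a percentile rank. This is a minimal sketch under stated assumptions, not the Variant Score Ranker code: the scores are hypothetical, and counting only strictly lower population scores is one of several reasonable tie conventions.

```python
from bisect import bisect_left


def gene_specific_rank(score: float, population_scores: list[float]) -> float:
    """Fraction of population variant scores in the same gene that are
    strictly lower than `score` (0.0 = below all, 1.0 = above all)."""
    if not population_scores:
        raise ValueError("need at least one population score for this gene")
    ranked = sorted(population_scores)
    return bisect_left(ranked, score) / len(ranked)


# Hypothetical deleteriousness scores of population variants in two genes.
gene_x = [0.1, 0.2, 0.2, 0.5, 0.9]
gene_y = [0.7, 0.8, 0.85, 0.9, 0.95]

# The same raw score ranks very differently once calibrated per gene.
print(gene_specific_rank(0.6, gene_x))  # 0.8
print(gene_specific_rank(0.6, gene_y))  # 0.0
```

A raw score of 0.6 looks moderately deleterious in gene_x but is lower than every population variant in gene_y, which is the kind of gene-level calibration the abstract argues most tools lack.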

; ; et al in Biophysical Journal (2019, February 15), 116(3), 420-421
Elucidating the molecular consequences of amino-acid-altering missense variants at scale is challenging. In this work, we explored whether features derived from three-dimensional (3D) protein structures can characterize patient missense variants across different protein classes with similar molecular-level activities. The identified disease-associated features can advance our understanding of how a single amino acid substitution can lead to the etiology of monogenic disorders. For 1,330 disease-associated genes (>80%, 1,077/1,330, implicated in Mendelian disorders), we collected missense variants from the general population (gnomAD database, N=164,915) and patients (ClinVar and HGMD databases, N=32,923). We mapped the variant positions in silico onto >14k human protein 3D structures. We annotated the protein positions of variants with 40 structural, physicochemical, and functional features. We then grouped the genes into 24 protein classes based on their molecular functions and performed statistical association analyses with the features of population and patient variants. We identified 18 (out of 40) features that are associated with patient variants in general. Specifically, patient variants are less exposed to solvent (p<1.0e-100), enriched on β-sheets (p<2.37e-39), frequently mutate aromatic residues (p<1.0e-100), occur in ligand binding sites (p<1.0e-100), and are spatially close to phosphorylation sites (p<1.0e-100). We also observed differential protein-class-specific features. For three protein classes (signaling molecules, proteases, and hydrolases), patient variants significantly perturb disulfide bonds (p<1.0e-100).
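Solvent exposure, the first general feature listed above, is commonly expressed as relative solvent accessibility (RSA): a residue's solvent-accessible surface area (SASA) divided by the maximum for its amino acid type. The sketch assumes SASA values in Å² have already been computed separately (for example with DSSP) and that per-residue maxima are supplied by the user; the 0.25 buried/exposed cutoff and all numbers are illustrative assumptions.

```python
def relative_solvent_accessibility(sasa: float, max_asa: float) -> float:
    """RSA = observed SASA / maximum SASA for the residue type, capped at 1.0
    (observed values can slightly exceed theoretical maxima)."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return min(sasa / max_asa, 1.0)


def buried_fraction(variants: list[tuple[float, float]], cutoff: float = 0.25) -> float:
    """Share of variant positions with RSA below `cutoff` (assumed threshold).
    Each variant is a (sasa, max_asa) pair for the mutated residue."""
    if not variants:
        raise ValueError("no variants given")
    buried = sum(relative_solvent_accessibility(s, m) < cutoff for s, m in variants)
    return buried / len(variants)


# Hypothetical (SASA, max ASA) pairs in square Angstroms.
patient = [(20.0, 200.0), (10.0, 150.0), (90.0, 200.0)]
population = [(150.0, 200.0), (30.0, 200.0), (180.0, 220.0)]
print(buried_fraction(patient))     # about 0.667
print(buried_fraction(population))  # about 0.333
```

The resulting per-variant buried/exposed flag is exactly the kind of binary feature whose counts can then be compared between patient and population variants.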
Only in immunity proteins are patient variants enriched in flexible coils (p<1.65e-06). Kinases and cell junction proteins exhibit enrichment of patient variants around SUMOylation (p<1.0e-100) and methylation sites (p<9.29e-11), respectively. In summary, we studied shared and unique features associated with patient variants on protein structure across 24 protein classes, providing novel mechanistic insights. We generated an online resource that contains an amino-acid-wise feature annotation track for 1,330 genes, summarizes the patient-variant-associated features at the residue level, and can guide variant interpretation.

; ; et al E-print/Working paper (2019)
Malfunctions of voltage-gated sodium and calcium channels (SCN and CACNA1 genes) have been associated with severe neurologic, psychiatric, cardiac, and other diseases. Altered channel activity is frequently grouped into gain or loss of ion channel function (GOF or LOF, respectively), which corresponds not only to clinical disease manifestations but also to differences in drug response. Experimental studies of channel function are therefore important, but laborious, and usually focus on only a few variants at a time. Based on known gene-disease mechanisms, we here infer LOF (518 variants) and GOF (309 variants) status for likely pathogenic variants from the disease phenotypes of variant carriers. We show regional clustering of inferred GOF and LOF variants, respectively, across the alignment of the entire gene family, suggesting shared pathomechanisms in the SCN/CACNA1 genes. By training a machine learning model on sequence- and structure-based features, we predict LOF- or GOF-associated disease phenotypes (ROC = 0.85) of likely pathogenic missense variants.
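The ROC values quoted in these abstracts are areas under the ROC curve (AUC). For a labelled set, the AUC equals the probability that a randomly chosen positive variant receives a higher predicted score than a randomly chosen negative one, with ties counted as one half. The minimal sketch below computes it that way; treating GOF as the positive class and the scores themselves are assumptions for illustration.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation:
    P(score of a random positive > score of a random negative), ties = 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model outputs (predicted probability of GOF) for labelled variants.
gof_scores = [0.9, 0.8, 0.4]
lof_scores = [0.7, 0.3, 0.2]
print(roc_auc(gof_scores, lof_scores))  # 8 of 9 pairs ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation. The double loop is O(n·m); for large sets a rank-based computation gives the same value more efficiently.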
We then successfully validate the GOF versus LOF prediction on 87 functionally tested variants in SCN1/2/8A and CACNA1I (ROC = 0.73) and in exome-wide data from >100,000 cases and controls. Ultimately, functional prediction of missense variants in clinically relevant genes will facilitate precision medicine in clinical practice.

; ; et al in Epilepsia (2018)
Objective: The increasing availability of surgically resected brain tissue from patients with focal epilepsy and focal cortical dysplasia (FCD) or low-grade glio-neuronal tumors has fostered large-scale genetic examination. However, assessment of the pathogenicity of germline and somatic variants remains difficult. Here, we present a state-of-the-art evaluation of reported genes and variants associated with epileptic brain lesions. Methods: We critically re-evaluated the pathogenicity of all neuropathology-associated variants reported to date in the PubMed and ClinVar databases, including 101 neuropathology-associated missense variants encompassing 11 disease-related genes. We assessed gene variant tolerance and classified all identified missense variants according to guidelines from the American College of Medical Genetics and Genomics (ACMG). We further extended the bioinformatic variant prediction by introducing a novel gene-specific deleteriousness ranking for prediction scores. Results: Application of ACMG guidelines and in silico gene variant tolerance analysis classified only seven out of 11 genes as likely disease-associated according to the reported disease mechanism, while 61 (60.4%) of 101 variants of those genes were classified as of uncertain significance (VUS), 37 (36.6%) as likely pathogenic (LP), and 3 (3%) as pathogenic (P).
Significance: We conclude that the majority of neuropathology-associated variants reported to date do not have enough evidence to be classified as pathogenic. Interpretation of lesion-associated variants remains challenging, and application of current ACMG guidelines is recommended for interpretation and prediction.

; ; et al E-print/Working paper (2018)
Neurodevelopmental disorders (NDD) with epilepsy constitute a complex and heterogeneous phenotypic spectrum of largely unclear genetic architecture. We conducted exome-wide enrichment analyses for protein-altering de novo variants (DNV) in 7,088 parent-offspring trios with NDD, of which 2,151 were comorbid with epilepsy. In this cohort, the genetic spectra of epileptic encephalopathy (EE) and nonspecific NDD with epilepsy were markedly similar. We identified 33 genes significantly enriched for DNV in NDD with epilepsy, of which 27.3% were associated with therapeutic consequences. These 33 DNV-enriched genes were more often associated with synaptic transmission but less often with chromatin modification when compared to NDD without epilepsy. On average, only 53% of the DNV-enriched genes were represented on available diagnostic sequencing panels, so our findings should drive significant improvements of genetic testing approaches.
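Exome-wide de novo enrichment is often assessed by comparing the observed number of de novo variants in a gene (or gene family) with the number expected from a per-gene mutation-rate model, using a Poisson upper-tail test. The abstracts do not name the exact test, so this is a generic sketch: the expected count of 0.5 and the observed count of 5 are hypothetical.

```python
import math


def poisson_upper_tail(observed: int, expected: float, rel_tol: float = 1e-16) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summed directly over the
    upper tail so that very small p-values keep their precision."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    k = observed
    # pmf(k) computed in log space to avoid overflow of expected**k / k!
    term = math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
    total = 0.0
    while True:
        total += term
        k += 1
        term *= expected / k
        # Past the mean the terms shrink monotonically; stop once negligible.
        if k > expected and term <= rel_tol * total:
            break
    return min(total, 1.0)


# Hypothetical gene: 0.5 de novo variants expected across all trios, 5 observed.
p = poisson_upper_tail(observed=5, expected=0.5)
print(f"{p:.3g}")  # 0.000172
```

Summing the upper tail directly, rather than computing 1 minus the lower cumulative sum, avoids cancellation error for the very small p-values that exome-wide significance thresholds require.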