Abstract :
[en] Elucidating the molecular consequences of amino-acid-altering missense variants at scale is challenging. In this work, we explored whether features derived from three-dimensional (3D) protein structures can characterize patient missense variants across different protein classes with similar molecular-level activities. The identified disease-associated features can advance our understanding of how a single amino acid substitution can give rise to a monogenic disorder. For 1,330 disease-associated genes (1,077 of 1,330, >80%, implicated in Mendelian disorders), we collected missense variants from the general population (gnomAD database, N=164,915) and from patients (ClinVar and HGMD databases, N=32,923). We mapped the variant positions in silico onto >14,000 human protein 3D structures. We annotated the protein positions of variants with 40 structural, physicochemical, and functional features. We then grouped the genes into 24 protein classes based on their molecular functions and performed statistical association analyses with the features of population and patient variants. We identified 18 (out of 40) features that are associated with patient variants in general. Specifically, patient variants are less exposed to solvent (p<1.0e-100), are enriched in β-sheets (p<2.37e-39), frequently affect aromatic residues (p<1.0e-100), occur in ligand-binding sites (p<1.0e-100), and lie spatially close to phosphorylation sites (p<1.0e-100). We also observed differential, protein-class-specific features. For three protein classes (signaling molecules, proteases, and hydrolases), patient variants significantly perturb disulfide bonds (p<1.0e-100). Only in immunity proteins are patient variants enriched in flexible coils (p<1.65e-06). Kinases and cell junction proteins exhibit enrichment of patient variants around SUMOylation sites (p<1.0e-100) and methylation sites (p<9.29e-11), respectively. In summary, we studied shared and unique features associated with patient variants on protein structure across 24 protein classes, providing novel mechanistic insights. We generated an online resource that contains per-residue feature annotation tracks for 1,330 genes, summarizes the patient-variant-associated features at the residue level, and can guide variant interpretation.
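
The abstract does not name the statistical test behind its association analyses. As a rough illustration of what "a feature is associated with patient variants" can mean, the sketch below computes an odds ratio with a Woolf (log-based) 95% confidence interval from a 2x2 table of variant counts. The function name `odds_ratio_ci` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(patient_with, patient_without, pop_with, pop_without, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    Rows: patient vs. population variants.
    Columns: variant position has the feature vs. does not.
    All four counts must be positive (no continuity correction is applied).
    """
    counts = (patient_with, patient_without, pop_with, pop_without)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")

    # OR > 1 means the feature is more common among patient variants.
    odds_ratio = (patient_with * pop_without) / (patient_without * pop_with)

    # Standard error of ln(OR) under Woolf's approximation.
    se = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Made-up toy counts, for illustration only.
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1 would suggest enrichment (OR > 1) or depletion (OR < 1) of the feature among patient variants; an analysis over 40 features and 24 protein classes would also need a multiple-testing correction, which this sketch omits.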