The Solve-RD project brings together clinicians, scientists, and patient representatives from 51 institutes spanning 15 countries to collaborate on genetically diagnosing ("solving") rare diseases (RDs). The project aims to significantly increase the diagnostic success rate by co-analyzing data from thousands of RD cases, including phenotypes, pedigrees, exome/genome sequencing, and multiomics data. Here we report on the data infrastructure devised and created to support this co-analysis. This infrastructure enables users to store, find, connect, and analyze data and metadata in a collaborative manner. Pseudonymized phenotypic and raw experimental data are submitted to the RD-Connect Genome-Phenome Analysis Platform and processed through standardized pipelines. The resulting files and newly produced omics data are sent to the European Genome-Phenome Archive, which adds unique file identifiers and provides long-term storage and controlled access services. MOLGENIS "RD3" and Café Variome "Discovery Nexus" connect data and metadata and offer discovery services, and secure cloud-based "Sandboxes" support multiparty data analysis. This infrastructure design, which has been deployed successfully and proven useful, provides a blueprint for other projects that need to analyze large amounts of heterogeneous data.
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Johansson, Lennart F ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Laurie, Steve ; Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain ; Universitat de Barcelona (UB), Gran Via de les Corts Catalanes, 585, L'Eixample, 08007, Barcelona, Spain
Spalding, Dylan ; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Gibson, Spencer ; Department of Genetics, Genomics and Cancer Sciences, University of Leicester, University Road, Leicester, Leicester, LE1 7RH, UK
Ruvolo, David ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Thomas, Coline ; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Piscia, Davide ; Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain ; Universitat de Barcelona (UB), Gran Via de les Corts Catalanes, 585, L'Eixample, 08007, Barcelona, Spain
de Andrade, Fernanda ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Been, Gerieke ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Bijlsma, Marieke ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Brunner, Han ; Department of Human Genetics, Radboud University Medical Center, Geert Grooteplein Zuid 10, Nijmegen, 6525 GA, The Netherlands ; Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, P.O.Box 9103, Nijmegen, 6500 HD, The Netherlands ; Department of Clinical Genetics, Maastricht University Medical Centre, P. Debyelaan 25, Maastricht, 6229 HX, The Netherlands
Cimerman, Sandi ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Dizjikan, Farid Yavari ; Department of Genetics, Genomics and Cancer Sciences, University of Leicester, University Road, Leicester, Leicester, LE1 7RH, UK
Ellwanger, Kornelia ; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Calwerstraße 7, Tübingen 72076, Germany
Fernandez, Marcos ; Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain ; Universitat de Barcelona (UB), Gran Via de les Corts Catalanes, 585, L'Eixample, 08007, Barcelona, Spain
Freeberg, Mallory ; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
van de Geijn, Gert-Jan ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Kanninga, Roan ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Maddi, Vatsalya ; Department of Genetics, Genomics and Cancer Sciences, University of Leicester, University Road, Leicester, Leicester, LE1 7RH, UK
Mehtarizadeh, Mehdi ; Department of Genetics, Genomics and Cancer Sciences, University of Leicester, University Road, Leicester, Leicester, LE1 7RH, UK
Neerincx, Pieter ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Ossowski, Stephan ; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Calwerstraße 7, Tübingen 72076, Germany ; Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Geschwister-Scholl-Platz, Tübingen 72074, Germany
Rath, Ana ; INSERM, US-14 Orphanet, 96 rue Didot, Paris 75014, France
Roelofs-Prins, Dieuwke ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Stok-Benjamins, Marloes ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
van der Velde, K Joeri ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Veal, Colin ; Department of Genetics, Genomics and Cancer Sciences, University of Leicester, University Road, Leicester, Leicester, LE1 7RH, UK
van der Vries, Gerben ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Wadsley, Marc ; Department of Genetics, Genomics and Cancer Sciences, University of Leicester, University Road, Leicester, Leicester, LE1 7RH, UK
Warren, Gregory ; Department of Genetics, Genomics and Cancer Sciences, University of Leicester, University Road, Leicester, Leicester, LE1 7RH, UK
Zurek, Birte ; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Calwerstraße 7, Tübingen 72076, Germany
Keane, Thomas ; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Graessner, Holm ; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Calwerstraße 7, Tübingen 72076, Germany ; Centre for Rare Diseases, University of Tübingen, Geschäftsstelle Eisenbahnstraße 63, Tübingen 72072, Germany
Beltran, Sergi ; Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain ; Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona (UB), Diagonal, 643, 08028, Barcelona, Spain
Swertz, Morris A ; Department of Genetics, University of Groningen, University Medical Center Groningen, HPC CB50, P.O. Box 30001, Groningen, 9700 RB, The Netherlands
Brookes, Anthony J ; Department of Genetics, Genomics and Cancer Sciences, University of Leicester, University Road, Leicester, Leicester, LE1 7RH, UK
MAY, Patrick ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Horizon 2020 Framework Programme ; Instituto de Salud Carlos III ; CINECA
Funding text :
We acknowledge all Solve-RD partners (see Solve-RD consortium) and all hospitals and patients that shared data. We acknowledge Olaf Riess as Solve-RD project coordinator. We acknowledge RedIris (https://www.rediris.es/rediris/) for enabling data transfer from the data providers to GPAP. The Solve-RD project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement 779257. The RD-Connect Genome-Phenome Analysis Platform received funding from EU projects RD-Connect, Solve-RD, and EJP-RD (grants FP7 305444, H2020 779257, H2020 825575), Instituto de Salud Carlos III (grants PT13/0001/0044, PT17/0009/0019; Instituto Nacional de Bioinformática, INB), and ELIXIR Implementation Studies. The UMCG VRE and RD3 received funding from the EU projects Solve-RD, EJP-RD, and CINECA Project (H2020 779257, H2020 825575, H2020 825775, respectively) and NWO VIDI grant number 917.164.455.