Biomedical Research; Ecosystem; Information Dissemination; Statistics and Probability; Information Systems; Education; Computer Science Applications; Statistics, Probability and Uncertainty; Library and Information Sciences
Abstract :
[en] The discoverability of datasets resulting from the diverse range of translational and biomedical projects remains sporadic. It is especially difficult for datasets emerging from pre-competitive projects, often due to the legal constraints of data-sharing agreements, and the different priorities of the private and public sectors. The Translational Data Catalog is a single discovery point for the projects and datasets produced by a number of major research programmes funded by the European Commission. Funded by and rooted in a number of these European private-public partnership projects, the Data Catalog is built on FAIR-enabling community standards, and its mission is to ensure that datasets are findable and accessible by machines. Here we present its creation, content, value and adoption, as well as the next steps for sustainability within the ELIXIR ecosystem.
Disciplines :
Human health sciences: Multidisciplinary, general & others
Author, co-author :
WELTER, Danielle ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine > Bioinformatics Core > Translational Informatics ; Luxembourg National Data Service (PNED G.I.E), 6 avenue des Hauts-Fourneaux, L-4362, Esch-sur-Alzette, Luxembourg
Rocca-Serra, Philippe ; Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, OX13QG, Oxford, UK ; AstraZeneca, Data Office, Data Science & AI unit R&D, 136 Hills Rd, Cambridge, UK
GROUES, Valentin ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
SALLAM, Nirmeen ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Ancien, François ; Luxembourg Centre for Systems Biomedicine, ELIXIR Luxembourg, University of Luxembourg, L-4367, Belval, Luxembourg
SHABANI, Abetare ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Asariardakani, Saeideh ; Luxembourg Centre for Systems Biomedicine, ELIXIR Luxembourg, University of Luxembourg, L-4367, Belval, Luxembourg
ALPER, Pinar ; University of Luxembourg ; Luxembourg National Data Service (PNED G.I.E), 6 avenue des Hauts-Fourneaux, L-4362, Esch-sur-Alzette, Luxembourg
GHOSH, Soumyabrata ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Burdett, Tony ; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK
Sansone, Susanna-Assunta ; Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, OX13QG, Oxford, UK
GU, Wei ; University of Luxembourg ; Luxembourg National Data Service (PNED G.I.E), 6 avenue des Hauts-Fourneaux, L-4362, Esch-sur-Alzette, Luxembourg. wei.gu@uni.lu
SATAGOPAM, Venkata ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core ; Frankfurt Institute for Advanced Studies (FIAS), Ruth-Moufang-Straße 1, D-60438, Frankfurt am Main, Germany. venkata.satagopam@uni.lu
The authors would like to thank the members of the FAIRplus consortium for their input into model discussions and review of the Data Catalog. We would in particular like to thank Fuqi Xu (0000-0002-5923-3859), Robert T. Giessmann (0000-0002-0254-1500) and Yojana Gadiya (0000-0002-7683-0452) for their contributions of study and dataset metadata, and Kavita Rege, Jacek Lebioda and Mohammed Shoaib (0000-0002-4854-4635) for their contributions to the code base. This work and the authors were funded by FAIRplus (IMI 802750).