data integration; data workflow; metagenomics; metaproteomics; metatranscriptomics
Abstract :
[en] The availability of metaproteomics, metagenomics and metatranscriptomics data in public resources such as MGnify (for metagenomics/metatranscriptomics) and the PRIDE database (for metaproteomics) continues to increase. When these omics techniques are applied to the same samples, their integration offers new opportunities to understand the structure (metagenome) and functional expression (metatranscriptome and metaproteome) of the microbiome. Here, we describe a pilot study aimed at integrating public multi-meta-omics datasets from studies based on human gut and marine hatchery samples. Reference search databases (search DBs) were built from assembled metagenomic (and metatranscriptomic, where available) sequence data followed by de novo gene calling, using data both from the same sampling event and from independent samples. The resulting protein sets were evaluated for their utility in metaproteomics analysis. In agreement with previous studies, the highest number of peptide identifications was generally obtained when using search DBs created from the same samples. Data integration of the multi-omics results was performed in MGnify. For that purpose, the MGnify website was extended to enable the visualisation of the resulting peptide/protein information from three reanalysed metaproteomics datasets. A workflow (https://github.com/PRIDE-reanalysis/MetaPUF) has been developed that allows researchers to perform equivalent data integration using paired multi-omics datasets. This is the first time that a data integration approach for multi-omics datasets has been implemented from public data available in the world-leading MGnify and PRIDE resources.
Research center :
Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group); Luxembourg Centre for Systems Biomedicine (LCSB): Eco-Systems Biology (Wilmes Group)
Disciplines :
Microbiology; Environmental sciences & ecology
Author, co-author :
Wang, Shengbo; European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
Kaur, Satwant; European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
Kunath, Benoit J; Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg ; Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
MAY, Patrick ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Richardson, Lorna; European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
Rogers, Alexander B; European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
WILMES, Paul ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Systems Ecology
Finn, Robert D; European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
Vizcaíno, Juan Antonio ; European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
External co-authors :
yes
Language :
English
Title :
An Approach to Integrate Metagenomics, Metatranscriptomics and Metaproteomics Data in Public Data Resources.
Publication date :
28 April 2025
Journal title :
Proteomics
ISSN :
1615-9853
eISSN :
1615-9861
Publisher :
Wiley, Weinheim, United States - Delaware
Pages :
e202500002
Peer reviewed :
Peer Reviewed verified by ORBi
Focus Area :
Systems Biomedicine
Development Goals :
3. Good health and well-being
FnR Project :
FNR13684739 - metaPUF - The Dark Metaproteome: Identifying Proteins Of Unknown Function In The Human Gut Microbiome, 2019 (01/04/2020-31/03/2022) - Paul Wilmes
Name of the research project :
R-AGR-3717 - C19/BM/13684739/MetaPUF - WILMES Paul
Funders :
FNR - Fonds National de la Recherche
Funding number :
FNR13684739
Funding text :
The authors would like to acknowledge funding from the National Research Fund Luxembourg (FNR) [grant number C19/BM/13684739], Wellcome [grant number 223745/Z/21/Z] and EMBL core funding. We would also like to thank the original researchers who made the datasets available in the public domain.