Metabolomics/methods; Gene Library; Cluster Analysis; Tandem Mass Spectrometry/methods; Access to Information; Metabolomics; Tandem Mass Spectrometry; Chemistry (all); Biochemistry, Genetics and Molecular Biology (all); Physics and Astronomy (all)
Abstract :
[en] Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of MS/MS spectra originating from published untargeted metabolomics experiments. Entries in this library, or "suspects," were derived from unannotated spectra that could be linked in a molecular network to an annotated spectrum. Annotations were propagated to unknowns based on structural relationships to reference molecules using MS/MS-based spectrum alignment. We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer's brain phenotype. The nearest neighbor suspect spectral library is openly available for download or for data analysis through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Bittremieux, Wout ; Department of Computer Science, University of Antwerp, 2020, Antwerpen, Belgium. wout.bittremieux@uantwerpen.be
Avalon, Nicole E ; Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, 92093, USA
Thomas, Sydney P; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
Kakhkhorov, Sarvar A ; Laboratory of Physical and Chemical Methods of Research, Center for Advanced Technologies, Tashkent, 100174, Uzbekistan ; Department of Food Science, Faculty of Science, University of Copenhagen, Rolighedsvej 26, 1958, Frederiksberg C, Denmark
Aksenov, Alexander A; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA ; Department of Chemistry, University of Connecticut, Storrs, CT, 06269, USA ; Arome Science inc., Farmington, CT, 06032, USA
Gomes, Paulo Wender P ; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
Aceves, Christine M; Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, 92037, USA
Caraballo-Rodríguez, Andrés Mauricio; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
Gauglitz, Julia M; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
Gerwick, William H ; Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, 92093, USA ; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
Huan, Tao ; Department of Chemistry, University of British Columbia, Vancouver, BC, V6T 1Z1, Canada
Jarmusch, Alan K ; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA ; Immunity, Inflammation, and Disease Laboratory, Division of Intramural Research, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
Kaddurah-Daouk, Rima F; Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC, 27701, USA ; Department of Medicine, Duke University, Durham, NC, 27710, USA ; Duke Institute of Brain Sciences, Duke University, Durham, NC, 27710, USA
Kang, Kyo Bin ; College of Pharmacy and Research Institute of Pharmaceutical Sciences, Sookmyung Women's University, Seoul, 04310, Korea
Kim, Hyun Woo ; College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University, Goyang, 10326, Korea
KONDIC, Todor ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine > Environmental Cheminformatics > Team Emma SCHYMANSKI
Mannochio-Russo, Helena ; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA ; Department of Biochemistry and Organic Chemistry, Institute of Chemistry, São Paulo State University, Araraquara, 14800-901, Brazil
Meehan, Michael J; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
Melnik, Alexey V; Department of Chemistry, University of Connecticut, Storrs, CT, 06269, USA ; Arome Science inc., Farmington, CT, 06032, USA
Nothias, Louis-Felix; Université Côte d'Azur, CNRS, ICN, Nice, France ; Interdisciplinary Institute for Artificial Intelligence (3iA) Côte d'Azur, Nice, France
O'Donovan, Claire; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Panitchpakdi, Morgan; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
Petras, Daniel ; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA ; Interfaculty Institute of Microbiology and Infection Medicine, University of Tuebingen, 72076, Tuebingen, Germany ; Department of Biochemistry, University of California Riverside, Riverside, CA, 92507, USA
Schmid, Robin ; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
van der Hooft, Justin J J ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA ; Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
Weldon, Kelly C; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
Yang, Heejung ; Laboratory of Natural Products Chemistry, College of Pharmacy, Kangwon National University, Chuncheon, 24341, Korea
Xing, Shipei; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA ; Department of Chemistry, University of British Columbia, Vancouver, BC, V6T 1Z1, Canada
Zemlin, Jasmine; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
Wang, Mingxun; Department of Computer Science and Engineering, University of California Riverside, Riverside, CA, 92507, USA
Dorrestein, Pieter C ; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA. pdorrestein@health.ucsd.edu ; Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA. pdorrestein@health.ucsd.edu
This research was supported in part by BBSRC-NSF award 2152526. This research was supported in part by National Institutes of Health awards R01 GM107550, U19 AG063744, U01AG061359, R03 CA211211, P41 GM103484, T32 HD123456. This research was supported in part by the National Institute on Aging’s Accelerating Medicines Partnership for AD (AMP-AD) and was supported by NIH grants 1R01AG069901-01A1, U01AG061357, as well as by the Alzheimer Gut Microbiome Project grant 1U19AG063744. This research was supported in part by federal award DE-SC0021340 subaward 1070261-436503. This research was supported in part by the Gordon and Betty Moore Foundation through grant GBMF7622. This research was supported in part by the Intramural Research Program of National Institute of Environmental Health Sciences of the National Institutes of Health (ZIC ES103363). W.B. acknowledges support by the University of Antwerp Research Fund. This research was supported in part by the National Center for Complementary and Integrative Health of the NIH under award number F32AT011475 to N.E.A. E.L.S. and T.K. acknowledge funding support from the Luxembourg National Research Fund (FNR) for project A18/BM/12341006. M.W. was partially supported by the US Department of Energy Joint Genome Institute operated under Contract No. DE-AC02-05CH11231. D.P. was supported by the Deutsche Forschungsgemeinschaft (DFG) through the CMFI Cluster of Excellence (EXC 2124). S.A.K. was supported by the Fund for Financing Science and Supporting Innovation under the Ministry of Innovative Development of the Republic of Uzbekistan. K.B.K. was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (NRF-2020R1C1C1004046). H.W.K. was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT) (2018R1A5A2023127).
H.M.-R. acknowledges the Brazilian National Council for Scientific and Technological Development (CNPq, #142014/2018-4) and the Brazilian Fulbright Commission for the scholarships provided. L.-F.N. has been supported by the French government, through the UCA Investments in the Future project managed by the National Research Agency (ANR) with the reference number ANR-15-IDEX-01. J.J.J.vd.H. was supported by an ASDI eScience grant from the Netherlands eScience Center (ASDI.2017.030). C.O.D. was supported by EMBL core funds. The Alzheimer’s disease metabolomics data was funded wholly or in part by the Alzheimer’s Gut Microbiome Project (AGMP) NIH grant U19AG063744 awarded to R.F.K.-D. at Duke University in partnership with a large number of academic institutions. More information about the project and the institutions involved can be found at https://alzheimergut.org/meet-the-team/ . J.E.D.I.