collision cross section; exposomics; identification; ion mobility; nontarget screening; PubChem; PubChemLite; Chemical space; Cross-section values; Environmental data; Environmental Chemistry; Ecology; Water Science and Technology; Waste Management and Disposal; Pollution; Health, Toxicology and Mutagenesis
Abstract :
[en] Finding relevant chemicals in the vast (known) chemical space is a major challenge for environmental and exposomics studies leveraging nontarget high resolution mass spectrometry (NT-HRMS) methods. Chemical databases now contain hundreds of millions of chemicals, yet many are not relevant. This article details an extensive collaborative, open science effort to provide a dynamic collection of chemicals for environmental, metabolomics, and exposomics research, along with supporting information about their relevance to assist researchers in the interpretation of candidate hits. The PubChemLite for Exposomics collection is compiled from ten annotation categories within PubChem, enhanced with patent, literature and annotation counts, predicted partition coefficient (logP) values, as well as predicted collision cross section (CCS) values using CCSbase. Monthly versions are archived on Zenodo under a CC-BY license, supporting reproducible research, and a new interface has been developed, including historical trends of patent and literature data, for researchers to browse the collection. This article details how PubChemLite can support researchers in environmental and exposomics studies, describes efforts to increase the availability of experimental CCS values, and explores known limitations and potential for future developments. The data and code behind these efforts are openly available. PubChemLite can be browsed at https://pubchemlite.lcsb.uni.lu.
Disciplines :
Chemistry
Author, co-author :
ELAPAVALORE, Anjana ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Environmental Cheminformatics
Ross, Dylan H. ; Department of Medicinal Chemistry, University of Washington, Seattle, United States ; Biological Sciences Division, Pacific Northwest National Laboratory, Richland, United States
GROUES, Valentin ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
AURICH, Dagny ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine > Environmental Cheminformatics > Team Emma SCHYMANSKI
Krinsky, Allison M. ; Department of Medicinal Chemistry, University of Washington, Seattle, United States
Kim, Sunghwan ; National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, United States
Thiessen, Paul A. ; National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, United States
Zhang, Jian ; National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, United States
Dodds, James N. ; Department of Chemistry, University of North Carolina, Chapel Hill, United States
Baker, Erin S. ; Department of Chemistry, University of North Carolina, Chapel Hill, United States
Bolton, Evan E. ; National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, United States
Xu, Libin ; Department of Medicinal Chemistry, University of Washington, Seattle, United States
SCHYMANSKI, Emma ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Environmental Cheminformatics
National Institute of Environmental Health Sciences; HORIZON EUROPE European Innovation Council; Fonds National de la Recherche Luxembourg; U.S. National Library of Medicine; National Institute of General Medical Sciences; Université du Luxembourg
Funding text :
A.E., D.A., and E.L.S. acknowledge funding support from the Luxembourg National Research Fund (FNR) for project A18/BM/12341006 (A.E., D.A., E.L.S.), the University of Luxembourg Institute for Advanced Studies (IAS) for the Audacity project “LuxTIME” (D.A., E.L.S.) and the European Union Research and Innovation program Horizon Europe for PARC, Grant No. 101057014 (A.E.). The work of S.K., P.A.T., J.Z., and E.E.B. was supported by the National Center for Biotechnology Information of the National Library of Medicine (NLM), National Institutes of Health. J.N.D. and E.S.B. would like to acknowledge funding support from the National Institute of Environmental Health Sciences (P42 ES027704) and National Institute of General Medical Sciences (R01 GM141277 and RM1 GM145416). L.X. acknowledges financial support from the National Institute of Environmental Health Sciences, National Institutes of Health (R01 ES031927).