Chemical Stripes; PubChem; Cheminformatics; Data Visualization; Exposomics; Early Warning System; Patent Analysis
Abstract :
Understanding historical chemical usage is crucial for assessing current and past impacts on human health and the environment and for informing future regulatory decisions. However, past monitoring data are often limited in scope and number of chemicals, while suitable sample types are not always available for remeasurement. Data-driven cheminformatics methods for patent and literature data offer several opportunities to fill this gap. The chemical stripes were developed as an interactive, open source tool for visualizing patent and literature trends over time, inspired by the global warming and biodiversity stripes. This paper details the underlying code and data sets behind the visualization, with a major focus on the patent data sourced from PubChem, including patent origins, uses, and countries. Overall trends and specific examples are investigated in greater detail to explore both the promise and caveats that such data offer in assessing the trends and patterns of chemical patents over time and across different geographic regions. Despite a number of potential artifacts associated with patent data extraction, the integration of cheminformatics, statistical analysis, and data visualization tools can help generate valuable insights that can both illuminate the chemical past and potentially serve toward an early warning system for the future.
Research center :
Luxembourg Centre for Systems Biomedicine (LCSB): Environmental Cheminformatics (Schymanski Group) ; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States ; Faculty of Science, Technology and Medicine (FSTM), University of Luxembourg, 6 Avenue de la Fonte, L-4364 Esch-sur-Alzette, Luxembourg
de Jesus Matias, Flavio ; Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg ; Faculty of Science, Technology and Medicine (FSTM), University of Luxembourg, 6 Avenue de la Fonte, L-4364 Esch-sur-Alzette, Luxembourg
Thiessen, Paul A. ; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
FNR12341006 - Environmental Cheminformatics To Identify Unknown Chemicals And Their Effects, 2018 (01/10/2018-30/09/2023) - Emma Schymanski
Name of the research project :
R-AGR-3703 - IAS - LuxTIME - FICKERS Andreas
Funders :
Fonds National de la Recherche Luxembourg ; Luxembourg Institute for Advanced Studies (IAS) ; National Center for Biotechnology Information of the National Library of Medicine (NLM)