Schymanski, Emma. Presentation (2023, March 22).
ZeroPM Webinar: Are there really 6 million PFAS in PubChem? The increasing concerns about poly- and perfluoroalkyl substances (PFAS) and calls for action upon them as a class have spurred intense debates on how to define and enumerate the “PFAS Chemical Space”. There are now >50 PFAS lists openly available, including the OECD PFAS list of ~4700 PFAS (ENV/JM/MONO(2018)7) and the US EPA PFASMASTER list of >12000 PFAS. However, searching the large open chemical collection PubChem (114 million chemicals, Feb. 2023) reveals that >6 million entries match the latest OECD PFAS definition where PFAS “contains at least one alkyl CF$_2$ group” (ENV/CBC/MONO(2021)25). This webinar will introduce listeners to the new classification browser in PubChem designed to help navigate these incredible numbers, the “PFAS and Fluorinated Compounds in PubChem Tree” (“PubChem PFAS Tree” for short). The current version contains six main sections: OECD PFAS definition (>6 million PFAS), organofluorine compounds (>19 million compounds), other diverse fluorinated compounds, OECD PFAS by chemistry (>7 million PFAS including salts and mixtures), several PFAS collections (from CompTox, NORMAN-SLE, NIST, OntoChem and PubChem) and finally regulatory collections. We will walk listeners through the PubChem PFAS Tree and the many features it offers to help users explore the PFAS space in PubChem and look forward to lively discussions with the audience afterwards.

Schymanski, Emma. Presentation (2023, March 17).
Invited talk for the Environmental Chemistry and Biogeochemistry Seminar at Umeå University, 17 March 2023, Virtual Event. Many thanks to Andriy Rebryk for the invitation!

Schymanski, Emma. In Analytical Scientist (2023).
Why Open and FAIR data sharing in analytical research is important for public data availability, raising awareness of your data, and the very future of analytical science – according to Emma Schymanski.

; ; et al. In TrAC: Trends in Analytical Chemistry (2023), 159.
Non-target screening (NTS) methods are rapidly gaining in popularity, empowering researchers to search for an ever-increasing number of chemicals. Given this possibility, communicating the confidence of identification in an automated, concise and unambiguous manner is becoming increasingly important. In this study, we compiled several pieces of evidence necessary for communicating NTS identification confidence and developed a machine learning approach for classification of the identifications as reliable and unreliable. The machine learning approach was trained using data generated by four laboratories equipped with different instrumentation. The model discarded substances with insufficient identification evidence efficiently, while revealing the relevance of different parameters for identification. Based on these results, a harmonized IP-based system is proposed. This new NTS-oriented system is compatible with the currently widely used five level system.
It increases the precision in reporting and the reproducibility of current approaches via the inclusion of evidence scores, while being suitable for automation.

Schymanski, Emma. In Nature Water (2023), 1(1), 4–6.
Since water is a common good, the outcome of water-related research should be accessible to everyone. Since Open Science is more than just open access research articles, journals must work with the research community to enable fully open and FAIR science.

Lai, Adelene. In Journal of Cheminformatics (2022), 14(85).
Homologous series are groups of related compounds that share the same core structure attached to a motif that repeats to different degrees. Compounds forming homologous series are of interest in multiple domains, including natural products, environmental chemistry, and drug design. However, many homologous compounds remain unannotated as such in compound datasets, which poses obstacles to understanding chemical diversity and their analytical identification via database matching. To overcome these challenges, an algorithm to detect homologous series within compound datasets was developed and implemented using the RDKit. The algorithm takes a list of molecules as SMILES strings and a monomer (i.e., repeating unit) encoded as SMARTS as its main inputs.
In an iterative process, substructure matching of repeating units, molecule fragmentation, and core detection lead to homologous series classification through grouping of identical cores. Three open compound datasets from environmental chemistry (NORMAN Suspect List Exchange, NORMAN-SLE), exposomics (PubChemLite for Exposomics), and natural products (the COlleCtion of Open NatUral producTs, COCONUT) were subjected to homologous series classification using the algorithm. Over 2000, 12,000, and 5000 series with CH2 repeating units were classified in the NORMAN-SLE, PubChemLite, and COCONUT, respectively. Validation of classified series was performed using published homologous series and structure categories, including a comparison with a similar existing method for categorising PFAS compounds. The OngLai algorithm and its implementation for classifying homologues are openly available at: https://github.com/adelenelai/onglai-classify-homologues.

Talavera Andujar, Begona. In Analytical and Bioanalytical Chemistry (2022).
Parkinson’s disease (PD) is the second most prevalent neurodegenerative disease, with an increasing incidence in recent years due to the ageing population. Genetic mutations alone only explain <10% of PD cases, while environmental factors, including small molecules, may play a significant role in PD. In the present work, 22 plasma (11 PD, 11 control) and 19 feces samples (10 PD, 9 control) were analyzed by non-target high resolution mass spectrometry (NT-HRMS) coupled to two liquid chromatography (LC) methods (reversed phase (RP) and hydrophilic interaction liquid chromatography (HILIC)).
A cheminformatics workflow was optimized using open software (MS-DIAL and patRoon) and open databases (all public MSP-formatted spectral libraries for MS-DIAL, PubChemLite for Exposomics and the LITMINEDNEURO list for patRoon). Furthermore, five disease-specific databases and three suspect lists (on PD and related disorders) were developed, using PubChem functionality to identify relevant unknown chemicals. The results showed that non-target screening with the larger databases generally provided better results compared with smaller suspect lists. However, two suspect screening approaches with patRoon were also good options to study specific chemicals in PD. The combination of chromatographic methods (RP and HILIC) as well as two ionization modes (positive and negative) enhanced the coverage of chemicals in the biological samples. While most metabolomics studies in PD have focused on blood and cerebrospinal fluid, we found a higher number of relevant features in feces, such as alanine betaine or nicotinamide, which can be directly metabolized by gut microbiota. This highlights the potential role of gut dysbiosis in PD development.

; ; et al. In Environmental Science & Technology Letters (2022), 0(0).

Lai, Adelene. In Environmental Science and Technology (2022).
Substances of unknown or variable composition, complex reaction products, or biological materials (UVCBs) are over 70 000 “complex” chemical mixtures produced and used at significant levels worldwide. Due to their unknown or variable composition, applying chemical assessments originally developed for individual compounds to UVCBs is challenging, which impedes sound management of these substances.
Across the analytical sciences, toxicology, cheminformatics, and regulatory practice, new approaches addressing specific aspects of UVCB assessment are being developed, albeit in a fragmented manner. This review attempts to convey the “big picture” of the state of the art in dealing with UVCBs by holistically examining UVCB characterization and chemical identity representation, as well as hazard, exposure, and risk assessment. Overall, information gaps on chemical identities underpin the fundamental challenges concerning UVCBs, and better reporting and substance characterization efforts are needed to support subsequent chemical assessments. To this end, an information level scheme for improved UVCB data collection and management within databases is proposed. The development of UVCB testing shows early progress, in line with three main methods: whole substance, known constituents, and fraction profiling. For toxicity assessment, one option is a whole-mixture testing approach. If the identities of (many) constituents are known, grouping, read across, and mixture toxicity modeling represent complementary approaches to overcome data gaps in toxicity assessment. This review highlights continued needs for concerted efforts from all stakeholders to ensure proper assessment and sound management of UVCBs.

Frigerio, Gianfranco. In Molecules (2022), 27(8), 2580.
Pooled quality controls (QCs) are usually implemented within untargeted methods to improve the quality of datasets by removing features either not detected or not reproducible.
However, this approach can be limiting in exposomics studies conducted on groups of exposed and nonexposed subjects, as compounds present at low levels only in exposed subjects can be diluted and thus not detected in the pooled QC. The aim of this work is to develop and apply an untargeted workflow for human biomonitoring in urine samples, implementing a novel separated approach for preparing pooled quality controls. An LC-MS/MS workflow was developed and applied to a case study of smoking and non-smoking subjects. Three different pooled quality controls were prepared: mixing an aliquot from every sample (QC-T), only from non-smokers (QC-NS), and only from smokers (QC-S). The feature tables were filtered using QC-T (T-feature list), QC-S, and QC-NS, separately. The last two feature lists were merged (SNS-feature list). A higher number of features was obtained with the SNS-feature list than the T-feature list, resulting in identification of a higher number of biologically significant compounds. The separated pooled QC strategy implemented can improve the nontargeted human biomonitoring for groups of exposed and nonexposed subjects.

Schymanski, Emma. Presentation (2022, January 10).
The multitude of chemicals to which we are exposed is ever increasing, with over 110 million chemicals in the largest open chemical databases, over 350,000 in global use inventories, and over 70,000 estimated to be in household use alone.
Detectable molecules in exposomics can be captured using non-target high resolution mass spectrometry (HRMS), but despite the size of the chemical space, scientists cannot yet identify most of the tens of thousands of features in each sample, leading to critical bottlenecks in identification and data interpretation. This talk will cover European and worldwide community initiatives and resources to help connect environmental expert knowledge and observations towards a better understanding of the exposome, including various open cheminformatics and computational mass spectrometry approaches such as the NORMAN Suspect List Exchange, MassBank, MetFrag and PubChemLite for Exposomics.

; Aho, Velma. E-print/Working paper (2022).
Patients with Parkinson’s disease (PD) exhibit differences in their gut microbiomes compared to healthy individuals. Although differences have most commonly been described in the abundances of bacterial taxa, changes to viral and archaeal populations have also been observed. Mechanistic links between gut microbes and PD pathogenesis remain elusive but could involve molecules that promote α-synuclein aggregation. Here, we show that 2-hydroxypyridine (2-HP) represents a key molecule for the pathogenesis of PD. We observe significantly elevated 2-HP levels in faecal samples from patients with PD or its prodrome, idiopathic REM sleep behaviour disorder (iRBD), compared to healthy controls. 2-HP is correlated with the archaeal species Methanobrevibacter smithii and with genes involved in methane metabolism, and it is detectable in isolate cultures of M. smithii.
We demonstrate that 2-HP is selectively toxic to transgenic α-synuclein overexpressing yeast and increases α-synuclein aggregation in a yeast model as well as in human induced pluripotent stem cell derived enteric neurons. It also exacerbates PD-related motor symptoms, α-synuclein aggregation, and striatal degeneration when injected intrastriatally in transgenic mice overexpressing human α-synuclein. Our results highlight the effect of an archaeal molecule in relation to the gut-brain axis, which is critical for the diagnosis, prognosis, and treatment of PD.

; ; et al. Report (2022).
Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of public MS/MS spectra. Annotations were propagated based on structural relationships to reference molecules using MS/MS-based spectrum alignment. We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer’s brain phenotype. The nearest neighbor suspect spectral library is openly available through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.
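
The report above relies on MS/MS-based spectrum alignment to relate unannotated spectra to reference molecules. As a rough illustration of how two MS/MS spectra can be compared numerically, the sketch below computes a plain cosine score on binned peaks. It is a deliberately simplified stand-in, not the alignment procedure used to build the nearest neighbor suspect spectral library; the function names, the bin width, and the peak values are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=0.01):
    """Sum the intensities of peaks that fall into the same m/z bin.

    `peaks` is a list of (m/z, intensity) tuples. The bin width is an
    assumption for this sketch, not a value taken from the report.
    """
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(peaks_a, peaks_b, bin_width=0.01):
    """Cosine similarity between two binned spectra (0.0 to 1.0)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


# Made-up peak lists, purely for demonstration.
reference = [(85.03, 100.0), (144.10, 40.0), (204.12, 15.0)]
query = [(85.03, 90.0), (144.10, 35.0), (218.14, 20.0)]
print(cosine_score(reference, query))
```

A plain cosine score only rewards peaks at identical m/z values. Alignment-based scoring, as referred to in the abstract, is generally intended to also relate spectra of structurally similar but non-identical molecules, which this sketch does not attempt.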

; Schymanski, Emma. In Medizinische Genetik (2022), 34(2), 103–116.

; Schymanski, Emma. In Nature Machine Intelligence (2022), 4(12), 1224–1237.
Structural annotation of small molecules in biological samples remains a key bottleneck in untargeted metabolomics, despite rapid progress in predictive methods and tools during the past decade. Liquid chromatography–tandem mass spectrometry, one of the most widely used analysis platforms, can detect thousands of molecules in a sample, the vast majority of which remain unidentified even with best-of-class methods. Here we present LC-MS2Struct, a machine learning framework for structural annotation of small-molecule data arising from liquid chromatography–tandem mass spectrometry (LC-MS2) measurements. LC-MS2Struct jointly predicts the annotations for a set of mass spectrometry features in a sample, using a novel structured prediction model trained to optimally combine the output of state-of-the-art MS2 scorers and observed retention orders. We evaluate our method on a dataset covering all publicly available reversed-phase LC-MS2 data in the MassBank reference database, including 4,327 molecules measured using 18 different LC conditions from 16 contributors, greatly expanding the chemical analytical space covered in previous multi-MS2 scorer evaluations. LC-MS2Struct obtains significantly higher annotation accuracy than earlier methods and improves the annotation accuracy of state-of-the-art MS2 scorers by up to 106%.
The use of stereochemistry-aware molecular fingerprints improves prediction performance, which highlights limitations in existing approaches and has strong implications for future computational LC-MS2 developments.

; ; et al. In Digital Discovery (2022).
Extracting PFAS with open source cheminformatics toolkits reveals ~1.78 million PFAS in Google Patents, ~28 K in the CORE literature repository. The extraction of chemical information from documents is a demanding task in cheminformatics due to the variety of text and image-based representations of chemistry. The present work describes the extraction of chemical compounds with unique chemical structures from the open access CORE (COnnecting REpositories) and Google Patents full text document repositories. The importance of structure normalization is demonstrated using three open access cheminformatics toolkits: the Chemistry Development Kit (CDK), RDKit and OpenChemLib (OCL). Each toolkit was used for structure parsing, normalization and subsequent substructure searching, using SMILES as structure representations of chemical molecules and International Chemical Identifiers (InChIs) for comparison. Per- and polyfluoroalkyl substances (PFAS) were chosen as a case study to perform the substructure search, due to their high environmental relevance, their presence in both literature and patent corpuses, and the current lack of community consensus on their definition. Three different structural definitions of PFAS were chosen to highlight the implications of various definitions from a cheminformatics perspective.
Since CDK, RDKit and OCL implement different criteria and methods for SMILES parsing and normalization, different numbers of parsed compounds were extracted, which were then evaluated using the three PFAS definitions. A comparison of these toolkits and definitions is provided, along with a discussion of the implications for PFAS screening and text mining efforts in cheminformatics. Finally, the extracted PFAS (~1.7 M PFAS from patents and ~27 K from CORE) were compared against various existing PFAS lists and are provided in various formats for further community research efforts.

; Schymanski, Emma. In Journal of Cheminformatics (2022), 14(1), 51.

; ; Schymanski, Emma. In Environment International (2022), 170.
Identification of bioaccumulating contaminants of emerging concern (CECs) via suspect and non-target screening remains a challenging task. In this study, ion mobility separation with high-resolution mass spectrometry (IM-HRMS) was used to investigate the effects of drift time (DT) alignment on spectrum quality and peak annotation for screening of CECs in complex sample matrices using data independent acquisition (DIA). Data treatment approaches (Binary Sample Comparison) and prioritisation strategies (Halogen Match, co-occurrence of features in biota and the water phase) were explored in a case study on zebra mussel (Dreissena polymorpha) in Lake Mälaren, Sweden’s largest drinking water reservoir. DT alignment evidently improved the fragment spectrum quality by increasing the similarity score to reference spectra from on average (±standard deviation) 0.33 ± 0.31 to 0.64 ± 0.30 points, thus positively influencing structure elucidation efforts.
Thirty-two features were tentatively identified at confidence level 3 or higher using MetFrag coupled with the new PubChemLite database, which included predicted collision cross-section values from CCSbase. The implementation of predicted mobility data was found to support compound annotation. This study illustrates a quantitative assessment of the benefits of IM-HRMS on spectral quality, which will enhance the performance of future screening studies of CECs in complex environmental matrices.

Schymanski, Emma. In ACS Environmental Au (2022), 2(4), 287–289.
As the first half of 2022 comes to a close, it is an interesting time to reflect on some recent trends. In many ways, the world is “opening” up again, with many colleagues going to their first “in person” conferences since the start of the pandemic in early 2020. A significant leap forward for open chemistry was made in 2021, with the Chemical Abstracts Service (CAS) Registry embracing a hybrid model and releasing half a million chemicals as the CAS Common Chemistry set under an open license. (1) ACS Environmental Au continues to develop as one of the key gold open access journals for publishing work on environmental topics. (2) The European Union has just launched the €400 million European Partnership for the Assessment of Risks from Chemicals (PARC), with ∼200 partners (3) and a whole work package on FAIR (Findable, Accessible, Interoperable, Reusable) (4,5) and Open (6) data. While these trends are cause for optimism, the CAS Registry continues to climb toward the 200 million chemical mark (7) and many of us were blown away by the sheer immensity of the chemical pollution problem at recent meetings.
Other colleagues, e.g., those affected by war, by lockdowns, or with insufficient funds, are unable to share in the “post-pandemic” reopening, conferences, and travel. Others cannot afford the costs associated with open access or still do not see the benefits of open science. Why the focus on these disjoint subjects? Both chemical pollution and the COVID-19 pandemic are global challenges requiring global solutions, where failure to act comes with a high price. Landrigan et al. estimated that 9 million premature deaths (16% of the global total) were caused by pollution in 2015. (8) Worldwide deaths directly due to the COVID-19 pandemic are already over 6 million (9) (January 2020 to May 2022). While public awareness is high, individuals often feel powerless to tackle global challenges, yet the pandemic has proven that individual actions can make an incredible collective difference. The same applies to open data and the exchange of research results: the collective benefit from many individual contributions can be extraordinary.

Mohammed Taha, Hiba. In Environmental Sciences Europe (2022), 34(1), 104.
Background: The NORMAN Association (https://www.norman-network.com/) initiated the NORMAN Suspect List Exchange (NORMAN-SLE, https://www.norman-network.com/nds/SLE/) in 2015, following the NORMAN collaborative trial on non-target screening of environmental water samples by mass spectrometry. Since then, this exchange of information on chemicals that are expected to occur in the environment, along with the accompanying expert knowledge and references, has become a valuable knowledge base for “suspect screening” lists.
The NORMAN-SLE now serves as a FAIR (Findable, Accessible, Interoperable, Reusable) chemical information resource worldwide.
Results: The NORMAN-SLE contains 99 separate suspect list collections (as of May 2022) from over 70 contributors around the world, totalling over 100,000 unique substances. The substance classes include per- and polyfluoroalkyl substances (PFAS), pharmaceuticals, pesticides, natural toxins, high production volume substances covered under the European REACH regulation (EC: 1272/2008), priority contaminants of emerging concern (CECs) and regulatory lists from NORMAN partners. Several lists focus on transformation products (TPs) and complex features detected in the environment with various levels of provenance and structural information. Each list is available for separate download. The merged, curated collection is also available as the NORMAN Substance Database (NORMAN SusDat). Both the NORMAN-SLE and NORMAN SusDat are integrated within the NORMAN Database System (NDS). The individual NORMAN-SLE lists receive digital object identifiers (DOIs) and traceable versioning via a Zenodo community (https://zenodo.org/communities/norman-sle), with a total of >40,000 unique views, >50,000 unique downloads and 40 citations (May 2022). NORMAN-SLE content is progressively integrated into large open chemical databases such as PubChem (https://pubchem.ncbi.nlm.nih.gov/) and the US EPA’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/), enabling further access to these lists, along with the additional functionality and calculated properties these resources offer. PubChem has also integrated significant annotation content from the NORMAN-SLE, including a classification browser (101).
Conclusions: The NORMAN-SLE offers a specialized service for hosting suspect screening lists of relevance for the environmental community in an open, FAIR manner that allows integration with other major chemical resources.
These efforts foster the exchange of information between scientists and regulators, supporting the paradigm shift to the “one substance, one assessment” approach. New submissions are welcome via the contacts provided on the NORMAN-SLE website (https://www.norman-network.com/nds/SLE/).