Results 1-14 of 14.
; ; et al. in Digital Discovery (2022)
Extracting PFAS with open source cheminformatics toolkits reveals ~1.78 million PFAS in Google Patents, ~28 K in the CORE literature repository. The extraction of chemical information from documents is a demanding task in cheminformatics due to the variety of text and image-based representations of chemistry. The present work describes the extraction of chemical compounds with unique chemical structures from the open access CORE (COnnecting REpositories) and Google Patents full text document repositories. The importance of structure normalization is demonstrated using three open access cheminformatics toolkits: the Chemistry Development Kit (CDK), RDKit and OpenChemLib (OCL). Each toolkit was used for structure parsing, normalization and subsequent substructure searching, using SMILES as structure representations of chemical molecules and International Chemical Identifiers (InChIs) for comparison. Per- and polyfluoroalkyl substances (PFAS) were chosen as a case study to perform the substructure search, due to their high environmental relevance, their presence in both literature and patent corpuses, and the current lack of community consensus on their definition. Three different structural definitions of PFAS were chosen to highlight the implications of various definitions from a cheminformatics perspective. Since CDK, RDKit and OCL implement different criteria and methods for SMILES parsing and normalization, different numbers of parsed compounds were extracted, which were then evaluated using the three PFAS definitions. A comparison of these toolkits and definitions is provided, along with a discussion of the implications for PFAS screening and text mining efforts in cheminformatics.
Finally, the extracted PFAS (~1.7 M PFAS from patents and ~27 K from CORE) were compared against various existing PFAS lists and are provided in various formats for further community research efforts.

; ; Kondic, Todor in Environment International (2022), 158
The diversity of hundreds of thousands of potential organic pollutants and the lack of (publicly available) information about many of them is a huge challenge for environmental sciences, engineering, and regulation. Suspect screening based on high-resolution liquid chromatography-mass spectrometry (LC-HRMS) has enormous potential to help characterize the presence of these chemicals in our environment, enabling the detection of known and newly emerging pollutants, as well as their potential transformation products (TPs). Here, suspect list creation (focusing on pesticides relevant for Luxembourg, incorporating data sources in 4 languages) was coupled to an automated retrieval of related TPs from PubChem based on high confidence suspect hits, to screen for pesticides and their TPs in Luxembourgish river samples. A computational workflow was established to combine LC-HRMS analysis and pre-screening of the suspects (including automated quality control steps), with spectral annotation to determine which pesticides and, in a second step, their related TPs may be present in the samples. The data analysis with Shinyscreen (https://gitlab.lcsb.uni.lu/eci/shinyscreen/), an open source software developed in house, coupled with custom-made scripts, revealed the presence of 162 potential pesticide masses and 96 potential TP masses in the samples.
Further identification of these mass matches was performed using the open source approach MetFrag (https://msbi.ipb-halle.de/MetFrag/). Eventual target analysis of 36 suspects resulted in 31 pesticides and TPs confirmed at Level-1 (highest confidence), and five pesticides and TPs not confirmed due to different retention times. Spatio-temporal analysis of the results showed that TPs and pesticides followed similar trends, with a maximum number of potential detections in July. The highest detections were in the rivers Alzette and Mess and the lowest in the Sûre and Eisch. This study (a) added pesticides, classification information and related TPs into the open domain, (b) developed automated open source retrieval methods - both enhancing FAIRness (Findability, Accessibility, Interoperability and Reusability) of the data and methods; and (c) will directly support “L’Administration de la Gestion de l’Eau” on further monitoring steps in Luxembourg.

; ; et al. Report (2022)
Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of public MS/MS spectra. Annotations were propagated based on structural relationships to reference molecules using MS/MS-based spectrum alignment.
We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer’s brain phenotype. The nearest neighbor suspect spectral library is openly available through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.

; ; Kondic, Todor Scientific Conference (2021, June 24)

Schymanski, Emma in Journal of Cheminformatics (2021), 13(1), 19
Compound (or chemical) databases are an invaluable resource for many scientific disciplines. Exposomics researchers need to find and identify relevant chemicals that cover the entirety of potential (chemical and other) exposures over entire lifetimes. This daunting task, with over 100 million chemicals in the largest chemical databases, coupled with broadly acknowledged knowledge gaps in these resources, leaves researchers faced with too much, yet not enough, information at the same time to perform comprehensive exposomics research. Furthermore, the improvements in analytical technologies and computational mass spectrometry workflows coupled with the rapid growth in databases and increasing demand for high throughput “big data” services from the research community present significant challenges for both data hosts and workflow developers.
This article explores how to reduce candidate search spaces in non-target small molecule identification workflows, while increasing content usability in the context of environmental and exposomics analyses, so as to profit from the increasing size and information content of large compound databases, while increasing efficiency at the same time. In this article, these methods are explored using PubChem, the NORMAN Network Suspect List Exchange and the in silico fragmentation approach MetFrag. A subset of the PubChem database relevant for exposomics, PubChemLite, is presented as a database resource that can be (and has been) integrated into current workflows for high resolution mass spectrometry. Benchmarking datasets from earlier publications are used to show how experimental knowledge and existing datasets can be used to detect and fill gaps in compound databases to progressively improve large resources such as PubChem, and topic-specific subsets such as PubChemLite. PubChemLite is a living collection, updating as annotation content in PubChem is updated, and exported to allow direct integration into existing workflows such as MetFrag. The source code and files necessary to recreate or adjust this are jointly hosted between the research parties (see data availability statement). This effort shows that enhancing the FAIRness (Findability, Accessibility, Interoperability and Reusability) of open resources can mutually enhance several resources for whole community benefit. The authors explicitly welcome additional community input on ideas for future developments.

Singh, Randolph E-print/Working paper (2021)
This pre-print describes the analysis of pharmaceuticals and their transformation products in surface water samples collected in Luxembourg from 2019 to 2020. Details of the experimental and computational tools and workflows used are fully described in the manuscript. Links to the suspect lists, codes used, and data files are also provided.

Singh, Randolph in ACS Environmental Au (2021)
Pharmaceuticals and their transformation products (TPs) are continuously released into the aquatic environment via anthropogenic activity. To expand knowledge on the presence of pharmaceuticals and their known TPs in Luxembourgish rivers, 92 samples collected during routine monitoring events between 2019 and 2020 were investigated using nontarget analysis. Water samples were concentrated using solid-phase extraction and then analyzed using liquid chromatography coupled to a high-resolution mass spectrometer. Suspect screening was performed using several open source computational tools and resources including Shinyscreen (https://git-r3lab.uni.lu/eci/shinyscreen/), MetFrag (https://msbi.ipb-halle.de/MetFrag/), PubChemLite (https://zenodo.org/record/4432124), and MassBank (https://massbank.eu/MassBank/). A total of 94 pharmaceuticals, 88 confirmed at a level 1 confidence (86 of which could be quantified, two compounds too low to be quantified) and six identified at level 2a, were found to be present in Luxembourg rivers. Pharmaceutical TPs (12) were also found at a level 2a confidence. The pharmaceuticals were present at median concentrations up to 214 ng/L, with caffeine having a median concentration of 1424 ng/L.
Antihypertensive drugs (15), psychoactive drugs (15), and antimicrobials (eight) were the most detected groups of pharmaceuticals. A spatiotemporal analysis of the data revealed areas with higher concentrations of the pharmaceuticals, as well as differences in pharmaceutical concentrations between 2019 and 2020. The results of this work will help guide activities for improving water management in the country and set baseline data for continuous monitoring and screening efforts, as well as for further open data and software developments.

Lai, Adelene in Environmental Sciences Europe (2021), 33(1), 43
Background: Applying non-target analysis (NTA) in regulatory environmental monitoring remains challenging: instead of having exploratory questions, regulators usually already have specific questions related to environmental protection aims. Additionally, data analysis can seem overwhelming because of the large data volumes and many steps required. This work aimed to establish an open in silico workflow to identify environmental chemical unknowns via retrospective NTA within the scope of a pre-existing Swiss environmental monitoring campaign focusing on industrial chemicals. The research question addressed immediate regulatory priorities: identify pollutants with industrial point sources occurring at the highest intensities over two time points.
Samples from 22 wastewater treatment plants obtained in 2018 and measured using liquid chromatography–high resolution mass spectrometry were retrospectively analysed by (i) performing peak-picking to identify masses of interest; (ii) prescreening and quality-controlling spectra, and (iii) tentatively identifying priority “known unknown” pollutants by leveraging environmentally relevant chemical information provided by Swiss, Swedish, EU-wide, and American regulators. This regulator-supplied information was incorporated into MetFrag, an in silico identification tool replete with “post-relaunch” features used here. This study’s unique regulatory context posed challenges in data quality and volume that were directly addressed with the prescreening, quality control, and identification workflow developed.
Results: One confirmed and 21 tentative identifications were achieved, suggesting the presence of compounds as diverse as manufacturing reagents, adhesives, pesticides, and pharmaceuticals in the samples. More importantly, an in-depth interpretation of the results in the context of environmental regulation and actionable next steps are discussed. The prescreening and quality control workflow is openly accessible within the R package Shinyscreen, and adaptable to any (retrospective) analysis requiring automated quality control of mass spectra and non-target identification, with potential applications in environmental and metabolomics analyses.
Conclusions: NTA in regulatory monitoring is critical for environmental protection, but bottlenecks in data analysis and results interpretation remain. The prescreening and quality control workflow, and interpretation work performed here are crucial steps towards scaling up NTA for environmental monitoring.

Krier, Jessy E-print/Working paper (2021)
The diversity of hundreds of thousands of potential organic pollutants and the lack of (publicly available) information about many of them is a huge challenge for environmental sciences, engineering, and regulation. Suspect screening based on high-resolution liquid chromatography-mass spectrometry (LC-HRMS) has enormous potential to help characterize the presence of these chemicals in our environment, enabling the detection of known and newly emerging pollutants, as well as their potential transformation products (TPs). Here, suspect list creation (focusing on pesticides relevant for Luxembourg, incorporating data sources in 4 languages) was coupled to an automated retrieval of related TPs from PubChem based on high confidence suspect hits, to screen for pesticides and their TPs in Luxembourgish river samples. A computational workflow was established to combine LC-HRMS analysis and pre-screening of the suspects (including automated quality control steps), with spectral annotation to determine which pesticides and, in a second step, their related TPs may be present in the samples. The data analysis with Shinyscreen (https://git-r3lab.uni.lu/eci/shinyscreen/), an open source software developed in house, coupled with custom-made scripts, revealed the presence of 162 potential pesticide masses and 135 potential TP masses in the samples. Further identification of these mass matches was performed using the open source MetFrag (https://msbi.ipb-halle.de/MetFrag/).
Eventual target analysis of 36 suspects resulted in 31 pesticides and TPs confirmed at Level-1 (highest confidence), and five pesticides and TPs not confirmed due to different retention times. Spatio-temporal analysis of the results showed that TPs and pesticides followed similar trends, with a maximum number of potential detections in July. The highest detections were in the rivers Alzette and Mess and the lowest in the Sûre and Eisch. This study (a) added pesticides, classification information and related TPs into the open domain, (b) developed automated open source retrieval methods - both enhancing FAIRness (Findability, Accessibility, Interoperability and Reusability) of the data and methods; and (c) will directly support “L’Administration de la Gestion de l’Eau” on further monitoring steps in Luxembourg.

; ; Schymanski, Emma in F1000Research (2021), 10
Toxicology has been an active research field for many decades, with academic, industrial and government involvement. Modern omics and computational approaches are changing the field, from merely disease-specific observational models into target-specific predictive models. Traditionally, toxicology has strong links with other fields such as biology, chemistry, pharmacology and medicine. With the rise of synthetic and new engineered materials, alongside ongoing prioritisation needs in chemical risk assessment for existing chemicals, early predictive evaluations are becoming of utmost importance to both scientific and regulatory purposes. ELIXIR is an intergovernmental organisation that brings together life science resources from across Europe.
To coordinate the linkage of various life science efforts around modern predictive toxicology, the establishment of a new ELIXIR Community is seen as instrumental. In the past few years, joint efforts, building on incidental overlap, have been piloted in the context of ELIXIR. For example, the EU-ToxRisk, diXa, HeCaToS, transQST, and the nanotoxicology community have worked with the ELIXIR TeSS, Bioschemas, and Compute Platforms and activities. In 2018, a core group of interested parties wrote a proposal, outlining a sketch of what this new ELIXIR Toxicology Community would look like. A recent workshop (held September 30th to October 1st, 2020) extended this into an ELIXIR Toxicology roadmap and a shortlist of limited investment-high gain collaborations to give body to this new community. This Whitepaper outlines the results of these efforts and defines our vision of the ELIXIR Toxicology Community and how it complements other ELIXIR activities.

Lai, Adelene in Environmental Sciences Europe (2020)

Schymanski, Emma E-print/Working paper (2020)

Schymanski, Emma Presentation (2020, April 10)
In light of recent events, many of us have been impacted by the cancellation of conferences and meetings. We are not only losing the opportunity to present our research, but a chance to connect with our community. Virtual Podium is a platform and opportunity to present and learn about compelling scientific research. Our third session will be focused on Compound Identification.
Our keynote speaker this week will be Emma Schymanski, who is the PI of Environmental Cheminformatics at the University of Luxembourg. Session 3: Compound Identification, Friday, April 10, 2020 at 12:00-1:00PM PDT (3:00-4:00PM EDT). Session 3 - Compound Identification: https://www.eventbrite.com/e/101426613732

; Schymanski, Emma Computer development (2020)