Lai, Adelene. Doctoral thesis (2022).
In most societies, using chemical products has become a part of daily life. Worldwide, over 350,000 chemicals have been registered for use in e.g., daily household consumption, industrial processes, agriculture, etc. However, despite the benefits chemicals may bring to society, their usage, production, and disposal, which leads to their eventual release into the environment has multiple implications. Anthropogenic chemicals have been detected in myriad ecosystems all over the planet, as well as in the tissues of wildlife and humans. The potential consequences of such chemical pollution are not fully understood, but links to the onset of human disease and threats to biodiversity have been attributed to the presence of chemicals in our environment. Mitigating the potential negative effects of chemicals typically involves regulatory steps and multiple stakeholders. One key aspect thereof is environmental monitoring, which consists of environmental sampling, measurement, data analysis, and reporting. In recent years, advancements in Liquid Chromatography-High Resolution Mass Spectrometry (LC-HRMS), open chemical databases, and software have enabled researchers to identify known (e.g., pesticides) as well as unknown environmental chemicals, commonly referred to as suspect or non-target compounds. However, identifying unknown chemicals, particularly non-targets, remains extremely challenging because of the lack of a priori knowledge on the analytes - all that is available are their mass spectrometry signals.
In fact, the number of unknown features in a typical mass spectrum of an environmental sample is in the range of thousands to tens of thousands, and therefore requires feature prioritisation before identification within a suitable workflow. In this dissertation work, collaborations with two regulatory authorities responsible for environmental monitoring sought to identify relevant unknown compounds in the environment, specifically by developing computational workflows for unknown identification in LC-HRMS data. The first collaboration culminated in Publication A, which involved a joint project with the Zürcher Amt für Wasser, Energie und Luft. Environmental samples taken from wastewater treatment plant sites in Switzerland were retrospectively analysed using a pre-screening workflow that prioritised features suitable for non-target identification. For this purpose, a multi-step Quality Control algorithm that checks the quality of mass spectral data in terms of peak intensities, alignment, and signal-to-noise ratio was developed and used within pre-screening. This algorithm was incorporated into the R package Shinyscreen. Features that were prioritised by pre-screening then underwent identification using the in silico fragmentation tool MetFrag. To obtain these identifications, MetFrag was coupled to various open chemical information resources such as spectral databases like MassBank Europe and MassBank of North America, as well as suspect lists from the NORMAN Suspect List Exchange and the CompTox Chemicals Dashboard database. One confirmed and twenty-one tentative compound identifications were achieved and reported according to an established confidence level scheme. Comprehensive data interpretation and detailed communication of MetFrag’s results was performed as a means of formulating evidence-based recommendations that may inform future environmental monitoring campaigns. 
Building on the pre-screening and identification workflow developed in Publication A, Publication B resulted from a collaboration with the Luxembourgish Administration de la gestion de l’eau that sought to identify, and where possible quantify unknown chemicals in Luxembourgish surface waters. More specifically, surface water samples collected as part of a two-year national monitoring campaign were measured using LC-HRMS and screened for pharmaceutical parent compounds and their transformation products. Compared to pharmaceutical compound information, which is publicly available from local authorities (and was used in the suspect list), information on transformation products is relatively scarce. Therefore, new approaches were developed in this work to mine data from the PubChem database as well as from the literature in order to formulate a suspect list containing pharmaceutical transformation products, in addition to their parent compounds. Overall, 94 pharmaceuticals and 14 transformation products were identified, of which 88 and 2 were confirmed identifications respectively. The spatio-temporal occurrence and distribution of these compounds throughout the Luxembourgish environment were analysed using advanced data visualisations that highlighted patterns in certain regions and time periods of high incidence. These findings may support future chemicals management measures, particularly in environmental monitoring. Another challenging aspect of managing chemicals is that they mostly exist as complex mixtures within the environment as well as chemical products. Substances of Unknown or Variable composition, Complex reaction products or Biological materials (UVCBs) make up 20-40% of international chemical registries and include chlorinated paraffins, polymer mixtures, petroleum fractions, and essential oils. 
However, little is known about their chemical identities and/or compositions, which poses formidable obstacles to assessing their environmental fate and toxicity, let alone identification in the environment. Publication C addresses the challenges of UVCBs by taking an interdisciplinary approach in reviewing the literature that incorporates considerations of their chemical representations, toxicity, environmental fate, exposure, and regulatory approaches. Improved substance registration requirements, grouping techniques to simplify assessment, and the use of Mixture InChI to represent UVCBs in a findable, accessible, interoperable, and reusable (FAIR) way in databases are amongst the key recommendations of this work. A specific type of UVCB, mixtures of homologous compounds, are commonly detected in environmental samples, including many High Production Volume (HPV) compounds such as surfactants. Compounds forming homologous series are related by a common core fragment and repeating chemical subunit, and can be represented using general formulae (e.g., CnF2n+1COOH) and/or Markush structures. However, a significant identification bottleneck is the inability to match their characteristic analytical signals in LC-HRMS data with chemicals in databases; while comb-like elution patterns and constant differences in mass-to-charge ratio indicate the presence of homologous series in samples, most chemical databases do not contain annotated homologous series. To address this gap, Publication D introduces a cheminformatics algorithm, OngLai, to detect homologous series within compound datasets. OngLai, openly implemented in Python using the RDKit, detects homologous series based on two inputs: a list of compounds and the chemical structure of a repeating unit. OngLai was applied to three open datasets from environmental chemistry, exposomics, and natural products, in which thousands of homologous series with a CH2 repeating unit were detected. 
Classification of homologous series in compound datasets is expected to advance their analytical detection in samples. Overall, the work in this dissertation contributed to the advancement of identifying and managing unknown chemicals in the environment using cheminformatics and computational approaches. All work conducted followed Open Science and FAIR data principles: all code, datasets, analyses, and results generated, including the final peer-reviewed publications, are openly available to the public. These efforts are intended to spur further developments in unknown chemical identification and management towards protecting the environment and human health.

Lai, Adelene. In Journal of Cheminformatics (2022), 14(85).
Homologous series are groups of related compounds that share the same core structure attached to a motif that repeats to different degrees. Compounds forming homologous series are of interest in multiple domains, including natural products, environmental chemistry, and drug design. However, many homologous compounds remain unannotated as such in compound datasets, which poses obstacles to understanding chemical diversity and their analytical identification via database matching. To overcome these challenges, an algorithm to detect homologous series within compound datasets was developed and implemented using the RDKit. The algorithm takes a list of molecules as SMILES strings and a monomer (i.e., repeating unit) encoded as SMARTS as its main inputs. In an iterative process, substructure matching of repeating units, molecule fragmentation, and core detection lead to homologous series classification through grouping of identical cores.
Three open compound datasets from environmental chemistry (NORMAN Suspect List Exchange, NORMAN-SLE), exposomics (PubChemLite for Exposomics), and natural products (the COlleCtion of Open NatUral producTs, COCONUT) were subject to homologous series classification using the algorithm. Over 2000, 12,000, and 5000 series with CH2 repeating units were classified in the NORMAN-SLE, PubChemLite, and COCONUT respectively. Validation of classified series was performed using published homologous series and structure categories, including a comparison with a similar existing method for categorising PFAS compounds. The OngLai algorithm and its implementation for classifying homologues are openly available at: https://github.com/adelenelai/onglai-classify-homologues.

Lai, Adelene. In Environmental Science and Technology (2022).
Substances of unknown or variable composition, complex reaction products, or biological materials (UVCBs) are over 70 000 “complex” chemical mixtures produced and used at significant levels worldwide. Due to their unknown or variable composition, applying chemical assessments originally developed for individual compounds to UVCBs is challenging, which impedes sound management of these substances. Across the analytical sciences, toxicology, cheminformatics, and regulatory practice, new approaches addressing specific aspects of UVCB assessment are being developed, albeit in a fragmented manner. This review attempts to convey the “big picture” of the state of the art in dealing with UVCBs by holistically examining UVCB characterization and chemical identity representation, as well as hazard, exposure, and risk assessment.
Overall, information gaps on chemical identities underpin the fundamental challenges concerning UVCBs, and better reporting and substance characterization efforts are needed to support subsequent chemical assessments. To this end, an information level scheme for improved UVCB data collection and management within databases is proposed. The development of UVCB testing shows early progress, in line with three main methods: whole substance, known constituents, and fraction profiling. For toxicity assessment, one option is a whole-mixture testing approach. If the identities of (many) constituents are known, grouping, read across, and mixture toxicity modeling represent complementary approaches to overcome data gaps in toxicity assessment. This review highlights continued needs for concerted efforts from all stakeholders to ensure proper assessment and sound management of UVCBs.

; ; Kondic, Todor. In Environment International (2022), 158.
The diversity of hundreds of thousands of potential organic pollutants and the lack of (publicly available) information about many of them is a huge challenge for environmental sciences, engineering, and regulation. Suspect screening based on high-resolution liquid chromatography-mass spectrometry (LC-HRMS) has enormous potential to help characterize the presence of these chemicals in our environment, enabling the detection of known and newly emerging pollutants, as well as their potential transformation products (TPs).
Here, suspect list creation (focusing on pesticides relevant for Luxembourg, incorporating data sources in 4 languages) was coupled to an automated retrieval of related TPs from PubChem based on high confidence suspect hits, to screen for pesticides and their TPs in Luxembourgish river samples. A computational workflow was established to combine LC-HRMS analysis and pre-screening of the suspects (including automated quality control steps), with spectral annotation to determine which pesticides and, in a second step, their related TPs may be present in the samples. The data analysis with Shinyscreen (https://gitlab.lcsb.uni.lu/eci/shinyscreen/), an open source software developed in house, coupled with custom-made scripts, revealed the presence of 162 potential pesticide masses and 96 potential TP masses in the samples. Further identification of these mass matches was performed using the open source approach MetFrag (https://msbi.ipb-halle.de/MetFrag/). Eventual target analysis of 36 suspects resulted in 31 pesticides and TPs confirmed at Level-1 (highest confidence), and five pesticides and TPs not confirmed due to different retention times. Spatio-temporal analysis of the results showed that TPs and pesticides followed similar trends, with a maximum number of potential detections in July. The highest detections were in the rivers Alzette and Mess and the lowest in the Sûre and Eisch. This study (a) added pesticides, classification information and related TPs into the open domain, (b) developed automated open source retrieval methods - both enhancing FAIRness (Findability, Accessibility, Interoperability and Reusability) of the data and methods; and (c) will directly support “L’Administration de la Gestion de l’Eau” on further monitoring steps in Luxembourg. 
; ; Kondic, Todor. Scientific Conference (2021, June 24).

Lai, Adelene. In Environmental Sciences Europe (2021), 33(1), 43.
Background: Applying non-target analysis (NTA) in regulatory environmental monitoring remains challenging: instead of having exploratory questions, regulators usually already have specific questions related to environmental protection aims. Additionally, data analysis can seem overwhelming because of the large data volumes and many steps required. This work aimed to establish an open in silico workflow to identify environmental chemical unknowns via retrospective NTA within the scope of a pre-existing Swiss environmental monitoring campaign focusing on industrial chemicals. The research question addressed immediate regulatory priorities: identify pollutants with industrial point sources occurring at the highest intensities over two time points. Samples from 22 wastewater treatment plants obtained in 2018 and measured using liquid chromatography–high resolution mass spectrometry were retrospectively analysed by (i) performing peak-picking to identify masses of interest; (ii) prescreening and quality-controlling spectra; and (iii) tentatively identifying priority “known unknown” pollutants by leveraging environmentally relevant chemical information provided by Swiss, Swedish, EU-wide, and American regulators. This regulator-supplied information was incorporated into MetFrag, an in silico identification tool replete with “post-relaunch” features used here.
This study’s unique regulatory context posed challenges in data quality and volume that were directly addressed with the prescreening, quality control, and identification workflow developed.
Results: One confirmed and 21 tentative identifications were achieved, suggesting the presence of compounds as diverse as manufacturing reagents, adhesives, pesticides, and pharmaceuticals in the samples. More importantly, an in-depth interpretation of the results in the context of environmental regulation and actionable next steps are discussed. The prescreening and quality control workflow is openly accessible within the R package Shinyscreen, and adaptable to any (retrospective) analysis requiring automated quality control of mass spectra and non-target identification, with potential applications in environmental and metabolomics analyses.
Conclusions: NTA in regulatory monitoring is critical for environmental protection, but bottlenecks in data analysis and results interpretation remain. The prescreening and quality control workflow, and interpretation work performed here are crucial steps towards scaling up NTA for environmental monitoring.

Singh, Randolph. In ACS Environmental Au (2021).
Pharmaceuticals and their transformation products (TPs) are continuously released into the aquatic environment via anthropogenic activity. To expand knowledge on the presence of pharmaceuticals and their known TPs in Luxembourgish rivers, 92 samples collected during routine monitoring events between 2019 and 2020 were investigated using nontarget analysis. Water samples were concentrated using solid-phase extraction and then analyzed using liquid chromatography coupled to a high-resolution mass spectrometer.
Suspect screening was performed using several open source computational tools and resources including Shinyscreen (https://git-r3lab.uni.lu/eci/shinyscreen/), MetFrag (https://msbi.ipb-halle.de/MetFrag/), PubChemLite (https://zenodo.org/record/4432124), and MassBank (https://massbank.eu/MassBank/). A total of 94 pharmaceuticals, 88 confirmed at a level 1 confidence (86 of which could be quantified, two compounds too low to be quantified) and six identified at level 2a, were found to be present in Luxembourg rivers. Pharmaceutical TPs (12) were also found at a level 2a confidence. The pharmaceuticals were present at median concentrations up to 214 ng/L, with caffeine having a median concentration of 1424 ng/L. Antihypertensive drugs (15), psychoactive drugs (15), and antimicrobials (eight) were the most detected groups of pharmaceuticals. A spatiotemporal analysis of the data revealed areas with higher concentrations of the pharmaceuticals, as well as differences in pharmaceutical concentrations between 2019 and 2020. The results of this work will help guide activities for improving water management in the country and set baseline data for continuous monitoring and screening efforts, as well as for further open data and software developments.

Krier, Jessy. E-print/Working paper (2021).
Abstract: The diversity of hundreds of thousands of potential organic pollutants and the lack of (publicly available) information about many of them is a huge challenge for environmental sciences, engineering, and regulation.
Suspect screening based on high-resolution liquid chromatography-mass spectrometry (LC-HRMS) has enormous potential to help characterize the presence of these chemicals in our environment, enabling the detection of known and newly emerging pollutants, as well as their potential transformation products (TPs). Here, suspect list creation (focusing on pesticides relevant for Luxembourg, incorporating data sources in 4 languages) was coupled to an automated retrieval of related TPs from PubChem based on high confidence suspect hits, to screen for pesticides and their TPs in Luxembourgish river samples. A computational workflow was established to combine LC-HRMS analysis and pre-screening of the suspects (including automated quality control steps), with spectral annotation to determine which pesticides and, in a second step, their related TPs may be present in the samples. The data analysis with Shinyscreen (https://git-r3lab.uni.lu/eci/shinyscreen/), an open source software developed in house, coupled with custom-made scripts, revealed the presence of 162 potential pesticide masses and 135 potential TP masses in the samples. Further identification of these mass matches was performed using the open source MetFrag (https://msbi.ipb-halle.de/MetFrag/). Eventual target analysis of 36 suspects resulted in 31 pesticides and TPs confirmed at Level-1 (highest confidence), and five pesticides and TPs not confirmed due to different retention times. Spatio-temporal analysis of the results showed that TPs and pesticides followed similar trends, with a maximum number of potential detections in July. The highest detections were in the rivers Alzette and Mess and the lowest in the Sûre and Eisch. 
This study (a) added pesticides, classification information and related TPs into the open domain, (b) developed automated open source retrieval methods - both enhancing FAIRness (Findability, Accessibility, Interoperability and Reusability) of the data and methods; and (c) will directly support “L’Administration de la Gestion de l’Eau” on further monitoring steps in Luxembourg.

Singh, Randolph. E-print/Working paper (2021).
This pre-print describes the analysis of pharmaceuticals and their transformation products in surface water samples collected in Luxembourg from 2019 to 2020. Details of the experimental and computational tools and workflows used are fully described in the manuscript. Links to the suspect lists, codes used, and data files are also provided.

Lai, Adelene. In Environmental Sciences Europe (2020).