Unpublished conference/Abstract (Scientific congresses, symposiums and conference proceedings)
Integrating UVCBs and Related Data into Open Chemical Knowledgebases
SCHYMANSKI, Emma; ELAPAVALORE, Anjana; Li, Qingliang et al.
2023, SETAC Europe 33rd Annual Meeting
 

Files


Full Text
SETAC_EU_UVCBs_May2023.pdf
Author postprint (8.08 MB) Creative Commons License - Attribution
Details



Keywords :
UVCBs
Abstract :
[en] Platform Presentation at SETAC Europe 2023, Dublin, 30 April - 4 May 2023. Presenting in Session 4.05, Characterization, Testing and Assessment of Complex Substances (MCS, UVCBs & MOCS). Presentation 4.05.T-05 at 14:40, Wednesday 3 May (Level 3 East Wing).
Integrating UVCBs and Related Data into Open Chemical Knowledgebases
Emma L. Schymanski¹, Anjana Elapavalore¹, Qingliang Li², Paul A. Thiessen², Leonid Zaslavsky², Jian Zhang², Evan E. Bolton²
¹Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, 4367 Belvaux, Luxembourg.
²National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Although 20-40% of chemical registries consist of Substances of Unknown or Variable Composition, Complex Reaction Products, and Biological Materials (UVCBs), integrating and exchanging information on UVCBs in open chemical knowledgebases is challenging. The integration of UVCBs into high resolution mass spectrometry (HR-MS) based identification workflows is also problematic. Often only a name or numerical identifier is provided in the registry listings, hindering comparison, merging and enumeration or mapping of potential component species either based on expert knowledge or (semi-)automated cheminformatics methods. Improved UVCB handling in major open chemical resources will help support the exchange of information between registries, researchers, and regulators, as well as supporting, e.g., toxicological/environmental assessments and the integration of UVCBs into HR-MS-based workflows.
PubChem (https://pubchem.ncbi.nlm.nih.gov/), a large open chemical database with over 112M compounds, 298M substances and contributions from over 884 data sources, has recently introduced “concepts” specifically to improve its handling of UVCB-like entities. An initial dataset of ~62K “concepts” was compiled from three large authoritative data sources with a high proportion of UVCBs (FDA GSRS, TSCA and ECHA). Close to 0.5M synonyms (names) were associated with these concepts and then used as the basis for literature mining dictionaries and sets of regular expressions for pattern-based recognition of UVCBs among synonyms. This was validated over several collections. Since UVCBs of variable composition often form (or are expressed as) homologue series, this subset of UVCBs is particularly conducive to automated grouping methods and adaptation to HR-MS workflows. Thus, as a second step, the homologue grouping algorithm OngLai was run over the PubChemLite for Exposomics database (a subset of PubChem with environmentally and toxicologically relevant annotation) and connected to “concepts” using mappings to representative or component structures provided by depositors. Over 163 connections between chemicals, homologue series and PubChem Concepts (often many concepts per series) were made; a select few have been hand curated and processed so far as proof-of-concept exemplars. This contribution intends to show and discuss the potential (and pitfalls) associated with UVCB handling in open resources to support environmental and toxicological use cases.
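The pattern-based recognition of UVCBs among synonyms mentioned in the abstract can be illustrated with a minimal sketch. The patterns, the names `UVCB_PATTERNS`, `looks_like_uvcb` and `carbon_range`, and the example synonyms are all hypothetical and chosen for illustration only; they are not the dictionaries or regular expressions used for the PubChem concepts, which are not given in this record.

```python
import re

# Hypothetical, illustrative patterns only; NOT the expressions used by PubChem.
UVCB_PATTERNS = [
    # Carbon-number ranges such as "C12-16" or "C10-C13"
    re.compile(r"\bC\d+\s*-\s*C?\d+\b", re.IGNORECASE),
    # Typical registry phrasing for complex reaction products
    re.compile(r"\breaction products?\b", re.IGNORECASE),
    # Process- or source-derived materials
    re.compile(r"\b(?:extracts?|distillates?|residues?)\b", re.IGNORECASE),
    # Variable-composition modifiers
    re.compile(r"\b(?:ethoxylated|propoxylated)\b", re.IGNORECASE),
]


def looks_like_uvcb(name: str) -> bool:
    """Return True if any illustrative pattern matches the synonym."""
    return any(pattern.search(name) for pattern in UVCB_PATTERNS)


def carbon_range(name: str):
    """Extract a (low, high) carbon-number range from a synonym, if present."""
    match = re.search(r"\bC(\d+)\s*-\s*C?(\d+)\b", name, re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Made-up example synonyms in the style of registry listings
    for synonym in [
        "Alcohols, C12-16, ethoxylated",
        "Distillates (petroleum), hydrotreated light",
        "Benzene",
    ]:
        print(synonym, looks_like_uvcb(synonym), carbon_range(synonym))
```

A carbon range extracted this way hints at why variable-composition UVCBs lend themselves to homologue grouping: a name such as "C12-16" implies a series of members differing by a repeating unit. Real synonym lists would need far broader patterns and validation against curated collections.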
Disciplines :
Chemistry
Author, co-author :
SCHYMANSKI, Emma; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Environmental Cheminformatics
ELAPAVALORE, Anjana; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Environmental Cheminformatics
Li, Qingliang
Thiessen, Paul
Zaslavsky, Leonid
Zhang, Jian
Bolton, Evan
External co-authors :
yes
Language :
English
Title :
Integrating UVCBs and Related Data into Open Chemical Knowledgebases
Publication date :
2023
Event name :
SETAC Europe 33rd Annual Meeting
Event organizer :
SETAC
Event place :
Dublin, Ireland
Event date :
30 April - 4 May 2023
Audience :
International
FnR Project :
FNR12341006 - Environmental Cheminformatics To Identify Unknown Chemicals And Their Effects, 2018 (01/10/2018-30/09/2023) - Emma Schymanski
Commentary :
Publisher: Zenodo
Available on ORBilu :
since 27 November 2023

