Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Homologous series are groups of related compounds that share the same core structure attached to a motif that repeats to different degrees. Compounds forming homologous series are of interest in multiple domains, including natural products, environmental chemistry, and drug design. However, many homologous compounds remain unannotated as such in compound datasets, which poses obstacles to understanding chemical diversity and to identifying these compounds analytically via database matching. To overcome these challenges, an algorithm to detect homologous series within compound datasets was developed and implemented using the RDKit. The algorithm takes a list of molecules as SMILES strings and a monomer (i.e., repeating unit) encoded as SMARTS as its main inputs. In an iterative process, substructure matching of repeating units, molecule fragmentation, and core detection lead to homologous series classification through grouping of identical cores. Three open compound datasets from environmental chemistry (NORMAN Suspect List Exchange, NORMAN-SLE), exposomics (PubChemLite for Exposomics), and natural products (the COlleCtion of Open NatUral producTs, COCONUT) were subjected to homologous series classification using the algorithm. Over 2,000, 12,000, and 5,000 series with CH2 repeating units were classified in the NORMAN-SLE, PubChemLite, and COCONUT, respectively. Validation of classified series was performed using published homologous series and structure categories, including a comparison with a similar existing method for categorising PFAS compounds. The OngLai algorithm and its implementation for classifying homologues are openly available at: https://github.com/adelenelai/onglai-classify-homologues.
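The workflow described in the abstract (match the repeating unit, remove it, group molecules by what remains) can be illustrated with a minimal RDKit sketch. This is not the OngLai implementation: the function name `classify_homologues`, the default SMARTS `[CH2]`, and the choice to delete every matched atom in a single pass are assumptions made for illustration. The published algorithm is iterative and handles chains of repeating units and core detection in more detail.

```python
from collections import defaultdict

from rdkit import Chem


def classify_homologues(smiles_list, monomer_smarts="[CH2]"):
    """Group molecules that share the same core after removing repeating units.

    Hypothetical helper for illustration only; the name, the default SMARTS,
    and the single-pass deletion are assumptions, not the published method.
    """
    monomer = Chem.MolFromSmarts(monomer_smarts)
    series = defaultdict(list)
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue  # skip SMILES that RDKit cannot parse
        # Step 1: substructure matching of the repeating unit.
        matches = mol.GetSubstructMatches(monomer)
        if not matches:
            continue  # no repeating unit, so not a candidate homologue
        # Step 2: fragmentation, here by deleting every matched unit.
        core = Chem.DeleteSubstructs(mol, monomer)
        # Step 3: the canonical SMILES of the remaining fragments is the core key.
        core_key = Chem.MolToSmiles(core)
        series[core_key].append((smiles, len(matches)))
    # Step 4: a series needs at least two members sharing an identical core.
    return {core: members for core, members in series.items() if len(members) > 1}


if __name__ == "__main__":
    dataset = ["CCO", "CCCO", "CCCCO", "CCN", "CCCN", "c1ccccc1"]
    for core, members in classify_homologues(dataset).items():
        print(core, members)
```

In this toy dataset the three alcohols form one series and the two amines another, while benzene is skipped because it contains no CH2 unit. Because the sketch discards where the repeating units were attached and also removes ring CH2 atoms, it can group molecules that a stricter, chain-aware procedure would keep apart.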
Research center :
- Luxembourg Centre for Systems Biomedicine (LCSB): Environmental Cheminformatics (Schymanski Group)
Disciplines :
Chemistry
Author, co-author :
Lai, Adelene; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Environmental Cheminformatics; Friedrich Schiller University Jena > Institute for Inorganic and Analytical Chemistry
Schaub, Jonas; Friedrich Schiller University Jena > Institute for Inorganic and Analytical Chemistry
Steinbeck, Christoph; Friedrich Schiller University Jena > Institute for Inorganic and Analytical Chemistry
Schymanski, Emma; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Environmental Cheminformatics
External co-authors :
yes
Language :
English
Title :
An algorithm to classify homologous series within compound datasets