Poster (Scientific congresses, symposiums and conference proceedings)
Supporting findability of COVID-19 research with large-scale text mining of scientific publications
Welter, Danielle; Vega Moreno, Carlos Gonzalo; Biryukov, Maria et al.
2020, International FAIR Convergence Symposium
 

Files


Full Text
FAIRBioKB_poster.pdf
Author postprint (2.03 MB)
Poster PDF

Details



Keywords :
FAIR principles; COVID-19; text mining
Abstract :
[en] When the COVID-19 pandemic hit in early 2020, many research efforts were quickly redirected towards studies on SARS-CoV-2 and COVID-19, from the sequencing and assembly of viral genomes to the elaboration of robust testing methodologies and the development of treatment and vaccination strategies. At the same time, a flurry of scientific publications on SARS-CoV-2 and COVID-19 began to appear, making it increasingly difficult for researchers to stay up to date with the latest trends and developments in this rapidly evolving field. The BioKB platform is a pipeline which, by exploiting text mining and semantic technologies, helps researchers easily access the semantic content of thousands of abstracts and full-text articles. The content of the articles is analysed, and concepts from a range of contexts, including proteins, species, chemicals, diseases and biological processes, are tagged based on existing dictionaries of controlled terms. Co-occurring concepts are classified based on their asserted relationship, and the resulting subject-relation-object triples are stored in a publicly accessible human- and machine-readable knowledge base. All concepts in the BioKB dictionaries are linked to stable, persistent identifiers: either a resource accession, such as an Ensembl, UniProt or PubChem ID for genes, proteins and chemicals, or an ontology term ID for diseases, phenotypes and other ontology terms. To improve COVID-19-related text mining, we extended the underlying dictionaries to include many additional viral species (via NCBI Taxonomy identifiers), phenotypes from the Human Phenotype Ontology (HPO), COVID-related concepts including clinical and laboratory tests from the COVID-19 ontology, as well as additional diseases (DO) and biological processes (GO). We also added all viral proteins found in UniProt and gene entries from Entrez Gene to increase the sensitivity of the text mining pipeline to viral data.
To date, BioKB has indexed over 270’000 sentences from 21’935 publications relating to coronavirus infections, with publications dating from 1963 to 2021, 3’863 of which were published this year (2020). We are currently working to further refine the text mining pipeline by training it to extract increasingly complex relations such as protein-phenotype relationships. We are also regularly adding new terms to our dictionaries for areas where coverage is currently low, such as clinical and laboratory tests and procedures, and novel drug treatments.
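As a rough illustration of the dictionary-based tagging and subject-relation-object triple extraction described in the abstract, the following minimal Python sketch may help. It is not the BioKB implementation: the dictionary entries, the cue-word list, and the function names `tag_concepts` and `extract_triples` are assumptions made for this example, and the cue-word lookup is only a naive stand-in for the relationship classification the abstract mentions.

```python
import re
from itertools import combinations

# Toy dictionary: lower-cased surface form -> (concept type, identifier).
# Illustrative entries only; they are not taken from the BioKB dictionaries
# and should be checked against the source databases before any real use.
DICTIONARY = {
    "sars-cov-2": ("species", "NCBITaxon:2697049"),
    "ace2": ("protein", "UniProt:Q9BYF1"),
    "covid-19": ("disease", "DOID:0080600"),
}

# Hypothetical cue words mapped to relation labels (an assumption of this sketch).
RELATION_CUES = {"binds": "binds", "causes": "causes"}


def tag_concepts(sentence):
    """Return sorted (start offset, surface form, type, identifier) hits."""
    lowered = sentence.lower()
    hits = []
    for term, (ctype, ident) in DICTIONARY.items():
        # Reject matches embedded in longer words or hyphenated tokens.
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        for m in re.finditer(pattern, lowered):
            hits.append((m.start(), sentence[m.start():m.end()], ctype, ident))
    return sorted(hits)


def extract_triples(sentence):
    """Pair co-occurring concepts and label each pair with a relation."""
    hits = tag_concepts(sentence)
    lowered = sentence.lower()
    triples = []
    for subj, obj in combinations(hits, 2):
        # Look for a cue word in the text between the two concept mentions.
        between = lowered[subj[0]:obj[0]]
        relation = "co-occurs_with"  # fallback when no cue word is found
        for cue, label in RELATION_CUES.items():
            if re.search(r"\b" + re.escape(cue) + r"\b", between):
                relation = label
                break
        triples.append((subj[3], relation, obj[3]))
    return triples


if __name__ == "__main__":
    for s in ("SARS-CoV-2 binds ACE2.", "ACE2 was studied in COVID-19 patients."):
        print(extract_triples(s))
```

Because every triple carries persistent identifiers rather than free-text names, the output of a sketch like this could be loaded into a knowledge base and queried independently of how a concept was spelled in the source sentence.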
Research center :
- Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group)
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Welter, Danielle ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Vega Moreno, Carlos Gonzalo ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Biryukov, Maria ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Groues, Valentin ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Ghosh, Soumyabrata ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Schneider, Reinhard ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Satagopam, Venkata ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
External co-authors :
no
Language :
English
Title :
Supporting findability of COVID-19 research with large-scale text mining of scientific publications
Publication date :
27 November 2020
Event name :
International FAIR Convergence Symposium
Event organizer :
CODATA and GO FAIR
Event date :
27-11-2020 to 4-12-2020
Audience :
International
Focus Area :
Computational Sciences
Systems Biomedicine
Available on ORBilu :
since 01 January 2021

Statistics


Number of views
234 (33 by Unilu)
Number of downloads
51 (11 by Unilu)
