Poster (Scientific congresses, symposiums and conference proceedings)
Supporting findability of COVID-19 research with large-scale text mining of scientific publications
WELTER, Danielle; VEGA MORENO, Carlos Gonzalo; BIRYUKOV, Maria et al.
2020, International FAIR Convergence Symposium
 

Documents


Full text
FAIRBioKB_poster.pdf
Author postprint (2.03 MB)
Poster PDF
Details



Keywords:
FAIR principles; COVID-19; text mining
Abstract:
When the COVID-19 pandemic hit in early 2020, many research efforts were quickly redirected towards studies of SARS-CoV-2 and COVID-19, from the sequencing and assembly of viral genomes to the elaboration of robust testing methodologies and the development of treatment and vaccination strategies. At the same time, a flurry of scientific publications on SARS-CoV-2 and COVID-19 began to appear, making it increasingly difficult for researchers to stay up to date with the latest trends and developments in this rapidly evolving field. The BioKB platform is a pipeline which, by exploiting text mining and semantic technologies, helps researchers easily access the semantic content of thousands of abstracts and full-text articles. The content of the articles is analysed, and concepts from a range of contexts, including proteins, species, chemicals, diseases and biological processes, are tagged based on existing dictionaries of controlled terms. Co-occurring concepts are classified based on their asserted relationship, and the resulting subject-relation-object triples are stored in a publicly accessible human- and machine-readable knowledge base. All concepts in the BioKB dictionaries are linked to stable, persistent identifiers: either a resource accession such as an Ensembl, UniProt or PubChem ID for genes, proteins and chemicals, or an ontology term ID for diseases, phenotypes and other ontology terms. To improve COVID-19-related text mining, we extended the underlying dictionaries to include many additional viral species (via NCBI Taxonomy identifiers), phenotypes from the Human Phenotype Ontology (HPO), COVID-related concepts including clinical and laboratory tests from the COVID-19 ontology, as well as additional diseases from the Disease Ontology (DO) and biological processes from the Gene Ontology (GO). We also added all viral proteins found in UniProt and gene entries from EntrezGene to increase the sensitivity of the text mining pipeline to viral data.
To date, BioKB has indexed over 270’000 sentences from 21’935 publications relating to coronavirus infections, with publications dating from 1963 to 2021, 3’863 of which were published this year. We are currently working to further refine the text mining pipeline by training it on the extraction of increasingly complex relations such as protein-phenotype relationships. We are also regularly adding new terms to our dictionaries for areas where coverage is currently low, such as clinical and laboratory tests and procedures and novel drug treatments.
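As an illustration only, the dictionary-based tagging and triple generation described in the abstract can be sketched in a few lines of Python. This is not the BioKB implementation: the dictionary entries, the placeholder identifiers, the function names `tag_concepts` and `cooccurrence_triples`, and the generic `co-occurs_with` relation are all assumptions made for the example. In particular, the relation classification step mentioned in the abstract is not modelled here.

```python
import re
from itertools import combinations

# Hypothetical mini-dictionary: surface form -> (identifier, concept type).
# The identifiers are placeholders for illustration, not real accessions.
DICTIONARY = {
    "sars-cov-2": ("TAXON:EXAMPLE1", "species"),
    "ace2": ("PROTEIN:EXAMPLE2", "protein"),
    "fever": ("PHENOTYPE:EXAMPLE3", "phenotype"),
}


def tag_concepts(sentence):
    """Return sorted (start offset, identifier, type) for each dictionary term found."""
    hits = []
    lowered = sentence.lower()
    for term, (identifier, kind) in DICTIONARY.items():
        # Match the term only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), identifier, kind))
    return sorted(hits)


def cooccurrence_triples(sentence, relation="co-occurs_with"):
    """Pair every two tagged concepts in a sentence into a subject-relation-object triple.

    A real pipeline would classify the relation asserted between the concepts;
    this sketch uses one generic placeholder relation instead.
    """
    hits = tag_concepts(sentence)
    return [(a[1], relation, b[1]) for a, b in combinations(hits, 2)]


if __name__ == "__main__":
    for triple in cooccurrence_triples("SARS-CoV-2 binds ACE2 and causes fever."):
        print(triple)
```

Storing identifiers rather than surface forms in each triple mirrors the abstract's point that every concept is linked to a stable, persistent identifier, so the same entity is recognised regardless of how a given sentence spells it.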
Research centre:
- Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group)
Disciplines:
Life sciences: Multidisciplinary, general & others
Author, co-author:
WELTER, Danielle; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
VEGA MORENO, Carlos Gonzalo; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
BIRYUKOV, Maria; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
GROUES, Valentin; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
GHOSH, Soumyabrata; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
SCHNEIDER, Reinhard; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
SATAGOPAM, Venkata; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
External co-authors:
No
Document language:
English
Title:
Supporting findability of COVID-19 research with large-scale text mining of scientific publications
Publication date:
27 November 2020
Event name:
International FAIR Convergence Symposium
Event organizer:
CODATA and GO FAIR
Event date:
27-11-2020 to 4-12-2020
Event scope:
International
Focus Area:
Computational Sciences
Systems Biomedicine
Available on ORBilu:
since 01 January 2021
