Scientific congresses, symposiums and conference proceedings : Poster
Life sciences : Multidisciplinary, general & others
Computational Sciences; Systems Biomedicine
http://hdl.handle.net/10993/45287
Supporting findability of COVID-19 research with large-scale text mining of scientific publications
English
Welter, Danielle [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
Vega Moreno, Carlos Gonzalo [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
Biryukov, Maria [University of Luxembourg > Luxembourg Centre for Contemporary and Digital History (C2DH) > Digital Infrastructure]
Groues, Valentin [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
Ghosh, Soumyabrata [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
Schneider, Reinhard [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
Satagopam, Venkata [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
27-Nov-2020
No
No
International
International FAIR Convergence Symposium
27-11-2020 to 4-12-2020
CODATA and GO FAIR
[en] FAIR principles; COVID-19; text mining
[en] When the COVID-19 pandemic hit in early 2020, many research efforts were quickly redirected towards SARS-CoV-2 and COVID-19, from the sequencing and assembly of viral genomes to the design of robust testing methodologies and the development of treatment and vaccination strategies. At the same time, a flurry of scientific publications on SARS-CoV-2 and COVID-19 began to appear, making it increasingly difficult for researchers to keep up with the latest trends and developments in this rapidly evolving field. The BioKB platform is a pipeline that exploits text mining and semantic technologies to help researchers easily access the semantic content of thousands of abstracts and full-text articles. The content of each article is analysed, and concepts from a range of contexts, including proteins, species, chemicals, diseases and biological processes, are tagged using existing dictionaries of controlled terms. Co-occurring concepts are classified according to the relationship asserted between them, and the resulting subject-relation-object triples are stored in a publicly accessible, human- and machine-readable knowledge base. All concepts in the BioKB dictionaries are linked to stable, persistent identifiers: either a resource accession, such as an Ensembl, UniProt or PubChem ID for genes, proteins and chemicals, or an ontology term ID for diseases, phenotypes and other ontology terms.
To improve COVID-19-related text mining, we extended the underlying dictionaries to include many additional viral species (via NCBI Taxonomy identifiers), phenotypes from the Human Phenotype Ontology (HPO), COVID-related concepts from the COVID-19 ontology (including clinical and laboratory tests), and additional diseases from the Disease Ontology (DO) and biological processes from the Gene Ontology (GO). We also added all viral proteins found in UniProt and gene entries from EntrezGene to increase the sensitivity of the text mining pipeline to viral data.
To date, BioKB has indexed over 270’000 sentences from 21’935 publications relating to coronavirus infections, with publication dates ranging from 1963 to 2021; 3’863 of these publications appeared in 2020. We are currently working to further refine the text mining pipeline by training it to extract increasingly complex relations, such as protein-phenotype relationships. We also regularly add new terms to our dictionaries in areas where coverage is currently low, such as clinical and laboratory tests and procedures, and novel drug treatments.
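The dictionary-based tagging and subject-relation-object extraction described in the abstract can be illustrated with a toy sketch. It is not BioKB's implementation. It assumes a tiny hypothetical in-memory dictionary, exact case-insensitive matching, pairing of each concept only with the next concept in the same sentence, and a hypothetical keyword-cue rule for labelling relations. The abstract indicates that BioKB's relation extraction is trained rather than rule-based. The identifiers (NCBI Taxonomy, Disease Ontology, HPO) are public IDs used here only for illustration.

```python
import re

# Hypothetical mini-dictionary: surface form -> (concept type, persistent identifier).
# Real dictionaries are built from resources such as NCBI Taxonomy, DO and HPO.
DICTIONARY = {
    "sars-cov-2": ("species", "NCBITaxon:2697049"),
    "covid-19": ("disease", "DOID:0080600"),
    "fever": ("phenotype", "HP:0001945"),
}

# Hypothetical relation cues; a trained classifier would replace this lookup.
RELATION_CUES = {
    "causes": "causes",
    "associated with": "associated_with",
}


def tag_concepts(sentence):
    """Return sorted (start, end, identifier, type) tuples for dictionary hits."""
    lowered = sentence.lower()
    hits = []
    for term, (ctype, identifier) in DICTIONARY.items():
        # Require non-word characters (or string edges) around the term.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), match.end(), identifier, ctype))
    return sorted(hits)


def extract_triples(sentence):
    """Pair each tagged concept with the next one and label the pair by a cue between them."""
    lowered = sentence.lower()
    hits = tag_concepts(sentence)
    triples = []
    for (_, end1, id1, _), (start2, _, id2, _) in zip(hits, hits[1:]):
        between = lowered[end1:start2]
        for cue, relation in RELATION_CUES.items():
            if cue in between:
                triples.append((id1, relation, id2))
                break
    return triples


if __name__ == "__main__":
    text = "SARS-CoV-2 causes COVID-19, which is often associated with fever."
    for triple in extract_triples(text):
        print(triple)
```

Run on the sample sentence, this prints one triple per line: `('NCBITaxon:2697049', 'causes', 'DOID:0080600')` and `('DOID:0080600', 'associated_with', 'HP:0001945')`. Pairing only neighbouring mentions avoids spurious links, for example SARS-CoV-2 "causes" fever, that a naive all-pairs rule would produce from the same cue.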
Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group)
10.5281/zenodo.4300199
https://zenodo.org/record/4300199

File(s) associated with this reference

Fulltext file(s):

File | Commentary | Version | Size | Access
FAIRBioKB_poster.pdf | Poster PDF | Author postprint | 1.98 MB | Open access

All documents in ORBilu are protected by a user license.