BERT; COVID-19; Knowledge extraction; Knowledge graphs; NER; NLP; Text-based emotion detection (TBED); Transfer learning; Management Information Systems; Information Systems; Computer Science Applications; Artificial Intelligence
Abstract :
[en] Since the beginning of the COVID-19 pandemic almost two years ago, more than 700,000 scientific papers have been published on the subject. An individual researcher cannot possibly become acquainted with such a huge text corpus, so help from artificial intelligence (AI) is badly needed. We propose an AI-based tool that helps researchers navigate collections of medical papers in a meaningful way and extract knowledge from scientific COVID-19 papers. The main idea of our approach is to obtain as much semi-structured information from the text corpus as possible, using named entity recognition (NER) with a model called PubMedBERT and the Text Analytics for Health service, and then to store the data in a NoSQL database for fast further processing and insight generation. Additionally, the contexts in which the entities were used (neutral or negative) are determined. Applying NLP and text-based emotion detection (TBED) methods to the COVID-19 text corpus allows us to gain insights into important issues of diagnosis and treatment, such as changes in medical treatment over time, joint treatment strategies using several medications, and the connection between signs and symptoms of coronavirus.
Disciplines :
Mathematics
Author, co-author :
Soshnikov, Dmitry ; MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow, Russian Federation ; Microsoft, Developer Relations, Moscow, Russian Federation ; Faculty of Computer Science, Higher School of Economics, Moscow, Russian Federation ; Moscow Aviation Institute, Moscow, Russian Federation
PETROVA, Tatiana ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SEDAN ; MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow, Russian Federation ; Faculty of Physics, Lomonosov Moscow State University, Moscow, Russian Federation
Soshnikova, Vickie; Phystech-Lyceum of Natural Sciences and Mathematics Named after P.L. Kapitza, Dolgoprudniy, Russian Federation
Grunin, Andrey; MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow, Russian Federation ; Faculty of Physics, Lomonosov Moscow State University, Moscow, Russian Federation
External co-authors :
yes
Language :
English
Title :
Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals
This research was supported by the Ministry of Science and Higher Education of the Russian Federation (Grant № 075-15-2020-801). The authors thank MSU Institute for Artificial Intelligence for administrative and technical support.
Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep Learning–based Text Classification: A Comprehensive Review. ACM Comput. Surv. 2021, 54, 40. [CrossRef]
Ligthart, A.; Catal, C.; Tekinerdogan, B. Systematic reviews in sentiment analysis: A tertiary study. Artif. Intell. Rev. 2021, 54, 4997–5053. [CrossRef]
Perera, N.; Dehmer, M.; Emmert-Streib, F. Named Entity Recognition and Relation Detection for Biomedical Information Extraction. Front. Cell Dev. Biol. 2020, 8, 673. [CrossRef]
Widyassari, A.P.; Rustad, S.; Shidik, G.F.; Noersasongko, E.; Syukur, A.; Affandy, A. Review of automatic text summarization techniques & methods. J. King Saud Univ.—Comput. Inf. Sci. 2020, 1319–1578. [CrossRef]
Mutabazi, E.; Ni, J.; Tang, G.; Cao, W. A Review on Medical Textual Question Answering Systems Based on Deep Learning Approaches. Appl. Sci. 2021, 11, 5456. [CrossRef]
Wang, L.; Lo, K.; Chandrasekhar, Y.; Reas, R.; Yang, J.; Eide, D.; Funk, K.; Kinney, R.; Liu, Z. CORD-19: The Covid-19 Open Research Dataset. arXiv 2020, arXiv:2004.10706v2.
Extance, A. How AI technology can tame the scientific literature. Nature 2018, 561, 273–274. [CrossRef] [PubMed]
Bullock, J.; Luccioni, A.; Pham, K.H.; Lam, C.S.N.; Luengo-Oroz, M. Mapping the landscape of artificial intelligence applications against COVID-19. J. Artif. Intell. Res. 2020, 69, 807–845. [CrossRef]
Roberts, K.; Alam, T.; Bedrick, S.; Demner-Fushman, D.; Lo, K.; Soboroff, I.; Voorhees, E.; Wang, L.L.; Hersh, W.R. TREC-Covid: Rationale and structure of an information retrieval shared task for covid-19. J. Am. Med. Inform. Assoc. 2020, 27, 1431–1436. [CrossRef] [PubMed]
Tang, R.; Nogueira, R.; Zhang, E.; Gupta, N.; Cam, P.; Cho, K.; Lin, J. Rapidly bootstrapping a question answering dataset for COVID-19. arXiv 2020, arXiv:2004.11339.
Wang, L.L.; Lo, K. Text mining approaches for dealing with the rapidly expanding literature on COVID-19. Brief. Bioinform. 2021, 22, 781–799. [CrossRef] [PubMed]
Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Association for Computational Linguistics: Minneapolis, Minnesota, 2019; pp. 4171–4186.
Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A pretrained language model for scientific text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Association for Computational Linguistics: Hong Kong, China, 2019; pp. 3615–3620.
Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2019, 36, 1234–1240. [CrossRef] [PubMed]
Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.; Usuyama, N.; Liu, X.; Naumann, T.; Gao, J.; Poon, H. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. arXiv 2020, arXiv:2007.15779.
National Library of Medicine. Available online: https://pubmed.ncbi.nlm.nih.gov (accessed on 1 December 2021).
COVID-19 Knowledge Graph. Available online: https://covidgraph.org/ (accessed on 1 December 2021).
Ilievski, F.; Garijo, D.; Chalupsky, H.; Divvala, N.T.; Yao, Y.; Rogers, C.; Li, R.; Liu, J.; Singh, A.; Schwabe, D.; et al. KGTK: A toolkit for large knowledge graph manipulation and analysis. In Proceedings of the 19th International Semantic Web Conference, Athens, Greece, 2–6 November 2020.
A Free and Open Knowledge Base. Available online: https://www.wikidata.org/ (accessed on 1 December 2021).
Cohan, A.; Feldman, S.; Beltagy, I.; Downey, D.; Weld, D.S. Specter: Document-level representation learning using citation-informed transformers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020.
Newman-Griffis, D.; Lai, A.M.; Fosler-Lussier, E. Jointly embedding entities and text with distant supervision. In Proceedings of the Third Workshop on Representation Learning for NLP, Association for Computational Linguistics, Melbourne, Australia, 20 July 2018; pp. 195–206.
Espinosa-Anke, L.; Schockaert, S. SeVeN: Augmenting word embeddings with unsupervised relation vectors. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20 August 2018; pp. 2653–2665.
Oniani, D.; Jiang, G.; Liu, H.; Shen, F. Constructing co-occurrence network embeddings to assist association extraction for COVID-19 and other coronavirus infectious diseases. J. Am. Med. Inform. Assoc. 2020, 27, 1259–1267. [CrossRef] [PubMed]
Unified Medical Language System (UMLS). Available online: https://www.nlm.nih.gov/research/umls/index.html (accessed on 1 December 2021).
Introducing Text Analytics for Health. Available online: https://techcommunity.microsoft.com/t5/azure-ai/introducing-text-analytics-for-health/ba-p/1505152 (accessed on 1 December 2021).
Azure Cosmos DB. Available online: https://azure.microsoft.com/en-us/services/cosmos-db/ (accessed on 1 December 2021).
Python. Available online: https://www.python.org/ (accessed on 1 December 2021).
The Open Graph Viz Platform. Available online: https://gephi.org (accessed on 1 December 2021).
Azure Text Analytics Client Library for Python. Available online: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/textanalytics/azure-ai-textanalytics/README.md (accessed on 1 December 2021).
Azure Machine Learning. Available online: https://azure.microsoft.com/en-us/services/machine-learning/ (accessed on 1 December 2021).
COVID-19 Treatment Guidelines. Chloroquine or Hydroxychloroquine and/or Azithromycin. Available online: https://www.covid19treatmentguidelines.nih.gov/therapies/antiviral-therapy/chloroquine-or-hydroxychloroquine-and-or-azithromycin/ (accessed on 1 December 2021).
COVID-19 Treatment Guidelines. Lopinavir/Ritonavir and Other HIV Protease Inhibitors. Available online: https://www.covid19treatmentguidelines.nih.gov/therapies/antiviral-therapy/lopinavir-ritonavir-and-other-hiv-protease-inhibitors/ (accessed on 1 December 2021).
Hamidi Alamdari, D.; Bagheri Moghaddam, A.; Amini, S.; Alamdari, A.H.; Damsaz, M.; Yarahmadi, A. The Application of a Reduced Dye Used in Orthopedics as a Novel Treatment against Coronavirus (COVID-19): A Suggested Therapeutic Protocol. Arch. Bone Jt. Surg. 2020, 8 (Suppl. 1), 291–294. [CrossRef] [PubMed]
Pundir, H.; Joshi, T.; Joshi, T.; Sharma, P.; Mathpal, S.; Chandra, S.; Tamta, S. Using Chou’s 5-steps rule to study pharmacophore-based virtual screening of SARS-CoV-2 Mpro inhibitors. Mol. Divers. 2021, 25, 1731–1744. [CrossRef] [PubMed]
COVID-19 Treatment Guidelines. Remdesivir. Available online: https://www.covid19treatmentguidelines.nih.gov/therapies/antiviral-therapy/remdesivir/ (accessed on 1 December 2021).