In this paper we investigate how scientific and medical papers about Covid-19 can be mined effectively. For this purpose we use the CORD19 dataset, a large collection of papers published on the SARS-CoV-2 virus and the pandemic it caused. We discuss how classical text mining algorithms such as Latent Semantic Analysis (LSA) and its probabilistic successor Latent Dirichlet Allocation (LDA) can be used for this purpose. We also touch on more recent embedding approaches such as word2vec, which arrived with the deep learning wave, and show the advantages and disadvantages of each. We conclude with topic examples from the corpus and answer questions such as which topics are the most prominent in the corpus and what percentage of the corpus is dedicated to each of them. Finally, we discuss how topics around RNA research in connection with Covid-19 can be examined.
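As a minimal illustration of the LSA idea and of measuring how much of a corpus a topic covers, the following sketch applies a truncated singular value decomposition to a term-document count matrix. The four toy documents, the choice of k = 2 latent dimensions, and the rule of assigning each document to its dominant topic are assumptions made for illustration only; this is not the pipeline applied to CORD19.

```python
import numpy as np
from collections import Counter

# Hypothetical toy documents (NOT taken from CORD19).
docs = [
    "rna virus genome sequencing",
    "rna genome replication virus",
    "vaccine trial patients hospital",
    "patients hospital icu mortality",
]

# Build a term-document count matrix: rows are terms, columns are documents.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[index[w], j] += 1

# LSA: keep only the k strongest singular dimensions as latent "topics".
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_topic = (np.diag(s[:k]) @ Vt[:k]).T  # shape: (n_docs, k)

# Assumed rule: each document belongs to the topic with the largest
# absolute loading (SVD signs are arbitrary, hence np.abs).
dominant = np.argmax(np.abs(doc_topic), axis=1)

# Share of the corpus assigned to each topic.
counts = Counter(dominant.tolist())
share = {int(t): c / len(docs) for t, c in sorted(counts.items())}
print(share)
```

With these toy documents the two RNA-related documents and the two clinical documents share no vocabulary, so they load on separate latent dimensions and each topic accounts for half of the corpus. On a real corpus, LDA's per-document topic proportions could be aggregated in a similar way instead of using a hard dominant-topic assignment.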
Main work title:
Applied Informatics, 5th International Conference, ICAI 2022, Arequipa, Peru, October 27–29, 2022, Proceedings