Article (Scientific journals)
BioMDSum: An Effective Hybrid Biomedical Multi-Document Summarization Method Based on PageRank and Longformer Encoder-Decoder
Aftiss, Azzedine; LAMSIYAH, Salima; Ouatik El Alaoui, Said et al.
2024, in IEEE Access, 12, p. 188013 - 188031
Peer Reviewed verified by ORBi
 

Details



Keywords :
Biomedical multi-document summarization; hybrid summarization; K-means clustering; longformer encoder-decoder; PageRank algorithm; sentence-BERT; Biomedical documents; Encoder-decoder; Extractive summarizations; K-means++ clustering; Multi documents summarization; Computer Science (all); Materials Science (all); Engineering (all)
Abstract :
[en] Biomedical multi-document summarization (BioMDSum) involves automatically generating concise and informative summaries from collections of related biomedical documents. While extractive summarization methods have shown promise, they often produce incoherent summaries. On the other hand, fully abstractive methods yield coherent summaries but demand extensive training datasets and computational resources due to the typically lengthy nature of biomedical documents. To address these challenges, we propose a hybrid summarization method that combines the strengths of both approaches. The proposed method consists of two main phases: (i) an extractive summarization phase that uses k-means clustering to group similar sentences based on the cosine similarity between their embeddings, generated by the sentence-BERT model, followed by the PageRank algorithm for sentence scoring and selection; and (ii) an abstractive summarization phase that fine-tunes a Longformer Encoder-Decoder (LED) transformer model to generate a concise and coherent summary from the sentences selected during the extractive phase. We conducted several experiments on the standard biomedical multi-document summarization datasets Cochrane and MS2. The results demonstrate that the proposed method is competitive and outperforms recent state-of-the-art systems based on ROUGE evaluation measures. Specifically, our model achieved ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, and METEOR scores of 29.41%, 6.57%, 18.31%, 85.95%, and 22.15% on the Cochrane dataset, and 28.79%, 8.22%, 17.93%, 85.51%, and 25.17% on the MS2 dataset, respectively. Furthermore, an ablation analysis shows that integrating extractive and abstractive phases in our hybrid summarization method enhances the overall performance of the proposed approach.
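Illustrative sketch (not taken from the paper): the sentence scoring step of the extractive phase can be pictured as PageRank run over a graph whose edge weights are cosine similarities between sentence embeddings. The Python below is a minimal version under stated assumptions. The function names (`cosine`, `pagerank_scores`, `top_sentences`), the damping factor 0.85, the iteration count, and the toy 2-dimensional vectors are all hypothetical; in the described method the vectors would be sentence-BERT embeddings and the scoring would follow k-means clustering, which is omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pagerank_scores(embeddings, damping=0.85, iterations=50):
    """Score sentences by power-iteration PageRank on a cosine-similarity graph.

    Assumption: negative similarities are clipped to 0 and self-loops are removed.
    """
    n = len(embeddings)
    if n == 0:
        return []
    weights = [
        [max(cosine(embeddings[i], embeddings[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            rank = 0.0
            for i in range(n):
                if out_sums[i] > 0:
                    # Sentence i passes its score to j in proportion to similarity.
                    rank += scores[i] * weights[i][j] / out_sums[i]
                else:
                    # A sentence with no similar neighbours spreads its score evenly.
                    rank += scores[i] / n
            new_scores.append((1.0 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def top_sentences(sentences, embeddings, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = pagerank_scores(embeddings)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    # Toy vectors standing in for sentence embeddings (hypothetical data).
    toy_sentences = ["s1", "s2", "s3"]
    toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(top_sentences(toy_sentences, toy_embeddings, k=2))
```

In the two-phase design described above, the sentences selected this way would then be passed to the fine-tuned LED model for abstractive rewriting; that step is not sketched here.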
Disciplines :
Computer science
Author, co-author :
Aftiss, Azzedine; Ibn Tofail University, Engineering Sciences Laboratory, National School of Applied Sciences, Kenitra, Morocco
LAMSIYAH, Salima; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Ouatik El Alaoui, Said; Ibn Tofail University, Engineering Sciences Laboratory, National School of Applied Sciences, Kenitra, Morocco
SCHOMMER, Christoph; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors :
yes
Language :
English
Title :
BioMDSum: An Effective Hybrid Biomedical Multi-Document Summarization Method Based on PageRank and Longformer Encoder-Decoder
Publication date :
2024
Journal title :
IEEE Access
ISSN :
2169-3536
Publisher :
Institute of Electrical and Electronics Engineers Inc.
Volume :
12
Pages :
188013 - 188031
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBilu :
since 09 January 2025
