Paper published in a book (Colloquia, congresses, scientific conferences and proceedings)
A Comparative Study of Sentence Embeddings for Unsupervised Extractive Multi-document Summarization
LAMSIYAH, Salima; SCHOMMER, Christoph
2023, in Artificial Intelligence and Machine Learning
Peer reviewed
 

Documents


Full text
paper_Bnaic22_3_April_2023.pdf
Publisher postprint (352.4 kB)

All documents in ORBilu are protected by a user license.

Details



Keywords:
Unsupervised Multi-Document Summarization; Sentence Embeddings; Transfer Learning; Contrastive Learning; Coreference Resolution
Abstract:
[en] Obtaining large-scale and high-quality training data for multi-document summarization (MDS) tasks is time-consuming and resource-intensive; hence, supervised models can only be applied to limited domains and languages. In this paper, we introduce unsupervised extractive methods for both generic and query-focused MDS tasks, intending to produce a relevant summary from a collection of documents without using labeled training data or domain knowledge. More specifically, we leverage the potential of transfer learning from recent sentence embedding models to encode the input documents into rich semantic representations. Moreover, we use a coreference resolution system to resolve the broken pronominal coreference expressions in the generated summaries, aiming to improve their cohesion and textual quality. Furthermore, we provide a comparative analysis of several existing sentence embedding models in the context of unsupervised extractive multi-document summarization. Experiments on the standard DUC'2004-2007 datasets demonstrate that the proposed methods are competitive with previous unsupervised methods and are even comparable to recent supervised deep learning-based methods. The empirical results also show that the SimCSE embedding model, based on contrastive learning, achieves substantial improvements over strong sentence embedding models. Finally, the newly incorporated coreference resolution method is shown to bring a noticeable improvement to the unsupervised extractive MDS task.
Disciplines:
Computer science
Author, co-author:
LAMSIYAH, Salima; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
SCHOMMER, Christoph; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors:
no
Document language:
English
Title:
A Comparative Study of Sentence Embeddings for Unsupervised Extractive Multi-document Summarization
Publication/release date:
2023
Event name:
Artificial Intelligence and Machine Learning: 34th Joint Benelux Conference, BNAIC/Benelearn 2022
Event date:
November 7 – November 9, 2022
Event scope:
International
Title of the main work:
Artificial Intelligence and Machine Learning
Publisher:
Springer Nature Switzerland, Cham, Unknown/unspecified
ISBN/EAN :
978-3-031-39144-6
Pages:
78-95
Peer reviewed:
Peer reviewed
Available on ORBilu:
since 16 September 2023
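
The abstract describes encoding sentences into embeddings and extracting a summary without labeled data. The record does not give the authors' actual scoring or selection procedure, so the following is only a minimal sketch of one common unsupervised approach (centroid relevance plus a redundancy filter), not the paper's method. The functions `embed`, `cosine`, and `summarize`, the parameter `redundancy_threshold`, and the bag-of-words "embedding" are all hypothetical stand-ins; a real system would swap `embed` for a pretrained sentence embedding model such as SimCSE.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def summarize(documents, k=2, redundancy_threshold=0.7):
    """Pick up to k sentences closest to the collection centroid,
    skipping any sentence too similar to one already selected."""
    sentences = [
        s.strip()
        for doc in documents
        for s in re.split(r"(?<=[.!?])\s+", doc)
        if s.strip()
    ]
    vectors = [embed(s) for s in sentences]

    # Centroid of the whole document collection (sum of sentence vectors).
    centroid = Counter()
    for v in vectors:
        centroid.update(v)

    # Rank sentences by relevance to the centroid, most relevant first.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )

    chosen = []
    for i in ranked:
        if len(chosen) >= k:
            break
        # Redundancy filter: keep only sentences unlike those already chosen.
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(["The cat sat on the mat. Dogs bark loudly.", "The cat sat on the mat. Birds sing."], k=2)` selects the repeated sentence only once, because its duplicate is rejected by the redundancy filter. The coreference resolution step mentioned in the abstract is not covered by this sketch.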

Statistics

Number of views
122 (including 8 from Unilu)
Number of downloads
0 (including 0 from Unilu)

Scopus® citations
2
Scopus® citations without self-citations
0
OpenAlex citations
2
