Paper published in a journal (Scientific congresses, symposiums and conference proceedings)
Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language
PLUM, Alistair; Ranasinghe, Tharindu; PURSCHKE, Christoph
2024. In Proceedings of LREC-COLING 2024, p. 7982–7992
Peer reviewed

Files


Full Text
2024.lrec-main.703.pdf
Author postprint (331.23 kB)



Details



Keywords :
CuCo Lab
Abstract :
[en] Relation extraction is essential for extracting and understanding biographical information in the digital humanities and related fields. There is growing interest in the community in building datasets that can train machine learning models to extract relations. However, annotating such datasets is expensive and time-consuming, and such datasets are typically limited to English. This paper applies guided distant supervision to create a large biographical relation extraction dataset for German. Our dataset, comprising more than 80,000 instances across nine relation types, is the largest German biographical relation extraction dataset. We also create a manually annotated dataset of 2,000 instances to evaluate the models, and we release it together with the dataset compiled using guided distant supervision. We train several state-of-the-art machine learning models on the automatically created dataset and release them as well. Furthermore, we conduct multilingual and cross-lingual zero-shot experiments that could benefit many low-resource languages.
Disciplines :
Languages & linguistics
Author, co-author :
PLUM, Alistair  ;  University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Humanities (DHUM) > Luxembourg Studies
Ranasinghe, Tharindu
PURSCHKE, Christoph  ;  University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Humanities (DHUM) > Luxembourg Studies
External co-authors :
yes
Language :
English
Title :
Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language
Publication date :
May 2024
Event name :
LREC-COLING 2024
Event date :
2024
Audience :
International
Journal title :
Proceedings of LREC-COLING 2024
Pages :
7982–7992
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 10 June 2024

