data wrangling; multilingual diachronic analysis; linguistic linked open data
Abstract :
The article deals with data wrangling in a multilingual collection intended for diachronic analysis and with linguistic linked open data modelling for tracing concept change over time. Two types of static word embeddings are used: word2vec (French and Hebrew data sets) and fastText (Latin and Lithuanian data sets). We model examples from these embeddings via the OntoLex-FrAC formalism. To address the challenge of heterogeneity, we use a minimalist workflow design that allows for both convergence and flexibility in attaining the project goals.
Research center :
Luxembourg Centre for Contemporary and Digital History (C2DH) > Digital History & Historiography (DHI)
Disciplines :
Arts & humanities: Multidisciplinary, general & others
Author, co-author :
Armaselu, Florentina ; University of Luxembourg > Luxembourg Centre for Contemporary and Digital History (C2DH) > Digital History and Historiography
McGillivray, Barbara
Liebeskind, Chaya
Valūnaitė-Oleškevičienė, Giedrė
Utka, Andrius
Gifu, Daniela
Fahad Khan, Anas
Apostol, Elena-Simona
Truică, Ciprian-Octavian
External co-authors :
yes
Language :
English
Title :
Workflow Reversal and Data Wrangling in Multilingual Diachronic Analysis and Linguistic Linked Open Data Modelling
Publication date :
September 2023
Event name :
LDK 2023 Conference
Event organizer :
University of Vienna
Event date :
from 12 to 15 September 2023
Audience :
International
Main work title :
Proceedings of the 4th Conference on Language, Data and Knowledge
CA18209 - European network for Web-centred linguistic data science (NexusLinguarum)
Funders :
COST - European Cooperation in Science and Technology
Funding text :
This article is based upon work from COST Action Nexus Linguarum, European network for Web-centred linguistic data science, supported by COST (European Cooperation in Science and Technology). www.cost.eu.