Luxembourgish language; POS-Tagging; Topic Modeling; Sentiment Analysis; Text Preparation; XML-Database
Abstract:
[en] Despite some recent work, research on the processing of Luxembourgish is still largely in its infancy. While a rich variety of linguistic processing tools exists, especially for English, these tools offer little support for the Luxembourgish language. LuNa (a Tool for Luxembourgish National Corpus) is an Open Toolbox that allows researchers to annotate a text corpus written in Luxembourgish and to build and query an annotated corpus. The aim of the paper is to demonstrate the components of the system and its use for machine learning applications such as topic modelling and sentiment detection. LuNa is based on an XML database, which stores the data and defines the XML schema, and it offers a graphical user interface (GUI) for linguistic data preparation such as tokenization, part-of-speech tagging, and morphological analysis, to name a few.
Disciplines:
Languages & linguistics; Computer science
Author, co-author:
SIRAJZADE, Joshgun ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
SCHOMMER, Christoph ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
External co-authors:
no
Document language:
English
Title:
The LuNa Open Toolbox for the Luxembourgish Language
Publication/release date:
2019
Event name:
19th Industrial Conference on Data Mining, ICDM 2019
Event location:
New York, United States
Event date:
from 17-07-2019 to 21-07-2019
Event scope:
International
Title of the main work:
Advances in Data Mining, Applications and Theoretical Aspects, Poster Proceedings 2019