Doctoral thesis (Dissertations and theses)
Detection of Sentiment in Luxembourgish User Comments
GIERSCHEK, Daniela
2022
 

Documents


Full text
PhD_Dissertation_Daniela_Gierschek.pdf
Author postprint (2.52 MB)

All documents in ORBilu are protected by a user license.

Details



Keywords:
Computational Linguistics; Luxembourgish; Linguistics; Sentiment
Abstract:
Sentiment is all around us in everyday life. It can be found in blog posts, social media comments, text messages and many other places where people express themselves. Sentiment analysis is the task of automatically detecting those sentiments, attitudes or opinions in written text. In this research, the first sentiment analysis for the low-resource language Luxembourgish is conducted, using a large corpus of user comments published on the RTL Luxembourg website www.rtl.lu. Various resources were created for this purpose to lay the foundation for further sentiment research in Luxembourgish. A Luxembourgish sentiment lexicon and an annotation tool were built as external resources that can be used for collecting and enlarging training data for sentiment analysis tasks. Additionally, a corpus consisting mainly of sentences from user comments was annotated with negative, neutral and positive labels. This corpus was then automatically translated into English and German. Afterwards, diverse text representations such as word2vec, tf-idf and one-hot encoding were applied to the three versions of the labeled corpus to train different machine learning models. Furthermore, one part of the experimental setup leveraged linguistic features for classification in order to study their impact on sentiment expressions. By following such a broad strategy, this thesis not only lays the basis for sentiment analysis of Luxembourgish texts but also aims to give recommendations for conducting sentiment detection research for other low-resource languages. It is demonstrated that creating new resources for a low-resource language is an intensive task that should be carefully planned if it is to outperform working with translations into a high-resource target language such as English or German.
Disciplines:
Computer science
Arts & humanities: Multidisciplinary, general & others
Languages & linguistics
Author, co-author:
GIERSCHEK, Daniela; University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE)
Document language:
English
Title:
Detection of Sentiment in Luxembourgish User Comments
Defense date:
25 February 2022
Number of pages:
152
Institution:
Unilu - University of Luxembourg, Esch-sur-Alzette, Luxembourg
Degree:
Docteur en Sciences du Langage
Supervisor:
Jury president:
Jury member:
Plank, Barbara
Kralj Novak, Petra
SCHOMMER, Christoph
Available on ORBilu:
since 09 March 2022

Statistics

Number of views
351 (of which 24 Unilu)
Number of downloads
565 (of which 12 Unilu)
