Reference : Detection of Sentiment in Luxembourgish User Comments
Dissertations and theses : Doctoral thesis
Engineering, computing & technology : Computer science
Arts & humanities : Languages & linguistics
Arts & humanities : Multidisciplinary, general & others
Gierschek, Daniela [University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE)]
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Doctor in Language Sciences (Docteur en Sciences du Langage)
Gilles, Peter
Purschke, Christoph
Plank, Barbara
Kralj Novak, Petra
Schommer, Christoph
Keywords: Computational Linguistics; Luxembourgish; Linguistics; Sentiment
Sentiment is all around us in everyday life. It can be found in blog posts, social media comments, text messages and many other places where people express themselves. Sentiment analysis is the task of automatically detecting these sentiments, attitudes or opinions in written text. This research develops the first sentiment analysis solution for Luxembourgish, a low-resource language, using a large corpus of user comments published on the RTL Luxembourg website. Various resources were created for this purpose to lay the foundation for further sentiment research in Luxembourgish.
A Luxembourgish sentiment lexicon and an annotation tool were built as external resources that can be used to collect and enlarge training data for sentiment analysis tasks. In addition, a corpus consisting mainly of sentences from user comments was annotated with negative, neutral and positive labels. This corpus was then automatically translated into English and German.
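How a sentiment lexicon can help enlarge training data may be illustrated with a minimal sketch that pre-labels sentences by summing word polarities. The lexicon entries, the function names `tokenize` and `lexicon_label`, and the sum-and-threshold rule are hypothetical assumptions for illustration only; they are not taken from the thesis, whose actual lexicon and labeling procedure may differ.

```python
import re

# Hypothetical toy lexicon for illustration only; NOT the thesis's lexicon.
# Polarity scores: +1 = positive, -1 = negative.
TOY_LEXICON = {
    "gutt": 1,       # "good"
    "schéin": 1,     # "nice, beautiful"
    "super": 1,
    "schlecht": -1,  # "bad"
    "traureg": -1,   # "sad"
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of word characters (Unicode-aware, so é/ë survive)."""
    return re.findall(r"\w+", text.lower())


def lexicon_label(sentence: str, lexicon: dict[str, int] = TOY_LEXICON) -> str:
    """Assign a negative/neutral/positive label by summing the polarities of known words."""
    score = sum(lexicon.get(tok, 0) for tok in tokenize(sentence))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no known words, or positive and negative words cancel out


if __name__ == "__main__":
    for s in ["Dat ass gutt!", "Dat ass schlecht.", "Dat ass e Buch."]:
        print(s, "->", lexicon_label(s))
```

Labels produced this way would at most be candidates for confirmation by human annotators, since a plain polarity sum ignores negation, irony and context.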
Diverse text representations, such as word2vec, tf-idf and one-hot encoding, were then applied to the three versions of the labeled-sentence corpus (the Luxembourgish original and its English and German translations) to train different machine learning models. In addition, part of the experimental setup leveraged linguistic features in the classification process in order to study their impact on sentiment expressions.
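To make the one-hot and tf-idf representations concrete, the following sketch vectorizes a toy corpus from scratch in plain Python. It assumes simple regular-expression tokenization and one common tf-idf variant (length-normalized term frequency multiplied by log(N/df)); the thesis may have used a library implementation with different smoothing or normalization. The function names and example sentences are hypothetical, and word2vec is omitted because it requires learning embeddings from a large corpus.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_vocab(docs: list[str]) -> list[str]:
    """Sorted list of all distinct tokens across the corpus."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def one_hot(doc: str, vocab: list[str]) -> list[int]:
    """Binary bag-of-words: 1 if the vocabulary word occurs in the document, else 0."""
    present = set(tokenize(doc))
    return [1 if word in present else 0 for word in vocab]


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the vocabulary and a document-by-word tf-idf matrix (one assumed variant)."""
    vocab = build_vocab(docs)
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each word at least once.
    df = Counter(word for toks in tokenized for word in set(toks))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        # Term frequency normalized by document length, weighted by idf.
        matrix.append([(counts[w] / total) * idf[w] if total else 0.0 for w in vocab])
    return vocab, matrix


if __name__ == "__main__":
    corpus = ["Dat ass gutt", "Dat ass schlecht"]  # hypothetical toy sentences
    vocab, matrix = tfidf(corpus)
    print(vocab)
    print(one_hot(corpus[0], vocab))
    print(matrix)
```

The same functions could be applied separately to each of the three corpus versions, yielding feature matrices on which a standard classifier can then be trained. Under this variant, words that occur in every document receive an idf of zero and therefore contribute nothing to the tf-idf vectors.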
By following such a broad strategy, this thesis not only lays the groundwork for sentiment analysis of Luxembourgish texts but also offers recommendations for sentiment detection research in other low-resource languages. It demonstrates that creating new resources for a low-resource language is a labor-intensive task that must be carefully planned if it is to outperform working with translations into a high-resource target language such as English or German.

File(s) associated with this reference

Fulltext file(s):

Open access
PhD_Dissertation_Daniela_Gierschek.pdf (author postprint, 2.46 MB)


All documents in ORBilu are protected by a user license.