References of "Sirajzade, Joshgun 50003103"
Peer Reviewed
Deep Mining Covid-19 Literature
Sirajzade, Joshgun UL; Bouvry, Pascal UL; Schommer, Christoph UL

in Applied Informatics, 5th International Conference, ICAI 2022, Arequipa, Peru, October 27–29, 2022, Proceedings (2022)

In this paper we investigate how scientific and medical papers about Covid-19 can be effectively mined. For this purpose we use the CORD-19 dataset, a very large collection of papers published about and around the SARS-CoV-2 virus and the pandemic it caused. We discuss how classical text mining algorithms such as Latent Semantic Analysis (LSA) or its more modern counterpart Latent Dirichlet Allocation (LDA) can be used for this purpose. We also touch on more recent approaches such as word2vec, which arrived with the deep learning wave, and show the advantages and disadvantages of each. We conclude by showing some example topics from the corpus and answering questions such as which topics are the most prominent in the corpus and what percentage of the corpus is dedicated to them. We also discuss how topics around RNA research in connection with Covid-19 can be examined.

Peer Reviewed
Component Analysis of Adjectives in Luxembourgish for Detecting Sentiments
Sirajzade, Joshgun UL; Gierschek, Daniela UL; Schommer, Christoph UL

in Beermann, Dorothee; Besacier, Laurent; Sakti, Sakriani (Eds.) et al Proceedings of the LREC 2020 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) (2020, May)

The aim of this paper is to investigate the role of Luxembourgish adjectives in expressing sentiments in user comments posted on the website rtl.lu (RTL is the abbreviation for Radio Television Lëtzebuerg). Alongside many other textual features or representations, adjectives can be used to detect sentiment, even at the sentence or comment level. In fact, they are by themselves one of the best ways to describe a sentiment, although other word classes such as nouns, verbs, adverbs or conjunctions can also be used for this purpose. The empirical part of this study focuses on a list of adjectives that were extracted from an annotated corpus. The corpus contains the part-of-speech tags of individual words and sentiment annotation at the adjective, sentence, and comment level. Suffixes of Luxembourgish adjectives such as -esch, -eg, -lech, -al, -el, -iv, -ent, -los, -bar and the prefix on- were explicitly investigated, with particular attention to their role in building a model with classical machine learning techniques. We also considered the interaction of adjectives with other grammatical means, especially other parts of speech, e.g. negations, which can completely reverse the meaning, and thus the sentiment, of an utterance.

Peer Reviewed
An Annotation Framework for Luxembourgish Sentiment Analysis
Sirajzade, Joshgun UL; Gierschek, Daniela UL; Schommer, Christoph UL

in Besacier, Laurent; Sakti, Sakriani; Soria, Claudia (Eds.) et al Proceedings of the LREC 2020 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) (2020, May)

The aim of this paper is to present a framework developed for crowdsourcing sentiment annotation for the low-resource language Luxembourgish. Our tool is easily accessible through a web interface and facilitates sentence-level annotation by several annotators in parallel. At the heart of our framework is an XML database, which serves as the central part linking several components. The corpus in the database consists of news articles and user comments. One of the components is LuNa, a tool for linguistic preprocessing of the data set. It tokenizes the text, splits it into sentences and assigns POS tags to the tokens. After that, the preprocessed text is stored in XML format in the database. The Sentiment Annotation Tool, which is browser-based, then enables the annotation of the split sentences from the database. The Sentiment Engine, a separate module, is trained with this material in order to annotate the whole data set and analyze the sentiment of the comments over time and in relation to the news articles. The knowledge gained can in turn be used both to improve the sentiment classification and to understand the sentiment phenomenon from a linguistic point of view.

The LuNa Open Toolbox for the Luxembourgish Language
Sirajzade, Joshgun UL; Schommer, Christoph UL

in Perner, Petra (Ed.) Advances in Data Mining, Applications and Theoretical Aspects, Poster Proceedings 2019 (2019)

Despite some recent work, research on the processing of Luxembourgish is still largely in its infancy. While a rich variety of linguistic processing tools exist, especially for English, these software tools offer little scope for the Luxembourgish language. LuNa (a Tool for Luxembourgish National Corpus) is an Open Toolbox that allows researchers to annotate a text corpus written in the Luxembourgish language and to build and query an annotated corpus. The aim of the paper is to demonstrate the components of the system and its usage for Machine Learning applications such as Topic Modelling and Sentiment Detection. Overall, LuNa is based on an XML database that stores the data and defines the XML schema, and it offers a Graphical User Interface (GUI) for linguistic data preparation such as tokenization, part-of-speech tagging, and morphological analysis, to name a few.

A Dynamic Associative Memory for Distant Reading
Kamlovskaya, Ekaterina UL; Schommer, Christoph UL; Sirajzade, Joshgun UL

in International Conference on Artificial Intelligence Humanities, Book of Abstracts (2018, August 16)

Mind and Language. AI in an Example of Similar Patterns of Luxembourgish Language
Sirajzade, Joshgun UL; Schommer, Christoph UL

in International Conference on Artificial Intelligence Humanities, Book of Abstracts (2018, August 16)

Peer Reviewed
Korpusbasierte Untersuchung der Wortbildungsaffixe im Luxemburgischen. Technische Herausforderungen und linguistische Analyse am Beispiel der Produktivität
Sirajzade, Joshgun UL

in Zeitschrift für Wortbildung (2018), 2(1)

This article reports on compiling a corpus of Luxembourgish for the investigation of word formation. First it gives an example of the benefits of using an annotated corpus to investigate the productivity of selected word formation affixes of Luxembourgish. Then it describes how this can be achieved from a technical point of view.

Compiling Tools and Resources for Studying of Luxemburgish Language and beyond
Sirajzade, Joshgun UL

Scientific Conference (2016, June)

Text[ge]schichten. Herausforderungen textgenetischen Edierens bei Arthur Schnitzler
Burch, Thomas; Büdenbender, Stefan; Fink, Kristina et al

in Krüger, Katharina; Mengaldo, Elisabetta; Schumacher, Eckhard (Eds.) Textgenese und digitales Edieren, Wolfgang Koeppens "Jugend" im Kontext der Editionsphilologie (2016)

Das luxemburgischsprachige Oeuvre von Michel Rodange (1827-1876). Editionsphilologische und korpuslinguistische Analyse
Sirajzade, Joshgun UL

Book published by Universität Trier (2015)

The current work focuses on theoretical and practical aspects of the analysis and publication of important literary texts using the methods of digital humanities, as well as corpus and computational linguistics. The oeuvre of Michel Rodange in the Luxembourgish language provides the basic material for this study. This includes the works ‘Renert oder de Fuuss am Frack an a Maansgréisst’ (ca. 40,000 tokens), ‘Dem Léiweckerche säi Lidd’ (ca. 5,000 tokens), ‘Dem Grof Sigfrid seng Goldkuemer’ (ca. 15,000 tokens) and two poems (ca. 500 tokens). On the empirical level the work involves compiling a corpus with text-critical and linguistic annotations and presenting it as a web portal. An interesting interdependency arises at this point between theory and practice: the created annotations can be used to investigate the oeuvre from a philological point of view, and the knowledge gained can in turn be used to develop tools that create similar or new annotations and improve their correctness. The text-critical annotations consist of reading variants, corrections and a word glossary, whereas the linguistic annotations are of an orthographical, morphological (also including word classes and lemmata) and phraseological nature. The annotations are encoded in XML. The first step in working with one’s own corpus is its digitization. Manuscripts were transcribed manually, whereas the prints were digitized with the help of OCR software. At this stage the texts were already well organized and enriched with initial annotations and metadata. Information that seems less important at first sight but is still useful, such as page numbers, line breaks and chapters, was marked. However, in compiling a corpus from an important literary work, the conservation of text-critical annotations is of special significance.
In order to create such annotations, the theoretical principles of textual criticism first needed to be discussed. Then the most important phenomena, such as text fragments, corrections and deletions, were documented. For poorly readable areas new interpretations were proposed, and those from other editions were compared and discussed. The Text Encoding Initiative (TEI) offers a number of XML elements to organize such annotations. It was possible to limit much manual work by using tools such as TUSTEP or oXygen as well as scripting languages, e.g. Perl. These provide the powerful technique of “regular expressions”, which can automate search-and-replace work at an abstract level. The next and very useful step is tokenization. At this level text-critical annotations almost overlap with linguistic annotations. A fuzzy border was defined for tokens, and they were marked with their own XML element. It is essential for further investigation to consider and study the language of the author. For this purpose special attention was paid to poetic styles in the Luxembourgish language of the 19th century, literary genres, and the orthography and spelling of the author. On the empirical level, the analysis of the corpus from a linguistic point of view with the help of corpus linguistics methods, e.g. extraction of multi-word units, forms the scientific basis for the later digital presentation of the oeuvre. For this purpose a number of tools for analyzing the oeuvre quantitatively as well as qualitatively were designed and developed. The output of these tools is discussed in the current work. Here theories from classical linguistics (e.g. morphology and phraseology) and methods from corpus linguistics (e.g. POS taggers, concordances and collocational analysis) are discussed. All in all the following programs were implemented and described: Tokenizer, Frequency List, POS-Trainer, POS-Tagger, Lemmatizer and further programs for word formation and phraseological analysis.
The program for morphological analysis and the Lemmatizer are basically rule-based, whereas most of the other programs work by means of statistics. Hidden Markov Models, which derive from probability theory, underlie the assignment of part-of-speech tags to words. For phraseological analysis many current statistical measures such as z-score, t-score, mutual information, the chi-square test and Fisher’s exact test were implemented and tested. Chapter 3.4 is dedicated to the output of the program for morphological analysis and discusses word formation in the oeuvre of Michel Rodange. Chapter 3.6 focuses on the interpretation of the output for collocational-phraseological analysis. As a result of this investigation it became apparent that many automatically identified phraseological units in Michel Rodange’s oeuvre are not only a part of Luxembourgish language and culture, but are also found in other European cultures.
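
As an illustration of the association measures named in this abstract, the following is a minimal Python sketch of two of them, mutual information and the t-score, computed for adjacent word pairs. It is not the implementation described in the thesis: the function name `bigram_scores`, the toy token list, and the use of the token count as the normalizer for the expected frequency are assumptions made for this example.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {(w1, w2): (mutual_information, t_score)} for adjacent pairs.

    Hypothetical helper for illustration only.
    """
    total = len(tokens)                        # corpus size N (assumed normalizer)
    unigrams = Counter(tokens)                 # f(w)
    bigrams = Counter(zip(tokens, tokens[1:])) # observed pair counts O
    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected pair count E if w1 and w2 occurred independently
        expected = unigrams[w1] * unigrams[w2] / total
        mutual_information = math.log2(observed / expected)
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mutual_information, t_score)
    return scores


if __name__ == "__main__":
    # Toy input; a real run would use the tokenized corpus instead.
    toy_tokens = "a b a b c d".split()
    for pair, (mi, t) in sorted(bigram_scores(toy_tokens).items()):
        print(pair, round(mi, 3), round(t, 3))
```

Mutual information tends to favor rare pairs, while the t-score favors frequent ones, which is one reason to compute several measures side by side.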

Die Erstellung sprach- und literaturwissenschaftlicher Tools für das Historisch-kritische Michel-Rodange-Portal
Sirajzade, Joshgun UL

in Gilles, Peter; Wagner, Mélanie (Eds.) Linguistische und soziolinguistische Bausteine der Luxemburgistik (2011)

Qədim türk dilində cins anlayışının morfoloji üsulla ifadəsinə dair
Sirajzade, Joshgun UL

in Süleymanlı, Mübariz (Ed.) Mədəniyyət dünyası. Elmi-nəzəri məcmuə (2004)
