References of "Hosseini Kivanani, Nina"
Full Text
Peer Reviewed
The Prosody of Cheering in Sport Events
Zygis, Marzena; Wesolek, Sarah; Hosseini Kivanani, Nina UL et al

in Proc. Interspeech 2022 (2022, September)

Motivational speaking usually conveys a highly emotional message and its purpose is to invite action. The goal of this paper is to investigate the prosodic realization of one particular type of cheering, namely inciting cheering for single addressees at sport events (here, long-distance running) using the name of that person. 31 native speakers of German took part in the experiment. They were asked to cheer on an individual marathon runner, represented by video, by producing his or her name (1-5 syllables long). For comparison, the participants also produced the same names in isolation and in carrier sentences. Our results reveal that speakers use different strategies to meet their motivational communicative goals: while some speakers produced the runners’ names by dividing them into syllables, others pronounced the names as quickly as possible, putting more emphasis on the first syllable. A few speakers followed a mixed strategy. Contrary to our expectations, it was not intensity that contributed most to the differences between the speaking styles (cheering vs. neutral), at least with the methods we were using. Rather, participants employed higher fundamental frequency and longer duration when cheering for marathon runners.
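
The three prosodic measures compared here (fundamental frequency, duration, intensity) can be extracted per recorded token. The sketch below is a minimal illustration of such a measurement step, assuming the praat-parselmouth package and hypothetical file names; it is not the paper's own analysis pipeline.

# Minimal sketch: extracting mean F0, total duration, and mean intensity from
# one recorded name production. Assumes praat-parselmouth is installed and
# that the WAV file names below (our own invention) exist.
import numpy as np
import parselmouth

def prosodic_measures(wav_path):
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch()
    f0 = pitch.selected_array['frequency']
    f0 = f0[f0 > 0]                      # keep voiced frames only
    intensity = snd.to_intensity()
    return {
        "duration_s": snd.get_total_duration(),
        "mean_f0_hz": float(np.mean(f0)) if f0.size else float("nan"),
        "mean_intensity_db": float(np.mean(intensity.values)),
    }

# e.g. compare a cheered token with a neutrally read token of the same name
print(prosodic_measures("anna_cheering.wav"))
print(prosodic_measures("anna_neutral.wav"))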

Full Text
Peer Reviewed
IRRMA: An Image Recommender Robot Meeting Assistant
Alcaraz, Benoît UL; Hosseini Kivanani, Nina UL; Najjar, Amro

in IRRMA: An Image Recommender Robot Meeting Assistant (2022, July)

The number of people who attend virtual meetings has increased as a result of COVID-19. In this paper, we present a system that consists of an expressive humanoid social robot called QTRobot and a recommender system that employs natural language processing techniques to recommend images related to the content of the presenter’s speech to the audience in real time. This is achieved by utilising the QTRobot’s platform capabilities (microphone, computation power, and Wi-Fi).
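
Purely as an illustration of the kind of speech-to-image matching described, the sketch below pairs a transcript chunk with the most similar image caption via TF-IDF cosine similarity. The abstract does not specify the recommender's internals, so the method, captions, and file names here are all assumptions.

# Illustrative sketch only: map a chunk of transcribed speech to the image
# whose caption is most similar, using TF-IDF + cosine similarity. The image
# captions and the transcript below are invented for the example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

image_captions = {
    "robot.png": "humanoid social robot gesturing during a meeting",
    "chart.png": "bar chart of quarterly sales figures",
    "remote.png": "people attending a virtual meeting from home",
}

def recommend_image(transcript_chunk, captions):
    names = list(captions)
    vec = TfidfVectorizer(stop_words="english")
    mat = vec.fit_transform([transcript_chunk] + [captions[n] for n in names])
    sims = cosine_similarity(mat[0:1], mat[1:]).ravel()
    return names[int(sims.argmax())]

print(recommend_image("since COVID-19 more of us join meetings remotely", image_captions))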

A comparative study of automatic classifiers to recognize speakers based on fricatives
Hosseini Kivanani, Nina UL; Asadi, Homa; Schommer, Christoph UL et al

Poster (2022, July)

Speakers’ voices are highly individual, and for this reason speakers can be identified based on their voice. Nevertheless, voices are often more variable within the same speaker than they are between speakers, which makes it difficult for humans and machines to differentiate between speakers (Hansen, J. H., & Hasan, T., 2015). To date, various machine learning methods have been developed to recognize speakers based on the acoustic characteristics of their speech; however, not all of them have proven equally effective for speaker identification, and results differ depending on the technique used. Here, different machine learning classifiers (Naïve Bayes (NB), support vector machines (SVM), random forests (RF), and k-nearest neighbors (KNN)) have been applied to identify the best classification model for speaker classification across 4 speaking styles, based on one segment type (voiceless fricatives) and the acoustic features center of gravity, standard deviation, and skewness. We used a dataset consisting of speech samples from 7 native Persian subjects speaking in 4 different speaking styles: read, spontaneous, clear, and child-directed speech. The results revealed that the best-performing model for predicting the speakers based on the segment type was the RF model, with an accuracy of 81.3%, followed by SVM (76.3%), NB (75.4%), and KNN (74%) (Table 1). Our results showed that RF performed best for the voiceless fricatives /f/, /s/, and /ʃ/, which may indicate that these segments are much more speaker-specific than others (Gordon et al., 2002), while model performance was low for the voiceless fricatives /h/ and /x/. Performance can be seen in the confusion matrix (Figure 1), with high precision and recall values (above 80%) for /f/, /s/, and /ʃ/ (Table 2). We found that model performance improved when the data came from the clear speaking style; the speaker-specific information carried by the voiceless fricatives is more distinguishable in clear speech than in the other styles (Table 1).
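
A minimal sketch of this kind of classifier comparison is given below, assuming the three spectral measures have already been extracted per fricative token into a CSV file; the file name, column names, and hyperparameters are hypothetical, not the study's own setup.

# Sketch: compare NB, SVM, RF, and KNN on per-token spectral features of
# voiceless fricatives for speaker classification. "fricative_features.csv"
# and its column names are assumptions made for this example.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("fricative_features.csv")   # one row per fricative token
X = df[["cog", "sd", "skewness"]]            # center of gravity, std dev, skewness
y = df["speaker"]

models = {
    "NB":  GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF":  RandomForestClassifier(n_estimators=300, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")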

Full Text
Peer Reviewed
Experiments of ASR-based mispronunciation detection for children and adult English learners
Hosseini Kivanani, Nina UL; Gretter, Roberto; Matassoni, Marco et al

in BNAIC/BeneLearn 2021 (2021, November)

Pronunciation is one of the fundamentals of language learning, and it is considered a primary factor of spoken language when it comes to understanding and being understood by others. The persistently high error rates that mispronunciations cause in speech recognition motivate us to find alternative techniques for handling them. In this study, we develop a mispronunciation assessment system that checks the pronunciation of non-native English speakers, identifies the phonemes commonly mispronounced by Italian learners of English, and presents an evaluation of the non-native pronunciation observed in phonetically annotated speech corpora. To detect mispronunciations, we used a phone-based ASR system implemented with Kaldi. We used two labeled corpora of non-native English: (i) a corpus of Italian adults containing 5,867 utterances from 46 speakers, and (ii) a corpus of Italian children consisting of 5,268 utterances from 78 children. Our results show that the selected error model can discriminate correct sounds from incorrect sounds in both native and non-native speech, and can therefore be used to detect pronunciation errors in non-native speech. Phone error rates improve when the error language model is used. Furthermore, the ASR system shows better accuracy after applying the error model on our selected corpora.
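
For illustration, the phone-error-rate comparison mentioned above reduces to an edit distance between the canonical phone sequence and the phones the ASR recognized. The sketch below is a generic implementation with invented phone strings; it is not the Kaldi scoring used in the study.

# Sketch: phone error rate as Levenshtein distance (substitutions, insertions,
# deletions) divided by the length of the canonical phone sequence.
def phone_error_rate(reference, hypothesis):
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[n][m] / max(n, 1)

# Invented example phone sequences (space-separated phones)
canonical  = "th ih s ih z ah t eh s t".split()
recognised = "d ih s ih z ah t eh s".split()
print(f"PER = {phone_error_rate(canonical, recognised):.2f}")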
