Poster (Scientific congresses, symposiums and conference proceedings)
A comparative study of automatic classifiers to recognize speakers based on fricatives
Hosseini Kivanani, Nina; Asadi, Homa; Schommer, Christoph et al.
2022 • 1st Interdisciplinary Conference on Voice Identity (VoiceID): Perception, Production, and Computational Approaches
Abstract:
[en] Speakers’ voices are highly individual, and for this reason speakers can be identified based on their voice. Nevertheless, voices often vary more within the same speaker than between speakers, which makes it difficult for both humans and machines to differentiate between speakers (Hansen & Hasan, 2015). To date, various machine learning methods have been developed to recognize speakers based on the acoustic characteristics of their speech; however, not all of them have proven equally effective for speaker identification, and performance varies with the technique used. Here, different machine learning classifiers (Naïve Bayes (NB), support vector machines (SVM), random forests (RF), and k-nearest neighbors (KNN)) were applied to identify the best model for classifying speakers across 4 speaking styles based on segment type (voiceless fricatives), using the acoustic features center of gravity, standard deviation, and skewness. We used a dataset consisting of speech samples from 7 native Persian subjects speaking in 4 different speaking styles: read, spontaneous, clear, and child-directed speech. The results revealed that the best-performing model for predicting speakers based on segment type was the RF model, with an accuracy of 81.3%, followed by SVM (76.3%), NB (75.4%), and KNN (74%) (Table 1). Our results showed that the RF performed best for the voiceless fricatives /f/, /s/, and /ʃ/, which may indicate that these segments are more speaker-specific than others (Gordon et al., 2002), whereas model performance was low for the voiceless fricatives /h/ and /x/. Performance can be seen in the confusion matrix (Figure 1), which showed high precision and recall values (above 80%) for /f/, /s/, and /ʃ/ (Table 2). We also found that model performance improved on data from the clear speaking style; the speaker-specific information in voiceless fricatives is more distinguishable in clear speech than in the other styles (Table 1).
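
The three acoustic features named above are the first three spectral moments of each fricative's spectrum. As a minimal sketch of how they can be computed from a segmented fricative token (the poster's actual extraction, commonly done in Praat, may differ in windowing and pre-emphasis; this is an illustrative assumption, not the authors' pipeline):

    import numpy as np

    def spectral_moments(samples, sr):
        # Power spectrum of the windowed fricative segment
        window = np.hanning(len(samples))
        power = np.abs(np.fft.rfft(samples * window)) ** 2
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / sr)
        p = power / power.sum()  # treat the spectrum as a mass distribution over frequency
        cog = np.sum(freqs * p)                            # center of gravity (1st moment)
        sd = np.sqrt(np.sum((freqs - cog) ** 2 * p))       # standard deviation (2nd moment)
        skew = np.sum((freqs - cog) ** 3 * p) / sd ** 3    # skewness (normalized 3rd moment)
        return cog, sd, skew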
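For the model comparison itself, the following scikit-learn sketch shows how the four classifiers could be benchmarked on such a feature set. The feature matrix, labels, hyperparameters, and cross-validation scheme are assumptions for illustration, not the reported setup; random placeholder data stands in for the real fricative tokens:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # X: one row per fricative token, columns [center_of_gravity, sd, skewness]
    # y: speaker labels; placeholder data below (7 speakers, as in the study)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(280, 3))
    y = rng.integers(0, 7, size=280)

    models = {
        "NB":  GaussianNB(),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "RF":  RandomForestClassifier(n_estimators=200, random_state=0),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        print(f"{name}: mean CV accuracy = {acc:.1%}")

Scaling is applied inside a pipeline for the distance- and margin-based models (SVM, KNN), since spectral moments live on very different numeric ranges; tree-based RF and GaussianNB do not require it.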