Reference : A comparative study of automatic classifiers to recognize speakers based on fricatives
Scientific congresses, symposiums and conference proceedings : Poster
Engineering, computing & technology : Computer science
http://hdl.handle.net/10993/51712
A comparative study of automatic classifiers to recognize speakers based on fricatives
English
Hosseini Kivanani, Nina [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)]
Asadi, Homa [University of Isfahan, University of Zurich]
Schommer, Christoph [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)]
Dellwo, Volker [University of Zurich]
Jul-2022
No
International
1st Interdisciplinary Conference on Voice Identity (VoiceID): Perception, Production, and Computational Approaches
from 04-07-2022 to 06-07-2022
Zurich
Switzerland
[en] Speakers’ voices are highly individual and for this reason speakers can be identified based on
their voice. Nevertheless, voices are often more variable within the same speaker than they are
between speakers, which makes it difficult for humans and machines to differentiate between
speakers (Hansen & Hasan, 2015). To date, various machine learning methods have
been developed to recognize speakers based on the acoustic characteristics of their speech;
however, not all of them have proven equally effective for speaker identification, and results
vary with the technique employed. Here, different machine learning classifiers (Naïve Bayes
(NB), support vector machines (SVM), random forests (RF), and k-nearest neighbors (KNN))
were applied to identify the best model for recognizing speakers across 4 speaking styles
based on segment type (voiceless fricatives), considering the
acoustic features of center of gravity, standard deviation, and skewness. We used a dataset
consisting of speech samples from 7 native Persian subjects speaking in 4 different speaking
styles: read, spontaneous, clear, and child-directed speech. The results revealed that the best
performing model for predicting speakers based on segment type was the RF model, with an
accuracy of 81.3%, followed by SVM (76.3%), NB (75.4%), and KNN (74%) (Table 1). Our
results showed that RF performed best for the voiceless fricatives /f/, /s/, and /ʃ/, which
may indicate that these segments are much more speaker-specific than others (Gordon et al.,
2002), while model performance was low for the voiceless fricatives /h/ and /x/.
Performance can be seen in the confusion matrix (Figure 1), with high precision and
recall values (above 80%) for /f/, /s/, and /ʃ/ (Table 2). We found that model performance
improved for data from the clear speaking style; speaker-specific information in voiceless
fricatives is more distinguishable in clear speech than in the other styles (Table 1).
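The sketch below illustrates the kind of classifier comparison described above: training NB, SVM, RF, and KNN on the three spectral-moment features and reporting cross-validated speaker-identification accuracy. It is a minimal sketch, not the authors' implementation; it assumes the per-token measurements have already been extracted (e.g., with Praat) into a table, and the file name fricative_moments.csv and its column names are hypothetical.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# One row per voiceless-fricative token: speaker label plus the three
# spectral-moment features used in the study (column names are assumed).
df = pd.read_csv("fricative_moments.csv")
X = df[["center_of_gravity", "std_dev", "skewness"]]
y = df["speaker"]

# The four classifiers compared in the abstract; hyperparameters are illustrative.
classifiers = {
    "NB": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Report mean cross-validated speaker-identification accuracy per model.
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")

Per-segment or per-style accuracies, as reported in the abstract, could be obtained in the same way by filtering the table by fricative or by speaking style before cross-validation.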
Researchers ; Professionals ; Students