Exploring the Impact of Modality and Speech Rate Manipulation in Voice Permission Requests—Limits of Applicability and Potential for Influencing Decision-Making
Keywords :
Auditory feedback; Design ethics; Synthetic speech; Decision making; Design considerations; Speech rates; User perceptions; Voice modality; Voice user interface; Human Factors and Ergonomics; Software; Education; Engineering (all); Human-Computer Interaction; Hardware and Architecture
Abstract :
As voice-enabled technologies become increasingly prevalent, voice-enabled permission requests have become a crucial topic of investigation. It remains unclear how to appropriately inform users of voice user interfaces (VUIs) about data processing practices. To understand how modality (text vs. voice) and the speech rate of the voice can influence users' perceptions and their decisions to grant permission, we conducted two preregistered studies (N = 343 and N = 594) and one pre-study, including two listening tasks to design potentially deceptive voice patterns. We found that users can distinguish between different levels of intrusiveness in the voice modality. However, they are less likely to accept voice-based permission requests, pointing to cognitive problems associated with them. Moreover, we found that speech rate manipulations of the action verbs "Accept" and "Decline" shifted users' decisions towards acceptance, making the effect less controllable than predicted. This work highlights implications and design considerations for future voice-enabled permission requests.
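The abstract does not state how the speech rate of individual words was changed. As a rough illustration only, the Python sketch below assumes a duration-predicting TTS pipeline (in the style of FastSpeech 2) that exposes per-phone durations in frames, and slows down or speeds up one word by scaling only that word's durations. The `Phone` class, `scale_word_rate`, `word_frames` and all duration values are hypothetical and do not reproduce the study's synthesis setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phone:
    """Hypothetical structure: assumes a TTS front end exposing per-phone durations."""
    symbol: str   # phone label (ARPAbet here)
    word: str     # word the phone belongs to
    frames: int   # predicted duration in acoustic frames


def scale_word_rate(phones, target_words, rate):
    """Change the speech rate of selected words only.

    rate > 1 speeds the word up (shorter durations); rate < 1 slows it down.
    Phones of all other words keep their original durations.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    scaled = []
    for p in phones:
        if p.word in target_words:
            # Duration is inversely proportional to rate; keep at least 1 frame.
            frames = max(1, round(p.frames / rate))
        else:
            frames = p.frames
        scaled.append(Phone(p.symbol, p.word, frames))
    return scaled


def word_frames(phones, word):
    """Total duration of one word in frames."""
    return sum(p.frames for p in phones if p.word == word)


# Hypothetical predicted durations for the two action verbs.
phones = [
    Phone("AE", "accept", 8), Phone("K", "accept", 6), Phone("S", "accept", 9),
    Phone("EH", "accept", 10), Phone("P", "accept", 5), Phone("T", "accept", 7),
    Phone("D", "decline", 6), Phone("IH", "decline", 5), Phone("K", "decline", 6),
    Phone("L", "decline", 7), Phone("AY", "decline", 14), Phone("N", "decline", 8),
]

# Slow down only "decline" (rate factor 0.8); "accept" keeps its timing.
modified = scale_word_rate(phones, {"decline"}, rate=0.8)
print(word_frames(phones, "decline"), "->", word_frames(modified, "decline"))
```

Because only the target word's phone durations change, the rest of the utterance keeps its original timing, which is what separates a word-level manipulation from a global change of speaking rate.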
Disciplines :
Computer science
Author, co-author :
Leschanowsky, Anna ; Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
SERGEEVA, Anastasia ✱; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > IRiSC
Bauer, Judith ; Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
Vijapurapu, Sheetal ; Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
DUBIEL, Mateusz ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
✱ These authors have contributed equally to this work.
External co-authors :
yes
Language :
English
Funders :
Horizon Europe; German Research Foundation; Friedrich-Alexander-Universität Erlangen-Nürnberg; European Commission
Funding text :
This research was partially supported by the Free State of Bavaria in the DSAI project, Germany [grant number RMF-SG20-3410-2-15-4], by the Fraunhofer-Zukunftsstiftung, Germany, and by the European Union under Grant Agreement No. 101092861. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them. The authors gratefully acknowledge the scientific support and HPC resources provided by the Erlangen National High Performance Computing Center (NHR@FAU) of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) under the NHR project b215dc. NHR funding is provided by federal and Bavarian state authorities. NHR@FAU hardware is partially funded by the German Research Foundation (DFG), Germany [grant number 440719683].