Mobile text entry; Evaluation Methodologies; Transcription tasks; Large Language Models; Evaluation methods; Eye-tracking studies; Language model; Text entry; Human-Computer Interaction; Computer Networks and Communications; Computer Vision and Pattern Recognition; Software
Abstract :
We explore a novel transcription task in mobile text entry research, presenting stimuli within LLM-generated conversational contexts to improve participant engagement and phrase memorability. We conducted two studies: an eye-tracking study examining participants' attention when presented with conversational contexts alongside stimuli, and an experiment comparing LLM-generated and human-generated prompt-response pairs in transcription tasks, involving both high and low memorability stimuli. Key findings reveal that presenting conversational contexts improves recall for low memorability phrases and results in fewer uncorrected errors during transcription. No significant effects were observed on other basic text entry metrics or on participants' subjective appraisals of engagement with the novel task, suggesting it can be used safely as an alternative to the traditional transcription task. We discuss the potential of LLMs in improving text entry evaluation methods, including generating diverse linguistic styles, emotionally loaded contexts, and even simulating entire evaluation processes. Our study highlights the need for systematic approaches to generate and evaluate LLM outputs for research purposes, and for proposing new metrics and evaluation methods.
Disciplines :
Computer science
Author, co-author :
Komninos, Andreas; University of Patras, Rio, Greece
Feit, Anna Maria; University of Saarland, Saarbrücken, Germany
Leiva, Luis A.; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Lehmann, Florian; University of Bayreuth, Bayreuth, Germany
Simou, Ioulia; University of Patras, Rio, Greece
Minas, Dimosthenis; University of Patras, Rio, Greece
Fotopoulos, Aggelos; University of Patras, Rio, Greece
Xenos, Michalis; University of Patras, Rio, Greece
External co-authors :
yes
Language :
English
Title :
An LLM-driven Transcription Task for Mobile Text Entry Studies
Publication date :
December 2024
Event name :
Proceedings of the International Conference on Mobile and Ubiquitous Multimedia
Event place :
Stockholm, Sweden
Event date :
01-12-2024 => 04-12-2024
Main work title :
Proceedings of MUM 2024, the 23rd International Conference on Mobile and Ubiquitous Multimedia