Abstract:
Assessment is an essential part of education, both for teachers who assess their students and for learners who may evaluate themselves. Multiple-choice questions (MCQ) are one of the most popular forms of knowledge assessment, e.g., in medical education, because they can be graded automatically and can cover a wide range of learning items. However, creating high-quality MCQ items is a time-consuming task. The recent advent of Large Language Models (LLM), such as the Generative Pre-trained Transformer (GPT), has given new momentum to automatic question generation (AQG) solutions. Still, generated questions must be evaluated against the best practices for MCQ item writing to ensure docimological quality. In this article, we analyse the quality of LLM-generated MCQs. We employ zero-shot approaches in two domains, namely computer science and medicine. In the former, we use three GPT-based services to generate MCQs. In the latter, we developed a plugin for the Moodle learning management system that generates MCQs from learning material. We compare the generated MCQs against common multiple-choice item-writing guidelines. Among the major challenges, we found that while LLMs are certainly useful for generating MCQs more efficiently, they sometimes produce overly broad items with ambiguous keys or implausible distractors. Human oversight is also necessary to ensure instructional alignment between generated items and course contents. Finally, we propose solutions for AQG developers.