Abstract:
Assessment is an essential part of education, both for teachers who assess their students and for learners who assess themselves. Multiple-choice questions (MCQs) are a popular type of assessment question, as they can be graded automatically and can cover a wide range of learning items. However, creating high-quality MCQ items is nontrivial. With the advent of Generative Pre-trained Transformers (GPT), considerable effort has recently been devoted to Automatic Question Generation (AQG). While metrics have been applied to evaluate the linguistic quality of generated questions, an evaluation of generated questions against best practices for MCQ creation has so far been missing. In this paper, we analyse the quality of MCQs automatically generated by three different GPT-based services. After producing 150 MCQs in the domain of computer science, we evaluate them against common multiple-choice item-writing guidelines and annotate them with the docimological issues we identify. The dataset of annotated MCQs is available in Moodle XML format. We discuss the different flaws and propose solutions for developers of AQG services.
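As a rough illustration of the distribution format mentioned above, the sketch below serialises one multiple-choice question into the standard Moodle "multichoice" XML structure. The helper mcq_to_moodle_xml, the sample question, and the idea of attaching flaw annotations as an XML comment are assumptions made for illustration; they are not the authors' actual export code.

# Minimal sketch (assumption): writing one annotated MCQ as Moodle XML.
# Element names (question, questiontext, answer/fraction, single,
# shuffleanswers) follow Moodle's standard multichoice question format;
# storing flaw annotations in an XML comment is purely illustrative.
import xml.etree.ElementTree as ET

def mcq_to_moodle_xml(stem, options, correct_index, flaws):
    quiz = ET.Element("quiz")
    question = ET.SubElement(quiz, "question", type="multichoice")
    name = ET.SubElement(question, "name")
    ET.SubElement(name, "text").text = stem[:40]
    qtext = ET.SubElement(question, "questiontext", format="html")
    ET.SubElement(qtext, "text").text = stem
    for i, option in enumerate(options):
        # The correct option receives full credit, distractors none.
        answer = ET.SubElement(question, "answer",
                               fraction="100" if i == correct_index else "0")
        ET.SubElement(answer, "text").text = option
    ET.SubElement(question, "single").text = "true"
    ET.SubElement(question, "shuffleanswers").text = "true"
    # Record identified item-writing flaws as a comment (illustrative only).
    question.append(ET.Comment("flaws: " + ", ".join(flaws)))
    return ET.tostring(quiz, encoding="unicode")

print(mcq_to_moodle_xml(
    "Which data structure retrieves elements in FIFO order?",
    ["Queue", "Stack", "Binary tree", "Hash table"],
    correct_index=0,
    flaws=["implausible distractor"],
))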