TY - GEN
T1 - YMCQ
T2 - 26th International Conference on Artificial Intelligence in Education, AIED 2025
AU - Dutulescu, Andreea
AU - Ruseti, Stefan
AU - Iorga, Denis
AU - Dascalu, Mihai
AU - McNamara, Danielle S.
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Automated multiple-choice question (MCQ) generation is valuable for scalable assessment and enhanced learning experiences. However, existing MCQ generation methods face challenges in ensuring plausible distractors and maintaining answer consistency. This paper introduces a method for MCQ generation that integrates reasoning-based explanations for both correct answers and distractors, leveraging open-source language models fine-tuned on publicly available datasets. Our approach addresses these issues with three major improvements. First, over 300k questions from public datasets were augmented with synthetically generated reasoning explanations. Second, we fine-tune a Large Language Model (LLM) with reasoning-based explanations to condition the generation while accounting for correct reasoning and possible misconceptions. Third, we introduce a multi-step filtering pipeline to ensure the validity of the questions and the diversity of the generated distractors. This work argues for the effectiveness of reasoning-enhanced fine-tuning in improving MCQ generation quality while maintaining accessibility and cost efficiency. We release all resources, including synthetically augmented questions, training code, and the best model as open-source.
AB - Automated multiple-choice question (MCQ) generation is valuable for scalable assessment and enhanced learning experiences. However, existing MCQ generation methods face challenges in ensuring plausible distractors and maintaining answer consistency. This paper introduces a method for MCQ generation that integrates reasoning-based explanations for both correct answers and distractors, leveraging open-source language models fine-tuned on publicly available datasets. Our approach addresses these issues with three major improvements. First, over 300k questions from public datasets were augmented with synthetically generated reasoning explanations. Second, we fine-tune a Large Language Model (LLM) with reasoning-based explanations to condition the generation while accounting for correct reasoning and possible misconceptions. Third, we introduce a multi-step filtering pipeline to ensure the validity of the questions and the diversity of the generated distractors. This work argues for the effectiveness of reasoning-enhanced fine-tuning in improving MCQ generation quality while maintaining accessibility and cost efficiency. We release all resources, including synthetically augmented questions, training code, and the best model as open-source.
KW - Large Language Models
KW - MCQ generation
KW - Multi-step filtering
KW - Synthetic augmentation
UR - https://www.scopus.com/pages/publications/105012036582
UR - https://www.scopus.com/pages/publications/105012036582#tab=citedBy
U2 - 10.1007/978-3-031-98465-5_39
DO - 10.1007/978-3-031-98465-5_39
M3 - Conference contribution
AN - SCOPUS:105012036582
SN - 9783031984648
T3 - Lecture Notes in Computer Science
SP - 308
EP - 315
BT - Artificial Intelligence in Education - 26th International Conference, AIED 2025, Proceedings
A2 - Cristea, Alexandra I.
A2 - Walker, Erin
A2 - Lu, Yu
A2 - Santos, Olga C.
A2 - Isotani, Seiji
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 22 July 2025 through 26 July 2025
ER -