Predicting student dropout in self-paced MOOC course using random forest model

Sheran Dass, Kevin Gary, James Cunningham

Research output: Contribution to journal › Article › peer-review

26 Scopus citations


A significant problem in Massive Open Online Courses (MOOCs) is the high rate of student dropout. An effective dropout prediction model can identify the factors responsible and provide insight into how to initiate interventions that increase student success in a MOOC. Different features and various approaches are available for predicting student dropout in MOOCs. This paper considers data from a self-paced math course, College Algebra and Problem Solving, offered from 2016 to 2020 on the MOOC platform Open edX in partnership with Arizona State University (ASU). It presents a model that predicts whether a student will drop out of the course, given a set of features engineered from the student's daily learning progress. The prediction uses the Random Forest technique in Machine Learning (ML) and is evaluated using validation metrics including accuracy, precision, recall, F1-score, Area Under the Curve (AUC), and the Receiver Operating Characteristic (ROC) curve. The model can predict the dropout or continuation of students on any given day in the course with an accuracy of 87.5%, AUC of 94.5%, precision of 88%, recall of 87.5%, and F1-score of 87.5%. The contributing features and their interactions in the model's predictions are explained using Shapley values.
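As a rough illustration of the validation metrics named in the abstract, the following minimal sketch computes accuracy, precision, recall, F1-score, and AUC for a binary dropout label using only the Python standard library. It is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the choice of dropout as the positive class (1) are all assumptions made for illustration, and the paper may average its metrics differently.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (assumed: 1 = dropout)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and model scores for eight student-days.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

In practice the scores would come from a trained classifier's predicted dropout probabilities; the pairwise AUC above is quadratic in the number of examples and is meant only to make the definition concrete.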

Original language: English (US)
Article number: 476
Journal: Information (Switzerland)
Issue number: 11
State: Published - Nov 2021


  • AUC
  • Dropout
  • MOOC
  • Prediction
  • ROC
  • Random forest
  • SHAP

ASJC Scopus subject areas

  • Information Systems

