A supervised clustering method for text classification

Umarani Pappuswamy, Dumisizwe Bhembe, Pamela W. Jordan, Kurt VanLehn

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Scopus citations


This paper describes a supervised three-tier clustering method for classifying students' qualitative physics essays in the Why2-Atlas tutoring system. The main purpose of text categorization in our tutoring system is to map the statements in students' essays onto physics principles and misconceptions. A simple 'bag-of-words' representation with a naïve Bayes algorithm proved unsatisfactory for our analyses: it produced many misclassifications because the concepts themselves are closely related and because it could not handle misconceptions. Hence, we investigate the performance of the k-nearest neighbor algorithm, coupled with clusters of physics concepts, in classifying students' essays. We use a three-tier tagging scheme (cluster, sub-cluster, and class) for each document and find that this kind of supervised hierarchical clustering leads to a better understanding of the student's essay.
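The idea of pairing a k-nearest neighbor classifier with a three-tier tag (cluster, sub-cluster, class) can be illustrated with a minimal sketch. This is not the authors' implementation: the training sentences, the tag names, the cosine similarity over raw term counts, and the tier-by-tier majority vote are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy data: (essay statement, (cluster, sub-cluster, class)).
# These sentences and tag names are invented, not taken from the paper.
TRAINING = [
    ("the ball keeps moving at constant velocity",
     ("motion", "inertia", "principle")),
    ("velocity stays constant at zero net force",
     ("motion", "inertia", "principle")),
    ("the ball slows down because the force runs out",
     ("motion", "inertia", "misconception")),
    ("gravity pulls the ball down with acceleration",
     ("force", "gravity", "principle")),
    ("heavier objects fall faster due to gravity",
     ("force", "gravity", "misconception")),
]

def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text, training, k=3):
    """Return a (cluster, sub-cluster, class) tag for `text`.

    Assumed voting rule: take the k most similar training statements,
    pick the majority cluster, then vote on the sub-cluster among the
    neighbors in that cluster, then on the class among those left.
    """
    query = bag_of_words(text)
    ranked = sorted(training,
                    key=lambda ex: cosine(query, bag_of_words(ex[0])),
                    reverse=True)
    neighbors = [tags for _, tags in ranked[:k]]
    chosen = []
    for tier in range(3):
        votes = Counter(tags[tier] for tags in neighbors)
        winner = votes.most_common(1)[0][0]
        chosen.append(winner)
        # Only neighbors that agree with this tier vote on the next one.
        neighbors = [tags for tags in neighbors if tags[tier] == winner]
    return tuple(chosen)

print(classify("the ball moves at constant velocity", TRAINING, k=3))
```

Voting one tier at a time means a statement is first placed in a broad concept cluster before the finer distinction between a principle and a misconception is made, which is one plausible reading of why a hierarchy could help when concepts are closely related.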

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science
Editors: A. Gelbukh
Number of pages: 11
State: Published - 2005
Externally published: Yes
Event: 6th International Conference, CICLing 2005 - Mexico City, Mexico
Duration: Feb 13 2005 to Feb 19 2005


Other: 6th International Conference, CICLing 2005
City: Mexico City

ASJC Scopus subject areas

  • Computer Science (miscellaneous)

