A supervised clustering method for text classification

Umarani Pappuswamy; Dumisizwe Bhembe; Pamela W. Jordan; Kurt VanLehn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes a supervised three-tier clustering method for classifying students' essays on qualitative physics in the Why2-Atlas tutoring system. Our main purpose in categorizing text in our tutoring system is to map the students' essay statements to principles and misconceptions of physics. A simple 'bag-of-words' representation with a naïve Bayes algorithm was unsatisfactory for our analysis: it produced many misclassifications because the concepts themselves are closely related and because it could not handle misconceptions. Hence, we investigate the performance of the k-nearest neighbor algorithm coupled with clusters of physics concepts on classifying students' essays. We use a three-tier tagging scheme (cluster, sub-cluster and class) for each document and find that this kind of supervised hierarchical clustering leads to a better understanding of the student's essay.
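The abstract does not give the feature set, the value of k, or the similarity measure, so the following is only a minimal sketch of how a three-tier (cluster, sub-cluster, class) k-nearest-neighbor classifier might be organized, not the authors' implementation. The training sentences, the label names, the whitespace tokenizer, the cosine similarity, and the function names (`bag_of_words`, `cosine`, `knn_vote`, `classify`) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical training data: (essay statement, (cluster, sub-cluster, class)).
# None of these sentences or labels come from the paper.
TRAINING = [
    ("the keys and the man fall with the same acceleration",
     ("gravity", "freefall", "same_acceleration")),
    ("gravity gives every object the same acceleration",
     ("gravity", "freefall", "same_acceleration")),
    ("heavier objects fall faster than lighter objects",
     ("gravity", "freefall", "heavier_falls_faster")),
    ("the heavier man falls faster than the keys",
     ("gravity", "freefall", "heavier_falls_faster")),
    ("the net force on the car is zero so velocity is constant",
     ("force", "newton_first", "zero_net_force")),
    ("with zero net force the velocity stays constant",
     ("force", "newton_first", "zero_net_force")),
]


def bag_of_words(text):
    """Lower-case, whitespace-tokenized word counts (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_vote(query, examples, tier, k):
    """Majority label at the given tier among the k most similar examples."""
    ranked = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(ex[1][tier] for ex in ranked[:k])
    return votes.most_common(1)[0][0]


def classify(text, training, k=3):
    """Predict (cluster, sub-cluster, class), narrowing the candidates per tier."""
    query = bag_of_words(text)
    examples = [(bag_of_words(t), labels) for t, labels in training]
    predicted = []
    for tier in range(3):
        label = knn_vote(query, examples, tier, k)
        predicted.append(label)
        # Only examples that agree with the chosen label vote at the next tier.
        examples = [ex for ex in examples if ex[1][tier] == label]
    return tuple(predicted)


if __name__ == "__main__":
    print(classify("the keys fall with the same acceleration as the man", TRAINING))
```

Restricting each tier's vote to the examples kept by the previous tier is one plausible way to let the coarse cluster decision separate closely related concepts before the fine-grained class is chosen. Whether the paper narrows candidates in this way is not stated in the abstract.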

Original language	English (US)
Title of host publication	Lecture Notes in Computer Science
Editors	A. Gelbukh
Pages	704-714
Number of pages	11
Volume	3406
State	Published - 2005
Externally published	Yes
Event	6th International Conference, CICLing 2005 - Mexico City, Mexico; Duration: Feb 13 2005 → Feb 19 2005

Other

Other	6th International Conference, CICLing 2005
Country/Territory	Mexico
City	Mexico City
Period	2/13/05 → 2/19/05

ASJC Scopus subject areas

Computer Science (miscellaneous)

Cite this

@inproceedings{26ac4b9199954db4aaf62c484ed18a3d,
  title = "A supervised clustering method for text classification",
  abstract = "This paper describes a supervised three-tier clustering method for classifying students' essays of qualitative physics in the Why2-Atlas tutoring system. Our main purpose of categorizing text in our tutoring system is to map the students' essay statements into principles and misconceptions of physics. A simple 'bag-of-words' representation using a na{\"i}ve-bayes algorithm to categorize text was unsatisfactory for our purposes of analyses as it exhibited many misclassifications because of the relatedness of the concepts themselves and its inability to handle misconceptions. Hence, we investigate the performance of the k-nearest neighborhood algorithm coupled with clusters of physics concepts on classifying students' essays. We use a three-tier tagging schemata (cluster, sub-cluster and class) for each document and found that this kind of supervised hierarchical clustering leads to a better understanding of the student's essay.",
  author = "Umarani Pappuswamy and Dumisizwe Bhembe and Jordan, {Pamela W.} and Kurt VanLehn",
  year = "2005",
  language = "English (US)",
  volume = "3406",
  pages = "704--714",
  editor = "A. Gelbukh",
  booktitle = "Lecture Notes in Computer Science",
  note = "6th International Conference, CICLing 2005 ; Conference date: 13-02-2005 Through 19-02-2005",
}

TY  - GEN
T1  - A supervised clustering method for text classification
AU  - Pappuswamy, Umarani
AU  - Bhembe, Dumisizwe
AU  - Jordan, Pamela W.
AU  - VanLehn, Kurt
PY  - 2005
Y1  - 2005
N2  - This paper describes a supervised three-tier clustering method for classifying students' essays of qualitative physics in the Why2-Atlas tutoring system. Our main purpose of categorizing text in our tutoring system is to map the students' essay statements into principles and misconceptions of physics. A simple 'bag-of-words' representation using a naïve-bayes algorithm to categorize text was unsatisfactory for our purposes of analyses as it exhibited many misclassifications because of the relatedness of the concepts themselves and its inability to handle misconceptions. Hence, we investigate the performance of the k-nearest neighborhood algorithm coupled with clusters of physics concepts on classifying students' essays. We use a three-tier tagging schemata (cluster, sub-cluster and class) for each document and found that this kind of supervised hierarchical clustering leads to a better understanding of the student's essay.
AB  - This paper describes a supervised three-tier clustering method for classifying students' essays of qualitative physics in the Why2-Atlas tutoring system. Our main purpose of categorizing text in our tutoring system is to map the students' essay statements into principles and misconceptions of physics. A simple 'bag-of-words' representation using a naïve-bayes algorithm to categorize text was unsatisfactory for our purposes of analyses as it exhibited many misclassifications because of the relatedness of the concepts themselves and its inability to handle misconceptions. Hence, we investigate the performance of the k-nearest neighborhood algorithm coupled with clusters of physics concepts on classifying students' essays. We use a three-tier tagging schemata (cluster, sub-cluster and class) for each document and found that this kind of supervised hierarchical clustering leads to a better understanding of the student's essay.
UR  - http://www.scopus.com/inward/record.url?scp=24344438279&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=24344438279&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:24344438279
VL  - 3406
SP  - 704
EP  - 714
BT  - Lecture Notes in Computer Science
A2  - Gelbukh, A.
T2  - 6th International Conference, CICLing 2005
Y2  - 13 February 2005 through 19 February 2005
ER  -
