Graph-based rare category detection

Jingrui He, Liu Yan, Richard Lawrence

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

42 Scopus citations


Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays a key role in real applications such as financial fraud detection, network intrusion detection, astronomy, and spam image detection. In this paper, we develop a new graph-based method for rare category detection named GRADE. It uses a global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where probability density changes the most, it relaxes the assumption that the majority classes and the minority classes are separable. Furthermore, for cases where detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of each minority class as input. Besides working with data with structured features, both GRADE and GRADE-LI can also work with graph data, which cannot be handled by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms.
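The abstract mentions a global similarity matrix motivated by manifold ranking. As a rough illustration only, the sketch below builds such a matrix in the standard manifold ranking form A = (I - alpha * S)^(-1), where S is the symmetrically normalized pairwise affinity matrix. The Gaussian affinity, the function names, and the values of `sigma`, `alpha`, and `iters` are assumptions made for this example; they are not taken from the paper, and this is not the authors' implementation of GRADE.

```python
import math


def gaussian_affinity(points, sigma):
    """Pairwise Gaussian affinities W with a zero diagonal (assumed kernel)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def normalize(W):
    """Symmetric normalization S = D^(-1/2) W D^(-1/2), D = diag(row sums)."""
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scale = deg[i] * deg[j]
            if scale > 0.0:
                S[i][j] = W[i][j] / math.sqrt(scale)
    return S


def global_similarity(S, alpha=0.9, iters=200):
    """Approximate A = (I - alpha*S)^(-1) with the fixed-point iteration
    A <- I + alpha * S * A, which converges for 0 < alpha < 1."""
    n = len(S)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        A = [
            [
                (1.0 if i == j else 0.0)
                + alpha * sum(S[i][k] * A[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return A


# Toy data: four points in a larger cluster and two points in a small, distant one.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0), (3.1, 3.0)]
S = normalize(gaussian_affinity(points, sigma=0.5))
A = global_similarity(S, alpha=0.9)
```

In this toy setting, the entries of `A` that link the two points of the small cluster are far larger than the entries that link it to the larger cluster, which is the kind of compactness the abstract attributes to the global similarity. The fixed-point iteration is used here only to keep the sketch dependency-free; a direct linear solve would be the usual choice for larger data.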

Original language: English (US)
Title of host publication: Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008
Number of pages: 6
State: Published - 2008
Externally published: Yes
Event: 8th IEEE International Conference on Data Mining, ICDM 2008 - Pisa, Italy
Duration: Dec 15 2008 to Dec 19 2008

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786


Other: 8th IEEE International Conference on Data Mining, ICDM 2008

ASJC Scopus subject areas

  • General Engineering

