TY - GEN
T1 - BatchRank
T2 - 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
AU - Chakraborty, Shayok
AU - Balasubramanian, Vineeth
AU - Sankar, Adepu Ravi
AU - Panchanathan, Sethuraman
AU - Ye, Jieping
N1 - Publisher Copyright:
Copyright 2015 ACM.
PY - 2015/8/10
Y1 - 2015/8/10
N2 - Active learning algorithms automatically identify the salient and exemplar instances from large amounts of unlabeled data and thus reduce human annotation effort in inducing a classification model. More recently, Batch Mode Active Learning (BMAL) techniques have been proposed, where a batch of data samples is selected simultaneously from an unlabeled set. Most active learning algorithms assume a flat label space, that is, they consider the class labels to be independent. However, in many applications, the set of class labels are organized in a hierarchical tree structure, with the leaf nodes as outputs and the internal nodes as clusters of outputs at multiple levels of granularity. In this paper, we propose a novel BMAL algorithm (BatchRank) for hierarchical classification. The sample selection is posed as an NP-hard integer quadratic programming problem and a convex relaxation (based on linear programming) is derived, whose solution is further improved by an iterative truncated power method. Finally, a deterministic bound is established on the quality of the solution. Our empirical results on several challenging, real-world datasets from multiple domains, corroborate the potential of the proposed framework for real-world hierarchical classification applications.
AB - Active learning algorithms automatically identify the salient and exemplar instances from large amounts of unlabeled data and thus reduce human annotation effort in inducing a classification model. More recently, Batch Mode Active Learning (BMAL) techniques have been proposed, where a batch of data samples is selected simultaneously from an unlabeled set. Most active learning algorithms assume a flat label space, that is, they consider the class labels to be independent. However, in many applications, the set of class labels are organized in a hierarchical tree structure, with the leaf nodes as outputs and the internal nodes as clusters of outputs at multiple levels of granularity. In this paper, we propose a novel BMAL algorithm (BatchRank) for hierarchical classification. The sample selection is posed as an NP-hard integer quadratic programming problem and a convex relaxation (based on linear programming) is derived, whose solution is further improved by an iterative truncated power method. Finally, a deterministic bound is established on the quality of the solution. Our empirical results on several challenging, real-world datasets from multiple domains, corroborate the potential of the proposed framework for real-world hierarchical classification applications.
KW - Active learning
KW - Hierarchical classification
KW - Optimization
UR - http://www.scopus.com/inward/record.url?scp=84954096472&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84954096472&partnerID=8YFLogxK
U2 - 10.1145/2783258.2783298
DO - 10.1145/2783258.2783298
M3 - Conference contribution
AN - SCOPUS:84954096472
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 99
EP - 108
BT - KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 10 August 2015 through 13 August 2015
ER -