Active data labeling for improved classifier generalizability

Research output: Contribution to journal › Article › peer-review



Existing statistical learning methods perform well when training and test data are drawn from the same distribution. In practice, however, these two distributions do not always match. In this paper we derive an estimable upper bound on the test error rate that depends on a new probability distance measure between the training and test distributions. We further identify a non-parametric estimator that computes this distance measure directly from data. We then show how the distance measure can be used to construct algorithmic tools that improve classifier performance when the training and test distributions differ. In particular, motivated by our upper bound, we propose a new active learning algorithm for domain adaptation. Comparative results on a set of 12 speech classification tasks confirm the efficacy of the active learning algorithm.
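The abstract does not name its distance measure or the non-parametric estimator it uses. Purely as an illustration of what "estimated directly from data" can look like, the sketch below swaps in the Friedman-Rafsky minimal-spanning-tree statistic, a well-known non-parametric estimator of the Henze-Penrose divergence between two samples. Treating it as a stand-in for the paper's measure is an assumption, and the function names `euclidean_mst` and `fr_divergence` are hypothetical. The idea: pool the training and test samples, build a Euclidean minimal spanning tree, and count the edges R that join a training point to a test point. If the two distributions match, the labels are well mixed and R is large, so the estimate is near 0. If the distributions are well separated, R is small, so the estimate approaches 1.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean_mst(points: List[Point]) -> List[Tuple[int, int]]:
    """Prim's algorithm on the complete Euclidean graph, O(N^2) time.

    Returns the N-1 tree edges as (index, index) pairs.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known edge weight into the tree
    best_parent = [-1] * n      # tree vertex at the other end of that edge
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Update the cheapest connection of every remaining vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


def fr_divergence(train: List[Point], test: List[Point]) -> float:
    """Friedman-Rafsky style estimate of the Henze-Penrose divergence.

    Hypothetical stand-in for the paper's distance measure:
    estimate = 1 - R * (m + n) / (2 * m * n), where R counts MST edges
    that connect a training point to a test point.
    """
    m, n = len(train), len(test)
    if m == 0 or n == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(train) + list(test)
    labels = [0] * m + [1] * n
    cross = sum(1 for a, b in euclidean_mst(pooled) if labels[a] != labels[b])
    return 1.0 - cross * (m + n) / (2.0 * m * n)


if __name__ == "__main__":
    rng = random.Random(0)
    train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    same = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    shifted = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)]
    # Matching distributions: the estimate should be close to 0.
    print(fr_divergence(train, same))
    # Well-separated distributions: the estimate should be close to 1.
    print(fr_divergence(train, shifted))
```

The brute-force Prim construction keeps the sketch dependency-free; for larger samples a k-d tree or a library MST routine would be the usual choice. How such an estimate would feed an active learning loop for domain adaptation is not described in the abstract and is not shown here.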

Original language: English (US)
Pages (from-to): 272-277
Number of pages: 6
Journal: Signal Processing
State: Published - Mar 2015


Keywords

  • Active learning
  • Classification
  • Divergence measures
  • Domain adaptation

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering

