Learning from Counting: Leveraging Temporal Classification for Weakly Supervised Object Localization and Detection

Chia Yu Hsu, Wenwen Li

Research output: Contribution to conference > Paper > peer-review

1 Scopus citation

Abstract

This paper reports a new approach that leverages temporal classification to support weakly supervised object detection (WSOD). Specifically, we introduce raster scan-order techniques to serialize 2D images into 1D sequence data, and then use a combined LSTM (Long Short-Term Memory) and CTC (Connectionist Temporal Classification) network to localize objects based on a total count of the objects of interest. We term our proposed network LSTM-CCTC (Count-based CTC). This “learning from counting” strategy differs from existing WSOD methods in that it automatically identifies critical points on or near a target object, which significantly reduces the need to generate a large number of candidate proposals for object localization. Experiments show that our method yields state-of-the-art performance on the PASCAL VOC datasets.
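
As a rough illustration of the serialization step mentioned in the abstract, the sketch below flattens an image into a 1D sequence in raster scan order (left to right, top to bottom) and maps a sequence step back to an image location. The non-overlapping square patches, the patch size, and the function names `raster_scan_patches` and `step_to_patch_origin` are assumptions made for this example; the abstract does not specify the scan granularity, and this is not the authors' implementation.

```python
import numpy as np


def raster_scan_patches(image, patch=4):
    """Serialize an (H, W, C) image into a (T, patch*patch*C) sequence.

    Patches are visited in raster order: left to right, then top to bottom.
    The patch-based granularity is an assumption of this sketch.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of patch")
    rows, cols = h // patch, w // patch
    # Split into a (rows, cols) grid of patches, then flatten each patch.
    grid = image.reshape(rows, patch, cols, patch, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * cols, patch * patch * c)


def step_to_patch_origin(t, width, patch=4):
    """Map sequence step t back to the (row, col) pixel origin of its patch."""
    cols = width // patch
    return (t // cols) * patch, (t % cols) * patch
```

A sequence model such as an LSTM could then consume the rows of the returned array as time steps, and any step it flags could be mapped back to an image position with `step_to_patch_origin`.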

Original language: English (US)
State: Published - 2020
Event: 31st British Machine Vision Conference, BMVC 2020 - Virtual, Online
Duration: Sep 7 2020 to Sep 10 2020

Conference

Conference: 31st British Machine Vision Conference, BMVC 2020
City: Virtual, Online
Period: 9/7/20 to 9/10/20

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
