Learning from Counting: Leveraging Temporal Classification for Weakly Supervised Object Localization and Detection

Chia Yu Hsu; Wenwen Li

Geographical Sciences and Urban Planning, School of (SGSUP)

Research output: Contribution to conference › Paper › peer-review

1 Scopus citation

Abstract

This paper reports a new approach that leverages temporal classification to support weakly supervised object detection (WSOD). Specifically, we introduce raster scan-order techniques to serialize 2D images into 1D sequence data, and then leverage a combined LSTM (Long Short-Term Memory) and CTC (Connectionist Temporal Classification) network to achieve object localization based on a total count of the objects of interest. We term our proposed network LSTM-CCTC (Count-based CTC). This “learning from counting” strategy differs from existing WSOD methods in that our approach automatically identifies critical points on or near a target object. This strategy significantly reduces the need to generate a large number of candidate proposals for object localization. Experiments on the PASCAL VOC datasets show that our method yields state-of-the-art performance.
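The serialization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tile size, the plain left-to-right, top-to-bottom scan order, and the names `raster_scan` and `step_to_center` are assumptions made here for illustration, and the LSTM and CTC components are not shown. The sketch only shows how a 2D grid becomes a 1D sequence and how a sequence step can be mapped back to an image location.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]


def raster_scan(image: Grid, patch: int) -> List[List[float]]:
    """Serialize a 2D grid into a 1D sequence of flattened patch x patch tiles.

    Tiles are visited left to right, then top to bottom (assumed scan order).
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile row by row into a single feature vector.
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            sequence.append(tile)
    return sequence


def step_to_center(step: int, width: int, patch: int) -> Tuple[float, float]:
    """Map a sequence index back to the (row, col) center of its tile."""
    tiles_per_row = width // patch
    tile_row, tile_col = divmod(step, tiles_per_row)
    return (tile_row * patch + patch / 2, tile_col * patch + patch / 2)


# Hypothetical 4x4 "image" whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
seq = raster_scan(image, patch=2)
```

In a pipeline of the kind the abstract describes, a sequence model would consume `seq` one step at a time, and a step flagged as containing an object could be projected back to image coordinates with a helper such as `step_to_center`.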

Original language	English (US)
State	Published - 2020
Event	31st British Machine Vision Conference, BMVC 2020 - Virtual, Online Duration: Sep 7 2020 → Sep 10 2020

Conference

Conference	31st British Machine Vision Conference, BMVC 2020
City	Virtual, Online
Period	9/7/20 → 9/10/20

ASJC Scopus subject areas

Artificial Intelligence
Computer Vision and Pattern Recognition

Cite this

@conference{2129411699d74fb9b44965626c394930,

title = "Learning from Counting: Leveraging Temporal Classification for Weakly Supervised Object Localization and Detection",

abstract = "This paper reports a new approach that leverages temporal classification to support weakly supervised object detection (WSOD). Specifically, we introduce raster scan-order techniques to serialize 2D images into 1D sequence data, and then leverage a combined LSTM (Long Short-Term Memory) and CTC (Connectionist Temporal Classification) network to achieve object localization based on a total count of the objects of interest. We term our proposed network LSTM-CCTC (Count-based CTC). This “learning from counting” strategy differs from existing WSOD methods in that our approach automatically identifies critical points on or near a target object. This strategy significantly reduces the need to generate a large number of candidate proposals for object localization. Experiments on the PASCAL VOC datasets show that our method yields state-of-the-art performance.",

author = "Hsu, {Chia Yu} and Li, Wenwen",

note = "Publisher Copyright: {\textcopyright} 2020. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.; 31st British Machine Vision Conference, BMVC 2020 ; Conference date: 07-09-2020 Through 10-09-2020",

year = "2020",

language = "English (US)",

}

TY - CONF

T1 - Learning from Counting

T2 - 31st British Machine Vision Conference, BMVC 2020

AU - Hsu, Chia Yu

AU - Li, Wenwen

PY - 2020

Y1 - 2020

N2 - This paper reports a new approach that leverages temporal classification to support weakly supervised object detection (WSOD). Specifically, we introduce raster scan-order techniques to serialize 2D images into 1D sequence data, and then leverage a combined LSTM (Long Short-Term Memory) and CTC (Connectionist Temporal Classification) network to achieve object localization based on a total count of the objects of interest. We term our proposed network LSTM-CCTC (Count-based CTC). This “learning from counting” strategy differs from existing WSOD methods in that our approach automatically identifies critical points on or near a target object. This strategy significantly reduces the need to generate a large number of candidate proposals for object localization. Experiments on the PASCAL VOC datasets show that our method yields state-of-the-art performance.

UR - http://www.scopus.com/inward/record.url?scp=85136319201&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85136319201&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85136319201

Y2 - 7 September 2020 through 10 September 2020

ER -
