SEED: SELF-SUPERVISED DISTILLATION FOR VISUAL REPRESENTATION

Zhiyuan Fang; Jianfeng Wang; Lijuan Wang; Lei Zhang; Yezhou Yang; Zicheng Liu

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Contribution to conference › Paper › peer-review

81 Scopus citations

Abstract

This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical studies that while the widely used contrastive self-supervised learning method has shown great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNetV3-Large on the ImageNet-1k dataset.
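As an illustration of the idea in the abstract (a student trained to mimic the teacher's similarity score distribution over a set of instances), here is a minimal sketch in plain Python. It is one possible reading and not the authors' implementation: the function names, the use of cosine similarity, the softmax with a single shared temperature of 0.1, and the cross-entropy objective are all assumptions made for this example.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(scores, temperature):
    """Numerically stable softmax over temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_distribution(embedding, instances, temperature=0.1):
    """Distribution of one embedding's similarity scores over a set of instances.

    The temperature value is a hypothetical default chosen for this sketch.
    """
    e = l2_normalize(embedding)
    scores = [sum(a * b for a, b in zip(e, l2_normalize(q))) for q in instances]
    return softmax(scores, temperature)

def distillation_loss(teacher_emb, student_emb, instances, temperature=0.1):
    """Cross-entropy between the teacher's and the student's similarity distributions.

    Assumed objective: it is smallest when the student reproduces the
    teacher's distribution over the instance set.
    """
    p = similarity_distribution(teacher_emb, instances, temperature)
    q = similarity_distribution(student_emb, instances, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Toy usage: embeddings of the same image from the two encoders,
# compared against a small set of instance embeddings (all made up).
instances = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher_embedding = [1.0, 0.2]
student_embedding = [0.1, 1.0]
loss = distillation_loss(teacher_embedding, student_embedding, instances)
```

In a training loop, the teacher would stay fixed and only the student encoder would be updated to reduce this loss; no labels are involved at any point.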

Original language	English (US)
State	Published - 2021
Event	9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online Duration: May 3 2021 → May 7 2021

Conference

Conference	9th International Conference on Learning Representations, ICLR 2021
City	Virtual, Online
Period	5/3/21 → 5/7/21

ASJC Scopus subject areas

Language and Linguistics
Computer Science Applications
Education
Linguistics and Language

Cite this

@conference{e463347d51604916a1a63b705f235f19,
title = "SEED: SELF-SUPERVISED DISTILLATION FOR VISUAL REPRESENTATION",
abstract = "This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical studies that while the widely used contrastive self-supervised learning method has shown great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNetV3-Large on the ImageNet-1k dataset.",
author = "Zhiyuan Fang and Jianfeng Wang and Lijuan Wang and Lei Zhang and Yezhou Yang and Zicheng Liu",
note = "Publisher Copyright: {\textcopyright} 2021 ICLR 2021 - 9th International Conference on Learning Representations. All rights reserved.; 9th International Conference on Learning Representations, ICLR 2021 ; Conference date: 03-05-2021 Through 07-05-2021",
year = "2021",
language = "English (US)",
}

TY - CONF
T1 - SEED
T2 - 9th International Conference on Learning Representations, ICLR 2021
AU - Fang, Zhiyuan
AU - Wang, Jianfeng
AU - Wang, Lijuan
AU - Zhang, Lei
AU - Yang, Yezhou
AU - Liu, Zicheng
PY - 2021
Y1 - 2021
N2 - This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical studies that while the widely used contrastive self-supervised learning method has shown great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNetV3-Large on the ImageNet-1k dataset.

AB - This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical studies that while the widely used contrastive self-supervised learning method has shown great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNetV3-Large on the ImageNet-1k dataset.
UR - http://www.scopus.com/inward/record.url?scp=85146209924&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146209924&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85146209924
Y2 - 3 May 2021 through 7 May 2021
ER -
