TY - JOUR
T1 - Understanding the role of training regimes in continual learning
AU - Mirzadeh, Seyed Iman
AU - Farajtabar, Mehrdad
AU - Pascanu, Razvan
AU - Ghasemzadeh, Hassan
N1 - Funding Information:
SIM and HG acknowledge support from the United States National Science Foundation through grant CNS-1750679. The authors thank the anonymous reviewers, Jonathan Schwarz, Sepehr Sameni, Hooman Shahrokhi, and Mohammad Sadegh Jazayeri for their valuable comments and feedback.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially. From the perspective of the well-established plasticity-stability dilemma, neural networks tend to be overly plastic: they lack the stability necessary to preserve previous knowledge, so that as learning progresses they forget previously seen tasks. This phenomenon, coined catastrophic forgetting in the continual learning literature, has attracted much attention lately, and several families of approaches have been proposed with varying degrees of success. However, little prior work has extensively analyzed the impact that different training regimes (learning rate, batch size, regularization method) can have on forgetting. In this work, we depart from the typical approach of altering the learning algorithm to improve stability. Instead, we hypothesize that the geometrical properties of the local minima found for each task play an important role in the overall degree of forgetting. In particular, we study how dropout, learning rate decay, and batch size shape training regimes that widen the tasks' local minima and, consequently, help the network avoid catastrophic forgetting. Our study provides practical insights for improving stability via simple yet effective techniques that outperform alternative baselines.
AB - Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially. From the perspective of the well-established plasticity-stability dilemma, neural networks tend to be overly plastic: they lack the stability necessary to preserve previous knowledge, so that as learning progresses they forget previously seen tasks. This phenomenon, coined catastrophic forgetting in the continual learning literature, has attracted much attention lately, and several families of approaches have been proposed with varying degrees of success. However, little prior work has extensively analyzed the impact that different training regimes (learning rate, batch size, regularization method) can have on forgetting. In this work, we depart from the typical approach of altering the learning algorithm to improve stability. Instead, we hypothesize that the geometrical properties of the local minima found for each task play an important role in the overall degree of forgetting. In particular, we study how dropout, learning rate decay, and batch size shape training regimes that widen the tasks' local minima and, consequently, help the network avoid catastrophic forgetting. Our study provides practical insights for improving stability via simple yet effective techniques that outperform alternative baselines.
UR - http://www.scopus.com/inward/record.url?scp=85102126436&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102126436&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85102126436
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
Y2 - 6 December 2020 through 12 December 2020
ER -