Representation learning for imbalanced cross-domain classification

Lu Cheng; Ruocheng Guo; K. Selçuk Candan; Huan Liu

doi:10.1137/1.9781611976236.54

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

Deep architectures are trained on massive amounts of labeled data to guarantee the performance of classification. In the absence of labeled data, domain adaptation often provides an attractive option given that labeled data of a similar nature but from a different domain is available. Previous work has chiefly focused on learning domain invariant representations but overlooked the issues of label imbalance in a single domain or across domains, which are common in many machine learning applications such as fake news detection. In this paper, we study a new cross-domain classification problem where data in each domain can be imbalanced (data imbalance), i.e., the classes are not evenly distributed, and the ratio of the number of positive over negative samples varies across domains (domain imbalance). This cross-domain problem is challenging as it entails covariate bias in the input feature space and representation bias in the latent space where domain invariant representations are learned. To address the challenge, in this paper, we propose an effective approach that leverages a doubly balancing strategy to simultaneously control these two types of bias and learn domain invariant representations. To this end, the proposed method aims to learn representations that are (i) robust to data and domain imbalance, (ii) discriminative between classes, and (iii) invariant across domains. Extensive evaluations of two important real-world applications corroborate the effectiveness of the proposed framework.
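The two notions of imbalance defined in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the helper `pos_neg_ratio` and the toy label lists are hypothetical and chosen only to show that the positive-to-negative ratio can be skewed within one domain (data imbalance) and can differ between domains (domain imbalance).

```python
from collections import Counter


def pos_neg_ratio(labels):
    """Return (#positive / #negative) for binary labels (1 = positive, 0 = negative)."""
    counts = Counter(labels)
    if counts[0] == 0:
        raise ValueError("no negative samples; ratio is undefined")
    return counts[1] / counts[0]


# Hypothetical toy labels, not data from the paper.
source_labels = [1] * 20 + [0] * 80  # 20 positives, 80 negatives
target_labels = [1] * 5 + [0] * 95   # 5 positives, 95 negatives

source_ratio = pos_neg_ratio(source_labels)
target_ratio = pos_neg_ratio(target_labels)

# Data imbalance: within each domain the ratio is far from 1.
# Domain imbalance: the ratio differs between the two domains.
print(f"source: {source_ratio:.3f}, target: {target_ratio:.3f}")
```

In this toy setting both domains are skewed toward the negative class, and the skew is stronger in the target domain, which is the situation the abstract describes.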

Original language	English (US)
Title of host publication	Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020
Editors	Carlotta Demeniconi, Nitesh Chawla
Publisher	Society for Industrial and Applied Mathematics Publications
Pages	478-486
Number of pages	9
ISBN (Electronic)	9781611976236
DOIs	https://doi.org/10.1137/1.9781611976236.54
State	Published - 2020
Event	2020 SIAM International Conference on Data Mining, SDM 2020 - Cincinnati, United States
Duration	May 7 2020 → May 9 2020

Publication series

Name	Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020

Conference

Conference	2020 SIAM International Conference on Data Mining, SDM 2020
Country/Territory	United States
City	Cincinnati
Period	5/7/20 → 5/9/20

Keywords

Data Imbalance
Domain Imbalance
Representation Learning
Unsupervised Domain Adaptation

ASJC Scopus subject areas

Computer Science Applications
Software

Cite this

Cheng, L., Guo, R., Candan, K. S., & Liu, H. (2020). Representation learning for imbalanced cross-domain classification. In C. Demeniconi & N. Chawla (Eds.), Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020 (pp. 478-486). (Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611976236.54

Representation learning for imbalanced cross-domain classification. / Cheng, Lu; Guo, Ruocheng; Candan, K. Selçuk et al.
Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020. ed. / Carlotta Demeniconi; Nitesh Chawla. Society for Industrial and Applied Mathematics Publications, 2020. p. 478-486 (Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020).

Cheng, L, Guo, R, Candan, KS & Liu, H 2020, Representation learning for imbalanced cross-domain classification. in C Demeniconi & N Chawla (eds), Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020. Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020, Society for Industrial and Applied Mathematics Publications, pp. 478-486, 2020 SIAM International Conference on Data Mining, SDM 2020, Cincinnati, United States, 5/7/20. https://doi.org/10.1137/1.9781611976236.54

Cheng L, Guo R, Candan KS, Liu H. Representation learning for imbalanced cross-domain classification. In Demeniconi C, Chawla N, editors, Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020. Society for Industrial and Applied Mathematics Publications. 2020. p. 478-486. (Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020). doi: 10.1137/1.9781611976236.54

Cheng, Lu ; Guo, Ruocheng ; Candan, K. Selçuk et al. / Representation learning for imbalanced cross-domain classification. Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020. editor / Carlotta Demeniconi ; Nitesh Chawla. Society for Industrial and Applied Mathematics Publications, 2020. pp. 478-486 (Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020).

@inproceedings{5db617de609a42a1b1999558ae8a0758,
  title = "Representation learning for imbalanced cross-domain classification",
  abstract = "Deep architectures are trained on massive amounts of labeled data to guarantee the performance of classification. In the absence of labeled data, domain adaptation often provides an attractive option given that labeled data of a similar nature but from a different domain is available. Previous work has chiefly focused on learning domain invariant representations but overlooked the issues of label imbalance in a single domain or across domains, which are common in many machine learning applications such as fake news detection. In this paper, we study a new cross-domain classification problem where data in each domain can be imbalanced (data imbalance), i.e., the classes are not evenly distributed, and the ratio of the number of positive over negative samples varies across domains (domain imbalance). This cross-domain problem is challenging as it entails covariate bias in the input feature space and representation bias in the latent space where domain invariant representations are learned. To address the challenge, in this paper, we propose an effective approach that leverages a doubly balancing strategy to simultaneously control these two types of bias and learn domain invariant representations. To this end, the proposed method aims to learn representations that are (i) robust to data and domain imbalance, (ii) discriminative between classes, and (iii) invariant across domains. Extensive evaluations of two important real-world applications corroborate the effectiveness of the proposed framework.",
  keywords = "Data Imbalance, Domain Imbalance, Representation Learning, Unsupervised Domain Adaptation",
  author = "Lu Cheng and Ruocheng Guo and Candan, {K. Sel{\c c}uk} and Huan Liu",
  note = "Funding Information: This material is based upon work supported by the National Science Foundation (NSF) Grants #1610282, #1633381, and #1909555. Publisher Copyright: Copyright {\textcopyright} 2020 by SIAM.; 2020 SIAM International Conference on Data Mining, SDM 2020 ; Conference date: 07-05-2020 Through 09-05-2020",
  year = "2020",
  doi = "10.1137/1.9781611976236.54",
  language = "English (US)",
  series = "Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020",
  publisher = "Society for Industrial and Applied Mathematics Publications",
  pages = "478--486",
  editor = "Carlotta Demeniconi and Nitesh Chawla",
  booktitle = "Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020",
}

TY  - GEN
T1  - Representation learning for imbalanced cross-domain classification
AU  - Cheng, Lu
AU  - Guo, Ruocheng
AU  - Candan, K. Selçuk
AU  - Liu, Huan
PY  - 2020
Y1  - 2020
N2  - Deep architectures are trained on massive amounts of labeled data to guarantee the performance of classification. In the absence of labeled data, domain adaptation often provides an attractive option given that labeled data of a similar nature but from a different domain is available. Previous work has chiefly focused on learning domain invariant representations but overlooked the issues of label imbalance in a single domain or across domains, which are common in many machine learning applications such as fake news detection. In this paper, we study a new cross-domain classification problem where data in each domain can be imbalanced (data imbalance), i.e., the classes are not evenly distributed, and the ratio of the number of positive over negative samples varies across domains (domain imbalance). This cross-domain problem is challenging as it entails covariate bias in the input feature space and representation bias in the latent space where domain invariant representations are learned. To address the challenge, in this paper, we propose an effective approach that leverages a doubly balancing strategy to simultaneously control these two types of bias and learn domain invariant representations. To this end, the proposed method aims to learn representations that are (i) robust to data and domain imbalance, (ii) discriminative between classes, and (iii) invariant across domains. Extensive evaluations of two important real-world applications corroborate the effectiveness of the proposed framework.
KW  - Data Imbalance
KW  - Domain Imbalance
KW  - Representation Learning
KW  - Unsupervised Domain Adaptation
UR  - http://www.scopus.com/inward/record.url?scp=85089183457&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85089183457&partnerID=8YFLogxK
U2  - 10.1137/1.9781611976236.54
DO  - 10.1137/1.9781611976236.54
M3  - Conference contribution
AN  - SCOPUS:85089183457
T3  - Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020
SP  - 478
EP  - 486
BT  - Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020
A2  - Demeniconi, Carlotta
A2  - Chawla, Nitesh
PB  - Society for Industrial and Applied Mathematics Publications
T2  - 2020 SIAM International Conference on Data Mining, SDM 2020
Y2  - 7 May 2020 through 9 May 2020
ER  -
