TY - GEN
T1 - Label Distribution Learning-Enhanced Dual-KNN for Text Classification
AU - Yuan, Bo
AU - Chen, Yulin
AU - Tan, Zhen
AU - Wang, Jinyan
AU - Liu, Huan
AU - Zhang, Yin
N1 - Publisher Copyright:
Copyright © 2024 by SIAM.
PY - 2024
Y1 - 2024
N2 - Many text classification methods introduce external information (e.g., label descriptions and knowledge bases) to improve classification performance. In contrast, internal information generated by the model itself during training, such as text embeddings and predicted label probability distributions, is poorly exploited when predicting the outcomes of some texts. In this paper, we focus on leveraging this internal information, proposing a dual k nearest neighbor (DkNN) framework with two kNN modules that retrieve neighbors from the training set and augment the distribution of labels. However, the kNN module is easily confused and may cause incorrect predictions when retrieving nearest neighbors from noisy datasets (datasets with labeling errors) or similar datasets (datasets with similar labels). To address this issue, we also introduce a label distribution learning module that learns label similarity and generates a better label distribution to help the model distinguish texts more effectively. This module eases model overfitting and improves final classification performance, thereby enhancing the quality of the neighbors retrieved by the kNN modules during inference. Extensive experiments on benchmark datasets verify the effectiveness of our method.
AB - Many text classification methods introduce external information (e.g., label descriptions and knowledge bases) to improve classification performance. In contrast, internal information generated by the model itself during training, such as text embeddings and predicted label probability distributions, is poorly exploited when predicting the outcomes of some texts. In this paper, we focus on leveraging this internal information, proposing a dual k nearest neighbor (DkNN) framework with two kNN modules that retrieve neighbors from the training set and augment the distribution of labels. However, the kNN module is easily confused and may cause incorrect predictions when retrieving nearest neighbors from noisy datasets (datasets with labeling errors) or similar datasets (datasets with similar labels). To address this issue, we also introduce a label distribution learning module that learns label similarity and generates a better label distribution to help the model distinguish texts more effectively. This module eases model overfitting and improves final classification performance, thereby enhancing the quality of the neighbors retrieved by the kNN modules during inference. Extensive experiments on benchmark datasets verify the effectiveness of our method.
KW - k nearest neighbor
KW - label distribution learning
KW - robust learning
KW - text classification
UR - http://www.scopus.com/inward/record.url?scp=85193487386&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85193487386&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85193487386
T3 - Proceedings of the 2024 SIAM International Conference on Data Mining, SDM 2024
SP - 400
EP - 408
BT - Proceedings of the 2024 SIAM International Conference on Data Mining, SDM 2024
A2 - Shekhar, Shashi
A2 - Papalexakis, Vagelis
A2 - Gao, Jing
A2 - Jiang, Zhe
A2 - Riondato, Matteo
PB - Society for Industrial and Applied Mathematics Publications
T2 - 2024 SIAM International Conference on Data Mining, SDM 2024
Y2 - 18 April 2024 through 20 April 2024
ER -