Label Distribution Learning-Enhanced Dual-KNN for Text Classification

Bo Yuan, Yulin Chen, Zhen Tan, Wang Jinyan, Huan Liu, Yin Zhang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Many text classification methods introduce external information (e.g., label descriptions and knowledge bases) to improve classification performance. Internal information is exploited poorly by comparison. This is the information the model itself generates during training, such as text embeddings and predicted label probability distributions, and it is rarely used when predicting the labels of texts. In this paper, we focus on leveraging this internal information and propose a dual k nearest neighbor (DkNN) framework with two kNN modules that retrieve neighbors from the training set and augment the label distribution. However, kNN modules are easily confused and may produce incorrect predictions when retrieving nearest neighbors from noisy datasets (datasets with labeling errors) or similar datasets (datasets with similar labels). To address this issue, we also introduce a label distribution learning module that learns label similarity and generates a better label distribution, helping the model distinguish texts more effectively. This module eases model overfitting and improves final classification performance, which in turn improves the quality of the neighbors retrieved by the kNN modules during inference. Extensive experiments on benchmark datasets verify the effectiveness of our method.
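The abstract does not specify how the two kNN modules or the label distribution learning module are computed. The NumPy sketch below is a minimal illustration under stated assumptions. It is not the authors' implementation.

- **Retrieval keys.** One kNN module searches text embeddings and the other searches predicted label probability distributions, the two kinds of internal information named above.
- **Ranking and weighting.** Neighbors are ranked by squared L2 distance and weighted by a softmax over negative distances.
- **Interpolation.** The retrieved label distributions are linearly interpolated with the classifier's output, in the style of kNN-LM.
- **Label similarity.** The label distribution step uses hypothetical label embeddings whose cosine similarities stand in for the learned label similarity.

All function names, the interpolation weights `lam_emb` and `lam_prob`, `epsilon`, and the temperatures are illustrative.

```python
import numpy as np


def softmax(x, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(x, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def knn_label_distribution(query, keys, key_labels, num_labels, k=3, temperature=1.0):
    """Hypothetical helper: retrieve the k nearest stored items by squared L2
    distance and average their labels with weights softmax(-distance / T).
    key_labels may be hard labels (shape [N]) or label distributions ([N, C])."""
    query = np.asarray(query, dtype=float)
    keys = np.asarray(keys, dtype=float)
    dists = np.sum((keys - query) ** 2, axis=1)      # distance to every training item
    idx = np.argsort(dists)[:k]                      # the k nearest neighbours
    weights = softmax(-dists[idx], temperature)      # closer neighbours weigh more
    labels = np.asarray(key_labels)[idx]
    if labels.ndim == 1:                             # hard labels -> one-hot rows
        labels = np.eye(num_labels)[labels]
    return weights @ labels                          # distribution over C labels


def dual_knn_predict(model_probs, emb_query, emb_keys, prob_keys, train_targets,
                     k=3, lam_emb=0.25, lam_prob=0.25, temperature=1.0):
    """Hypothetical dual-kNN step (assumption): one kNN module searches text
    embeddings, the other searches predicted label distributions; both results
    are linearly interpolated with the classifier's own output."""
    model_probs = np.asarray(model_probs, dtype=float)
    num_labels = model_probs.shape[-1]
    p_emb = knn_label_distribution(emb_query, emb_keys, train_targets,
                                   num_labels, k, temperature)
    p_prob = knn_label_distribution(model_probs, prob_keys, train_targets,
                                    num_labels, k, temperature)
    return (1 - lam_emb - lam_prob) * model_probs + lam_emb * p_emb + lam_prob * p_prob


def similarity_label_distribution(labels, label_emb, epsilon=0.1, temperature=1.0):
    """Hypothetical soft targets (assumption): keep 1 - epsilon on the gold label
    and spread epsilon over the other labels by softmax(cosine similarity / T)."""
    label_emb = np.asarray(label_emb, dtype=float)
    unit = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    sim = unit @ unit.T                              # [C, C] label-label similarity
    np.fill_diagonal(sim, -np.inf)                   # gold label gets no extra mass
    spread = softmax(sim[labels], temperature)       # [B, C] mass for similar labels
    one_hot = np.eye(label_emb.shape[0])[labels]
    return (1 - epsilon) * one_hot + epsilon * spread


# Toy usage: 4 training items, 2 labels.
emb_keys = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])          # text embeddings
prob_keys = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])  # model outputs
train_labels = np.array([0, 0, 1, 1])
p = dual_knn_predict([0.6, 0.4], [0., 0.5], emb_keys, prob_keys, train_labels, k=2)
# p is approximately [0.8, 0.2]: both kNN modules retrieve label-0 neighbours.
targets = similarity_label_distribution(np.array([0]),
                                        np.array([[1., 0.], [0.9, 0.1], [0., 1.]]))
# Label 1 is closer to label 0 than label 2 is, so it receives more of epsilon.
```

Using stored training-set predictions as retrieval keys lets the second module find neighbors that the model itself treats as similar. Soft targets that place some mass on similar labels are one simple way to temper overconfidence on noisy or closely related labels. Whether either choice matches the paper's exact formulation is an assumption.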

Original language: English (US)
Title of host publication: Proceedings of the 2024 SIAM International Conference on Data Mining, SDM 2024
Editors: Shashi Shekhar, Vagelis Papalexakis, Jing Gao, Zhe Jiang, Matteo Riondato
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 400-408
Number of pages: 9
ISBN (Electronic): 9781611978032
State: Published - 2024
Event: 2024 SIAM International Conference on Data Mining, SDM 2024 - Houston, United States
Duration: Apr 18 2024 - Apr 20 2024

Keywords

  • k nearest neighbor
  • label distribution learning
  • robust learning
  • text classification

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
