CSaRUS-CNN at AMIA-2017 Tasks 1, 2: Under sampled CNN for text classification

Arjun Magge; Matthew Scotch; Graciela Gonzalez

Research output: Contribution to journal › Conference article › peer-review

Abstract

Most practical text classification tasks in natural language processing involve training sets where the number of training instances belonging to each of the classes is not equal. The performance of the classifier in such a case can be affected by the sampling strategies used in training. In this work, we describe cost-sensitive and random undersampling variants of convolutional neural networks (CNNs) for classifying texts in imbalanced datasets and analyze their results. The classifier proposed in this paper achieves a maximum F1-score of 0.414, placing 2nd on the ADR dataset, and a maximum F1-score of 0.652, placing 6th on the medication intake dataset.
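
The abstract names two strategies for imbalanced training data without giving implementation details. The following is a minimal sketch of what random undersampling and cost-sensitive class weighting commonly look like; it is not the authors' code. The function names `random_undersample` and `inverse_frequency_weights`, the choice to downsample every class to the size of the rarest class, and the inverse-frequency weighting formula are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def random_undersample(texts, labels, seed=0):
    """Randomly drop examples so each class keeps as many as the rarest class.

    Hypothetical helper: the paper's actual sampling ratio is not stated here.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Size of the smallest class determines how many examples each class keeps.
    n_keep = min(len(items) for items in by_class.values())

    sampled = []
    for label, items in by_class.items():
        for text in rng.sample(items, n_keep):
            sampled.append((text, label))
    rng.shuffle(sampled)

    return [t for t, _ in sampled], [l for _, l in sampled]


def inverse_frequency_weights(labels):
    """Per-class loss weights proportional to 1 / class frequency.

    Hypothetical helper: one common cost-sensitive scheme, assumed for illustration.
    """
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * n) for label, n in counts.items()}
```

With undersampling, the classifier trains on a balanced subset; with cost-sensitive training, the full data is kept and errors on the rare class are penalized more heavily through weights such as these passed to the loss function.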

Original language	English (US)
Pages (from-to)	76-78
Number of pages	3
Journal	CEUR Workshop Proceedings
Volume	1996
State	Published - 2017
Event	2nd Social Media Mining for Health Research and Applications Workshop, SMM4H 2017 - Washington, United States Duration: Nov 4 2017 → …

ASJC Scopus subject areas

General Computer Science

Cite this

@article{ccecaac044f440f1a09ea389e85b2d27,
title = "CSaRUS-CNN at AMIA-2017 Tasks 1, 2: Under sampled CNN for text classification",
abstract = "Most practical text classification tasks in natural language processing involve training sets where the number of training instances belonging to each of the classes is not equal. The performance of the classifier in such a case can be affected by the sampling strategies used in training. In this work, we describe cost-sensitive and random undersampling variants of convolutional neural networks (CNNs) for classifying texts in imbalanced datasets and analyze their results. The classifier proposed in this paper achieves a maximum F1-score of 0.414, placing 2nd on the ADR dataset, and a maximum F1-score of 0.652, placing 6th on the medication intake dataset.",
author = "Arjun Magge and Matthew Scotch and Graciela Gonzalez",
year = "2017",
language = "English (US)",
volume = "1996",
pages = "76--78",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",
note = "2nd Social Media Mining for Health Research and Applications Workshop, SMM4H 2017 ; Conference date: 04-11-2017",
}

TY - JOUR
T1 - CSaRUS-CNN at AMIA-2017 Tasks 1, 2
T2 - 2nd Social Media Mining for Health Research and Applications Workshop, SMM4H 2017
AU - Magge, Arjun
AU - Scotch, Matthew
AU - Gonzalez, Graciela
PY - 2017
Y1 - 2017
N2 - Most practical text classification tasks in natural language processing involve training sets where the number of training instances belonging to each of the classes is not equal. The performance of the classifier in such a case can be affected by the sampling strategies used in training. In this work, we describe cost-sensitive and random undersampling variants of convolutional neural networks (CNNs) for classifying texts in imbalanced datasets and analyze their results. The classifier proposed in this paper achieves a maximum F1-score of 0.414, placing 2nd on the ADR dataset, and a maximum F1-score of 0.652, placing 6th on the medication intake dataset.
AB - Most practical text classification tasks in natural language processing involve training sets where the number of training instances belonging to each of the classes is not equal. The performance of the classifier in such a case can be affected by the sampling strategies used in training. In this work, we describe cost-sensitive and random undersampling variants of convolutional neural networks (CNNs) for classifying texts in imbalanced datasets and analyze their results. The classifier proposed in this paper achieves a maximum F1-score of 0.414, placing 2nd on the ADR dataset, and a maximum F1-score of 0.652, placing 6th on the medication intake dataset.
UR - http://www.scopus.com/inward/record.url?scp=85037044221&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85037044221&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85037044221
SN - 1613-0073
VL - 1996
SP - 76
EP - 78
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 4 November 2017
ER -
