Evaluating distributional semantic and feature selection for extracting relationships from biological text

Ehsan Emadzadeh; Siddhartha Jonnalagadda; Graciela Gonzalez

doi:10.1109/ICMLA.2011.65

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The constant flow of biomolecular findings published each day challenges our ability to develop methods that automatically extract the knowledge expressed in text and could influence new discoveries. Finding relations between biological entities (e.g., proteins and genes) in text is a challenging task. To facilitate the extraction process, a relation can be decomposed into a trigger and its complementary arguments (e.g., theme, site). Several machine learning approaches have been proposed, and they generally use a common set of features for all trigger types. Here we evaluate the impact of applying a feature selection method to trigger classification. Our proposed method uses a greedy feature selection algorithm to find an optimal set of attributes for each trigger type. We show that using the customized set of features can improve classification results significantly (up to 53.96% in F-measure). In addition, we evaluated different settings for including semantic features in the classifiers. We found that semantic features can improve classification results, and we identified the best setting for each trigger type.
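The per-trigger-type greedy selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifier, the candidate features, or the stopping rule, so the forward-selection variant, the feature names, the trigger types, and the toy scoring tables below are all hypothetical. In practice the `score` callback would return something like a cross-validated F-measure for a classifier trained on the given feature subset.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Scorer = Callable[[FrozenSet[str]], float]


def greedy_forward_selection(
    candidates: Sequence[str], score: Scorer, min_gain: float = 1e-9
) -> Tuple[List[str], float]:
    """Add one feature at a time, keeping the one that raises the score most.

    Stops when no remaining feature improves the score by more than min_gain.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = score(frozenset())
    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda pair: pair[0])
        if top_score - best <= min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


def make_toy_scorer(useful: Dict[str, float], penalty: float = 0.05) -> Scorer:
    """Hypothetical stand-in for a cross-validated F-measure.

    Features listed in `useful` add their value; any other feature costs `penalty`.
    """
    def score(features: FrozenSet[str]) -> float:
        return sum(useful.get(f, -penalty) for f in features)
    return score


# Hypothetical candidate features and per-trigger-type usefulness tables.
CANDIDATES = ["token", "pos_tag", "dep_path", "semantic_cluster"]
TOY_USEFULNESS = {
    "Binding": {"token": 0.4, "pos_tag": 0.1},
    "Phosphorylation": {"semantic_cluster": 0.3, "dep_path": 0.2},
}

# Run the selection independently for each trigger type.
selected_by_type = {
    trigger: greedy_forward_selection(CANDIDATES, make_toy_scorer(table))
    for trigger, table in TOY_USEFULNESS.items()
}

for trigger, (features, value) in selected_by_type.items():
    print(trigger, features, round(value, 3))
```

Running the selection separately for each trigger type is what allows the chosen subsets to differ. A greedy search like this evaluates far fewer subsets than an exhaustive search, but it does not guarantee the globally best subset.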

Original language	English (US)
Title of host publication	Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
Pages	66-71
Number of pages	6
DOIs	https://doi.org/10.1109/ICMLA.2011.65
State	Published - 2011
Event	10th International Conference on Machine Learning and Applications, ICMLA 2011 - Honolulu, HI, United States
Duration	Dec 18 2011 → Dec 21 2011

Publication series

Name	Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
Volume	2

Other

Other	10th International Conference on Machine Learning and Applications, ICMLA 2011
Country/Territory	United States
City	Honolulu, HI
Period	12/18/11 → 12/21/11

Keywords

Distributional Semantic
Feature selection
NLP
Relation Extraction

ASJC Scopus subject areas

Computer Science Applications
Human-Computer Interaction

Cite this

Emadzadeh, E., Jonnalagadda, S., & Gonzalez, G. (2011). Evaluating distributional semantic and feature selection for extracting relationships from biological text. In Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011 (pp. 66-71). Article 6147050 (Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011; Vol. 2). https://doi.org/10.1109/ICMLA.2011.65


@inproceedings{7c8068f508d343a08ca72d10635e75e4,
title = "Evaluating distributional semantic and feature selection for extracting relationships from biological text",
abstract = "The constant flow of biomolecular findings being published each day challenges our ability to develop methods to automatically extract the knowledge expressed in text to potentially influence new discoveries. Finding relations between the biological entities (e.g. proteins and genes) in text is a challenging task. To facilitate the extraction process, a relation can be decomposed into a trigger and the complementary arguments (e.g. theme, site). Several approaches have been proposed based on machine learning which generally use a common set of features for all trigger types. Here we evaluate the impact of applying a feature selection method for trigger classification. Our proposed method uses a greedy feature selection algorithm to find an optimal set of attributes for each trigger type. We show that using the customized set of features can improve classification results significantly (up to 53.96% in f-measure). In addition, we evaluated different settings for including semantic features in the classifiers. We found that using semantic features can improve classification results and found the best setting for each trigger type.",
keywords = "Distributional Semantic, Feature selection, NLP, Relation Extraction",
author = "Ehsan Emadzadeh and Siddhartha Jonnalagadda and Graciela Gonzalez",
year = "2011",
doi = "10.1109/ICMLA.2011.65",
language = "English (US)",
isbn = "9780769546070",
series = "Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011",
pages = "66--71",
booktitle = "Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011",
note = "10th International Conference on Machine Learning and Applications, ICMLA 2011 ; Conference date: 18-12-2011 Through 21-12-2011",
}

TY - GEN
T1 - Evaluating distributional semantic and feature selection for extracting relationships from biological text
AU - Emadzadeh, Ehsan
AU - Jonnalagadda, Siddhartha
AU - Gonzalez, Graciela
PY - 2011
Y1 - 2011
AB - The constant flow of biomolecular findings being published each day challenges our ability to develop methods to automatically extract the knowledge expressed in text to potentially influence new discoveries. Finding relations between the biological entities (e.g. proteins and genes) in text is a challenging task. To facilitate the extraction process, a relation can be decomposed into a trigger and the complementary arguments (e.g. theme, site). Several approaches have been proposed based on machine learning which generally use a common set of features for all trigger types. Here we evaluate the impact of applying a feature selection method for trigger classification. Our proposed method uses a greedy feature selection algorithm to find an optimal set of attributes for each trigger type. We show that using the customized set of features can improve classification results significantly (up to 53.96% in f-measure). In addition, we evaluated different settings for including semantic features in the classifiers. We found that using semantic features can improve classification results and found the best setting for each trigger type.
KW - Distributional Semantic
KW - Feature selection
KW - NLP
KW - Relation Extraction
UR - http://www.scopus.com/inward/record.url?scp=84857874139&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857874139&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2011.65
DO - 10.1109/ICMLA.2011.65
M3 - Conference contribution
AN - SCOPUS:84857874139
SN - 9780769546070
T3 - Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
SP - 66
EP - 71
BT - Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
T2 - 10th International Conference on Machine Learning and Applications, ICMLA 2011
Y2 - 18 December 2011 through 21 December 2011
ER -
