TY - GEN
T1 - Bias of importance measures for multi-valued attributes and solutions
AU - Deng, Houtao
AU - Runger, George
AU - Tuv, Eugene
N1 - Funding Information:
This research was partially supported by ONR grant N00014-09-1-0656.
PY - 2011
Y1 - 2011
N2 - Attribute importance measures for supervised learning are important for improving both learning accuracy and interpretability. However, it is well known that these measures can be biased when the predictor attributes have different numbers of values. We propose two methods to solve the bias problem. One, called OOBForest, uses out-of-bag sampling; the other, called pForest, is based on the new concept of a partial permutation test. Existing research has considered the bias problem only among irrelevant attributes and equally informative attributes, whereas we compare against existing methods in a situation where unequally informative attributes (with or without interactions) and irrelevant attributes co-exist. We observe that the existing methods are not always reliable for multi-valued predictors, while the proposed methods compare favorably in our experiments.
AB - Attribute importance measures for supervised learning are important for improving both learning accuracy and interpretability. However, it is well known that these measures can be biased when the predictor attributes have different numbers of values. We propose two methods to solve the bias problem. One, called OOBForest, uses out-of-bag sampling; the other, called pForest, is based on the new concept of a partial permutation test. Existing research has considered the bias problem only among irrelevant attributes and equally informative attributes, whereas we compare against existing methods in a situation where unequally informative attributes (with or without interactions) and irrelevant attributes co-exist. We observe that the existing methods are not always reliable for multi-valued predictors, while the proposed methods compare favorably in our experiments.
KW - Attribute importance
KW - cardinality
KW - feature selection
KW - random forest
UR - http://www.scopus.com/inward/record.url?scp=79959348887&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959348887&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-21738-8_38
DO - 10.1007/978-3-642-21738-8_38
M3 - Conference contribution
AN - SCOPUS:79959348887
SN - 9783642217371
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 293
EP - 300
BT - Artificial Neural Networks and Machine Learning, ICANN 2011 - 21st International Conference on Artificial Neural Networks, Proceedings
T2 - 21st International Conference on Artificial Neural Networks, ICANN 2011
Y2 - 14 June 2011 through 17 June 2011
ER -