SMT: Sparse multivariate tree

Houtao Deng; Mustafa Gokce Baydogan; George Runger

doi:10.1002/sam.11208

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

A multivariate decision tree attempts to improve upon the single variable split in a traditional tree. With the increase in datasets with many features and a small number of labeled instances in a variety of domains (bioinformatics, text mining, etc.), a traditional tree-based approach with a greedy variable selection at a node may omit important information. Therefore, the recursive partitioning idea of a simple decision tree combined with the intrinsic feature selection of L₁ regularized logistic regression (LR) at each node is a natural choice for a multivariate tree model that is simple, but broadly applicable. This natural solution leads to the sparse multivariate tree (SMT) considered here. SMT can naturally handle non-time-series data and is extended to handle time-series classification problems with the power of extracting interpretable temporal patterns (e.g., means, slopes, and deviations). Binary L₁ regularized LR models are used here for binary classification problems. However, SMT may be extended to solve multiclass problems with multinomial LR models. The accuracy and computational efficiency of SMT is compared to a large number of competitors on time series and non-time-series data.
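The core idea in the abstract, recursive partitioning with a sparse linear split at each node, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`fit_l1_logreg`, `build_tree`, `predict`), the proximal-gradient solver, the split at a linear score of zero, the majority-vote leaves, and the stopping rules are all assumptions made for illustration, and the procedure in the paper may differ.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def score(w, b, x):
    """Linear score b + w.x, used for fitting and for routing instances."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_l1_logreg(X, y, lam=0.1, step=0.5, n_iter=200):
    """Binary L1-regularized logistic regression via proximal gradient.

    Labels are 0/1. The intercept b is not penalized (an assumption).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = max(min(score(w, b, xi), 30.0), -30.0)  # avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step followed by soft-thresholding yields sparse weights.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def build_tree(X, y, depth=0, max_depth=2, min_size=10, lam=0.1):
    """Recursively partition the data with a sparse linear split per node."""
    majority = int(2 * sum(y) >= len(y))
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return {"leaf": majority}
    w, b = fit_l1_logreg(X, y, lam=lam)
    go_right = [score(w, b, x) >= 0 for x in X]
    if all(go_right) or not any(go_right):  # split separates nothing
        return {"leaf": majority}

    def subset(flag):
        idx = [i for i, g in enumerate(go_right) if g == flag]
        return [X[i] for i in idx], [y[i] for i in idx]

    XL, yL = subset(False)
    XR, yR = subset(True)
    return {"w": w, "b": b,
            "left": build_tree(XL, yL, depth + 1, max_depth, min_size, lam),
            "right": build_tree(XR, yR, depth + 1, max_depth, min_size, lam)}


def predict(tree, x):
    """Route x down the tree until a leaf label is reached."""
    while "leaf" not in tree:
        side = "right" if score(tree["w"], tree["b"], x) >= 0 else "left"
        tree = tree[side]
    return tree["leaf"]


if __name__ == "__main__":
    # Synthetic data: only the first two of six features carry signal.
    random.seed(0)
    X = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(200)]
    y = [int(x[0] + x[1] > 0) for x in X]
    tree = build_tree(X, y)
    acc = sum(predict(tree, x) == yi for x, yi in zip(X, y)) / len(y)
    print("root weights:", tree.get("w"))
    print("training accuracy:", acc)
```

A larger `lam` drives more node weights to exactly zero, which is what makes each split readable as a small set of features. For time-series inputs, the same sketch could be applied to interval summaries such as means and slopes, although that extension is not shown here.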

Original language	English (US)
Pages (from-to)	53-69
Number of pages	17
Journal	Statistical Analysis and Data Mining
Volume	7
Issue number	1
DOIs	https://doi.org/10.1002/sam.11208
State	Published - Feb 2014

Keywords

Decision tree
Feature extraction
Fused Lasso
Lasso
Time series classification

ASJC Scopus subject areas

Analysis
Information Systems
Computer Science Applications

Cite this

@article{116df9af4a164e79adb674cfd064c6d2,
  title = "SMT: Sparse multivariate tree",
  abstract = "A multivariate decision tree attempts to improve upon the single variable split in a traditional tree. With the increase in datasets with many features and a small number of labeled instances in a variety of domains (bioinformatics, text mining, etc.), a traditional tree-based approach with a greedy variable selection at a node may omit important information. Therefore, the recursive partitioning idea of a simple decision tree combined with the intrinsic feature selection of L1 regularized logistic regression (LR) at each node is a natural choice for a multivariate tree model that is simple, but broadly applicable. This natural solution leads to the sparse multivariate tree (SMT) considered here. SMT can naturally handle non-time-series data and is extended to handle time-series classification problems with the power of extracting interpretable temporal patterns (e.g., means, slopes, and deviations). Binary L1 regularized LR models are used here for binary classification problems. However, SMT may be extended to solve multiclass problems with multinomial LR models. The accuracy and computational efficiency of SMT is compared to a large number of competitors on time series and non-time-series data.",
  keywords = "Decision tree, Feature extraction, Fused Lasso, Lasso, Time series classification",
  author = "Houtao Deng and Baydogan, {Mustafa Gokce} and George Runger",
  year = "2014",
  month = feb,
  doi = "10.1002/sam.11208",
  language = "English (US)",
  volume = "7",
  pages = "53--69",
  journal = "Statistical Analysis and Data Mining",
  issn = "1932-1872",
  publisher = "John Wiley and Sons Inc.",
  number = "1",
}

TY  - JOUR
T1  - SMT
T2  - Sparse multivariate tree
AU  - Deng, Houtao
AU  - Baydogan, Mustafa Gokce
AU  - Runger, George
PY  - 2014/2
Y1  - 2014/2
N2  - A multivariate decision tree attempts to improve upon the single variable split in a traditional tree. With the increase in datasets with many features and a small number of labeled instances in a variety of domains (bioinformatics, text mining, etc.), a traditional tree-based approach with a greedy variable selection at a node may omit important information. Therefore, the recursive partitioning idea of a simple decision tree combined with the intrinsic feature selection of L1 regularized logistic regression (LR) at each node is a natural choice for a multivariate tree model that is simple, but broadly applicable. This natural solution leads to the sparse multivariate tree (SMT) considered here. SMT can naturally handle non-time-series data and is extended to handle time-series classification problems with the power of extracting interpretable temporal patterns (e.g., means, slopes, and deviations). Binary L1 regularized LR models are used here for binary classification problems. However, SMT may be extended to solve multiclass problems with multinomial LR models. The accuracy and computational efficiency of SMT is compared to a large number of competitors on time series and non-time-series data.
AB  - A multivariate decision tree attempts to improve upon the single variable split in a traditional tree. With the increase in datasets with many features and a small number of labeled instances in a variety of domains (bioinformatics, text mining, etc.), a traditional tree-based approach with a greedy variable selection at a node may omit important information. Therefore, the recursive partitioning idea of a simple decision tree combined with the intrinsic feature selection of L1 regularized logistic regression (LR) at each node is a natural choice for a multivariate tree model that is simple, but broadly applicable. This natural solution leads to the sparse multivariate tree (SMT) considered here. SMT can naturally handle non-time-series data and is extended to handle time-series classification problems with the power of extracting interpretable temporal patterns (e.g., means, slopes, and deviations). Binary L1 regularized LR models are used here for binary classification problems. However, SMT may be extended to solve multiclass problems with multinomial LR models. The accuracy and computational efficiency of SMT is compared to a large number of competitors on time series and non-time-series data.
KW  - Decision tree
KW  - Feature extraction
KW  - Fused Lasso
KW  - Lasso
KW  - Time series classification
UR  - http://www.scopus.com/inward/record.url?scp=84896838891&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84896838891&partnerID=8YFLogxK
U2  - 10.1002/sam.11208
DO  - 10.1002/sam.11208
M3  - Article
AN  - SCOPUS:84896838891
SN  - 1932-1872
VL  - 7
SP  - 53
EP  - 69
JO  - Statistical Analysis and Data Mining
JF  - Statistical Analysis and Data Mining
IS  - 1
ER  - 
