Detecting malicious domains with behavioral modeling and graph embedding

Kai Lei; Qiuai Fu; Jiake Ni; Feiyang Wang; Min Yang; Kuai Xu

doi:10.1109/ICDCS.2019.00066

Mathematical and Natural Sciences, School of (SMNS)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Scopus citations

Abstract

The last decade has witnessed explosive growth in malicious Internet domains, which serve as the fundamental infrastructure for establishing advanced persistent threat command-and-control communication channels or hosting phishing Web sites. Given the big-data nature of Internet traffic and the ability to algorithmically generate domains and to acquire and register them in a near-automated fashion, detecting malicious domains in real time is a daunting task for security analysts and network operators. In this paper, we introduce bipartite graphs to capture the interactions between end hosts and domains, identify the IP addresses associated with domains, and characterize time-series patterns of DNS queries for domains. We then explore one-mode projections of these bipartite graphs to model the behavioral, IP-structural, and temporal similarities between domains. We employ a graph embedding technique to automatically learn dynamic and discriminative feature representations for over 10,000 labeled domains, and develop an SVM-based classification algorithm for predicting whether a domain is malicious or benign. Our model makes progress towards adapting to the changing and evolving strategies of malicious domains. Experimental results show that our proposed algorithm achieves an area under the curve (AUC) of 0.94 based on k-fold cross-validation. To the best of our knowledge, this is the first effort to combine behavioral modeling and graph embedding for effectively and accurately detecting malicious domains.
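
The one-mode projection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `project_domains`, the toy query log, and the choice of Jaccard similarity as the edge weight are all assumptions made for illustration, since the abstract does not specify how projected edges are weighted. The sketch builds a host-domain bipartite graph from (host, domain) query pairs and links two domains whenever at least one host queried both.

```python
from collections import defaultdict
from itertools import combinations


def project_domains(queries):
    """Project a host-domain bipartite graph onto its domain side.

    queries: iterable of (host, domain) pairs, e.g. taken from DNS logs.
    Returns a dict mapping (domain_a, domain_b) to a similarity weight.
    The Jaccard weight is an illustrative assumption, not the paper's
    stated weighting.
    """
    # Bipartite adjacency: each domain maps to the set of hosts that queried it.
    hosts_by_domain = defaultdict(set)
    for host, domain in queries:
        hosts_by_domain[domain].add(host)

    weights = {}
    # Two domains are connected in the projection if they share a host.
    for d1, d2 in combinations(sorted(hosts_by_domain), 2):
        shared = hosts_by_domain[d1] & hosts_by_domain[d2]
        if shared:
            union = hosts_by_domain[d1] | hosts_by_domain[d2]
            weights[(d1, d2)] = len(shared) / len(union)
    return weights


# Hypothetical toy query log: three hosts, three domains.
toy_queries = [
    ("10.0.0.1", "a.example"),
    ("10.0.0.1", "b.example"),
    ("10.0.0.2", "a.example"),
    ("10.0.0.2", "b.example"),
    ("10.0.0.3", "b.example"),
    ("10.0.0.3", "c.example"),
]
edges = project_domains(toy_queries)
```

In a pipeline like the one the abstract describes, a weighted domain graph of this kind would be the input to a graph embedding step, and the resulting vectors would be passed to the classifier. Those later stages are omitted here.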

Original language	English (US)
Title of host publication	Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	601-611
Number of pages	11
ISBN (Electronic)	9781728125190
DOIs	https://doi.org/10.1109/ICDCS.2019.00066
State	Published - Jul 2019
Event	39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019 - Richardson, United States
Duration	Jul 7 2019 → Jul 9 2019

Publication series

Name	Proceedings - International Conference on Distributed Computing Systems
Volume	2019-July

Conference

Conference	39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Country/Territory	United States
City	Richardson
Period	7/7/19 → 7/9/19

Keywords

Behavioral Modeling
Graph Embedding
Malicious Domain Detection

ASJC Scopus subject areas

Software
Hardware and Architecture
Computer Networks and Communications

Cite this

Lei, K., Fu, Q., Ni, J., Wang, F., Yang, M., & Xu, K. (2019). Detecting malicious domains with behavioral modeling and graph embedding. In Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019 (pp. 601-611). Article 8885225 (Proceedings - International Conference on Distributed Computing Systems; Vol. 2019-July). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDCS.2019.00066

Detecting malicious domains with behavioral modeling and graph embedding. / Lei, Kai; Fu, Qiuai; Ni, Jiake et al.
Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019. Institute of Electrical and Electronics Engineers Inc., 2019. p. 601-611 8885225 (Proceedings - International Conference on Distributed Computing Systems; Vol. 2019-July).

Lei, K, Fu, Q, Ni, J, Wang, F, Yang, M & Xu, K 2019, Detecting malicious domains with behavioral modeling and graph embedding. in Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019., 8885225, Proceedings - International Conference on Distributed Computing Systems, vol. 2019-July, Institute of Electrical and Electronics Engineers Inc., pp. 601-611, 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019, Richardson, United States, 7/7/19. https://doi.org/10.1109/ICDCS.2019.00066

Lei K, Fu Q, Ni J, Wang F, Yang M, Xu K. Detecting malicious domains with behavioral modeling and graph embedding. In Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 601-611. 8885225. (Proceedings - International Conference on Distributed Computing Systems). doi: 10.1109/ICDCS.2019.00066

@inproceedings{727ea0b21d9b4c96beda765cde2da80b,

title = "Detecting malicious domains with behavioral modeling and graph embedding",

keywords = "Behavioral Modeling, Graph Embedding, Malicious Domain Detection",

author = "Kai Lei and Qiuai Fu and Jiake Ni and Feiyang Wang and Min Yang and Kuai Xu",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019 ; Conference date: 07-07-2019 Through 09-07-2019",

year = "2019",

month = jul,

doi = "10.1109/ICDCS.2019.00066",

language = "English (US)",

series = "Proceedings - International Conference on Distributed Computing Systems",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "601--611",

booktitle = "Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019",

}

TY - GEN

T1 - Detecting malicious domains with behavioral modeling and graph embedding

AU - Lei, Kai

AU - Fu, Qiuai

AU - Ni, Jiake

AU - Wang, Feiyang

AU - Yang, Min

AU - Xu, Kuai

PY - 2019/7

Y1 - 2019/7

AB - The last decade has witnessed the explosive growth of malicious Internet domains which serve as the fundamental infrastructure for establishing advanced persistent threat command and control communication channels or hosting phishing Web sites. Given the big data nature of Internet traffic data and the ability of algorithmically generating domains and acquiring and registering the domains in a near-automated fashion, detecting malicious domains in real-time is a daunting task for security analysts and network operators. In this paper, we introduce bipartite graphs to capture the interactions between end hosts and domains, identify associated IP addresses of domains, and characterize time-series patterns of DNS queries for domains, and explore one-mode projections of these bipartite graphs for modeling the behavioral, IP-structural, and temporal similarities between domains. We employ graph embedding technique to automatically learn dynamic and discriminative feature representations for over 10,000 labeled domains, and develop an SVM-based classification algorithm for predicting malicious or benign domains. Our model makes the progress towards adapting to the changing and evolving strategies of malicious domains. The experimental results have shown that our proposed algorithm achieves an area under the curve (AUC) of 0.94 based on k-fold cross-validation. To the best of our knowledge, this is the first effort to apply the combination of behavioral modeling and graph embedding for effectively and accurately detecting malicious domains.

KW - Behavioral Modeling

KW - Graph Embedding

KW - Malicious Domain Detection

UR - http://www.scopus.com/inward/record.url?scp=85074821561&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85074821561&partnerID=8YFLogxK

U2 - 10.1109/ICDCS.2019.00066

DO - 10.1109/ICDCS.2019.00066

M3 - Conference contribution

AN - SCOPUS:85074821561

T3 - Proceedings - International Conference on Distributed Computing Systems

SP - 601

EP - 611

BT - Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019

Y2 - 7 July 2019 through 9 July 2019

ER -
