TY - GEN
T1 - Detecting malicious domains with behavioral modeling and graph embedding
AU - Lei, Kai
AU - Fu, Qiuai
AU - Ni, Jiake
AU - Wang, Feiyang
AU - Yang, Min
AU - Xu, Kuai
N1 - Funding Information:
This work is supported by the Shenzhen Fundamental Research Project (No. JCYJ20170412151008290) and the Guangdong Natural Science Fund Project (Grant No. 2018A030313017). In addition, Kuai Xu is supported in part by the National Science Foundation under grant CNS #1816995.
Author Affiliations:
Kai Lei†,‡, Qiuai Fu†,‡, Jiake Ni†, Feiyang Wang†, Min Yang¶, Kuai Xu§,∗
†ICNLAB, School of Electronics and Computer Engineering (SECE), Peking University, Shenzhen, China
‡PCL Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen, China
¶Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
§School of Mathematical and Natural Sciences, Arizona State University
†leik@pkusz.edu.cn, †{fuqiuai, jiake.ni, wangfy16}@pku.edu.cn, ¶min.yang@siat.ac.cn, §kuai.xu@asu.edu
∗Corresponding Author
Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
N2 - The last decade has witnessed explosive growth in malicious Internet domains, which serve as the fundamental infrastructure for establishing advanced persistent threat command-and-control communication channels or hosting phishing Web sites. Given the big-data nature of Internet traffic and the ability to algorithmically generate domains and to acquire and register them in a near-automated fashion, detecting malicious domains in real time is a daunting task for security analysts and network operators. In this paper, we introduce bipartite graphs to capture the interactions between end hosts and domains, identify the IP addresses associated with domains, and characterize time-series patterns of DNS queries for domains. We then explore one-mode projections of these bipartite graphs to model the behavioral, IP-structural, and temporal similarities between domains. We employ a graph embedding technique to automatically learn dynamic and discriminative feature representations for over 10,000 labeled domains, and develop an SVM-based classification algorithm for predicting whether a domain is malicious or benign. Our model makes progress toward adapting to the changing and evolving strategies of malicious domains. Experimental results show that the proposed algorithm achieves an area under the curve (AUC) of 0.94 under k-fold cross-validation. To the best of our knowledge, this is the first effort to combine behavioral modeling and graph embedding for effective and accurate detection of malicious domains.
KW - Behavioral Modeling
KW - Graph Embedding
KW - Malicious Domain Detection
UR - http://www.scopus.com/inward/record.url?scp=85074821561&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074821561&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2019.00066
DO - 10.1109/ICDCS.2019.00066
M3 - Conference contribution
AN - SCOPUS:85074821561
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 601
EP - 611
BT - Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Y2 - 7 July 2019 through 9 July 2019
ER -