FIRST: Fast interactive attributed subgraph matching

Boxin Du; Si Zhang; Nan Cao; Hanghang Tong

doi:10.1145/3097983.3098040

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

40 Scopus citations

Abstract

Attributed subgraph matching is a powerful tool for explorative mining of large attributed networks. In many applications (e.g., network science of teams, intelligence analysis, finance informatics), the user might not know exactly what s/he is looking for, and thus needs to constantly revise the initial query graph based on what s/he finds in the current matching results. A major bottleneck in such an interactive matching scenario is efficiency, as simply rerunning the matching algorithm on the revised query graph is computationally prohibitive. In this paper, we propose a family of effective and efficient algorithms (FIRST) to support interactive attributed subgraph matching. There are two key ideas behind the proposed methods. The first is to recast the attributed subgraph matching problem as a cross-network node similarity problem, whose major computation lies in solving a Sylvester equation for the query graph and the underlying data graph. The second key idea is to explore the smoothness between the initial and revised queries, which allows us to solve the new/updated Sylvester equation incrementally, without re-solving it from scratch. Experimental results show that our method (1) achieves up to 16x speed-up on networks with 6M+ nodes; (2) preserves more than 90% accuracy compared with existing methods; and (3) scales linearly with respect to the size of the data graph.
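The two ideas in the abstract can be illustrated with a small sketch. It is not the FIRST algorithm itself: the abstract does not give the exact equation, so the code assumes a generic Sylvester-type (Stein) similarity equation X = alpha * W_d X W_q^T + (1 - alpha) * H, where W_d and W_q are normalized adjacency matrices of the data and query graphs and H is a prior node-similarity matrix (for example, attribute agreement). It solves this by plain fixed-point iteration. The function names, `alpha`, and the warm-start strategy are all illustrative assumptions; the warm start merely stands in for the idea of reusing the previous solution after a query revision.

```python
import numpy as np


def normalize(A):
    """Symmetrically normalize an adjacency matrix: D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    # Isolated nodes (degree 0) get a zero scaling factor instead of 1/0.
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def solve_similarity(A_data, A_query, H, alpha=0.5, X0=None,
                     tol=1e-9, max_iter=1000):
    """Iterate X <- alpha * W_d X W_q^T + (1 - alpha) * H until convergence.

    Hypothetical helper (not from the paper). Returns (X, iterations used).
    X0 is an optional warm start, e.g. the solution for the previous query.
    """
    W_d, W_q = normalize(A_data), normalize(A_query)
    X = H.copy() if X0 is None else X0.copy()
    for it in range(1, max_iter + 1):
        X_new = alpha * W_d @ X @ W_q.T + (1 - alpha) * H
        if np.abs(X_new - X).max() < tol:
            return X_new, it
        X = X_new
    return X, max_iter


# Toy data: a random undirected data graph and a 3-node path query.
rng = np.random.default_rng(0)
n = 30
upper = np.triu((rng.random((n, n)) < 0.2).astype(float), 1)
A_data = upper + upper.T
A_query = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = rng.random((n, 3))  # assumed prior similarity (data node x query node)

X_old, _ = solve_similarity(A_data, A_query, H)

# The user revises the query by adding one edge; reuse X_old as the start.
A_revised = A_query.copy()
A_revised[0, 2] = A_revised[2, 0] = 1.0
X_cold, it_cold = solve_similarity(A_data, A_revised, H)
X_warm, it_warm = solve_similarity(A_data, A_revised, H, X0=X_old)
print("iterations (cold, warm):", it_cold, it_warm)
```

Both runs converge to the same solution because the iteration is a contraction for alpha < 1; whether the warm start saves iterations depends on how much the revision changes the query. For small problems, the same equation could instead be solved directly with a dense solver.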

Original language	English (US)
Title of host publication	KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher	Association for Computing Machinery
Pages	1447-1456
Number of pages	10
ISBN (Electronic)	9781450348874
DOIs	https://doi.org/10.1145/3097983.3098040
State	Published - Aug 13 2017
Event	23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017 - Halifax, Canada
Duration	Aug 13 2017 → Aug 17 2017

Publication series

Name	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume	Part F129685

Other

Other	23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
Country/Territory	Canada
City	Halifax
Period	8/13/17 → 8/17/17

Keywords

Cross-network similarity
Inexact matching
Interactive attributed subgraph matching

ASJC Scopus subject areas

Software
Information Systems

Access to Document

10.1145/3097983.3098040

Cite this

Du, B., Zhang, S., Cao, N., & Tong, H. (2017). FIRST: Fast interactive attributed subgraph matching. In KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1447-1456). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. Part F129685). Association for Computing Machinery. https://doi.org/10.1145/3097983.3098040

FIRST: Fast interactive attributed subgraph matching. / Du, Boxin; Zhang, Si; Cao, Nan et al.
KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2017. p. 1447-1456 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. Part F129685).

Du, B, Zhang, S, Cao, N & Tong, H 2017, FIRST: Fast interactive attributed subgraph matching. in KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. Part F129685, Association for Computing Machinery, pp. 1447-1456, 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017, Halifax, Canada, 8/13/17. https://doi.org/10.1145/3097983.3098040

Du B, Zhang S, Cao N, Tong H. FIRST: Fast interactive attributed subgraph matching. In KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2017. p. 1447-1456. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). doi: 10.1145/3097983.3098040

@inproceedings{363e52059924480e8ab4dcc97313aa84,

title = "FIRST: Fast interactive attributed subgraph matching",

abstract = "Attributed subgraph matching is a powerful tool for explorative mining of large attributed networks. In many applications (e.g., network science of teams, intelligence analysis, finance informatics), the user might not know exactly what s/he is looking for, and thus needs to constantly revise the initial query graph based on what s/he finds in the current matching results. A major bottleneck in such an interactive matching scenario is efficiency, as simply rerunning the matching algorithm on the revised query graph is computationally prohibitive. In this paper, we propose a family of effective and efficient algorithms (FIRST) to support interactive attributed subgraph matching. There are two key ideas behind the proposed methods. The first is to recast the attributed subgraph matching problem as a cross-network node similarity problem, whose major computation lies in solving a Sylvester equation for the query graph and the underlying data graph. The second key idea is to explore the smoothness between the initial and revised queries, which allows us to solve the new/updated Sylvester equation incrementally, without re-solving it from scratch. Experimental results show that our method (1) achieves up to 16x speed-up on networks with 6M+ nodes; (2) preserves more than 90% accuracy compared with existing methods; and (3) scales linearly with respect to the size of the data graph.",

keywords = "Cross-network similarity, Inexact matching, Interactive attributed subgraph matching",

author = "Boxin Du and Si Zhang and Nan Cao and Hanghang Tong",

note = "Funding Information: In this paper, we study the interactive attributed subgraph matching problem and develop a family of efficient and effective algorithms (FIRST) to address this problem according to different interactive scenarios. Specifically, we first propose that the problem can be recast as a cross-network node similarity problem and that the computation can be sped up by exploring the smoothness between initial and revised queries. We then propose FIRST-Q and FIRST-N to handle the scenario where only node attributes are available, and FIRST-E to handle the scenario where both node and edge attributes are available. We conduct numerous experiments on real-world data, and show that our method leads to up to 16× speedup with more than 90% accuracy. In the future, we will (i) deploy the proposed FIRST algorithms in an online team search and optimization system (http://team-net-work.org/system.html), and (ii) generalize them to handle dynamic attributed data networks and deploy them. Acknowledgements: This work is supported by National Science Foundation under Grant No. IIS-1651203, DTRA under the grant number HDTRA1-16-0017, Army Research Office under the contract number W911NF-16-1-0168, National Institutes of Health under the grant number R01LM011986, Region II University Transportation Center under the project number 49997-33 25, National Natural Science Foundation of China under Grant No. 61602306, IBM 2016 SUR Award, and a Baidu gift. We would also like to genuinely thank Dr. Yutao Zhang and Dr. Jie Tang for sharing the dataset, and all the reviewers for providing helpful criticism and valuable comments. Publisher Copyright: {\textcopyright} 2017 ACM.; 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017 ; Conference date: 13-08-2017 Through 17-08-2017",

year = "2017",

month = aug,

day = "13",

doi = "10.1145/3097983.3098040",

language = "English (US)",

series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

publisher = "Association for Computing Machinery",

pages = "1447--1456",

booktitle = "KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - FIRST

T2 - 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017

AU - Du, Boxin

AU - Zhang, Si

AU - Cao, Nan

AU - Tong, Hanghang

N1 - Funding Information: In this paper, we study the interactive attributed subgraph matching problem and develop a family of efficient and effective algorithms (FIRST) to address this problem according to different interactive scenarios. Specifically, we first propose that the problem can be recast as a cross-network node similarity problem and that the computation can be sped up by exploring the smoothness between initial and revised queries. We then propose FIRST-Q and FIRST-N to handle the scenario where only node attributes are available, and FIRST-E to handle the scenario where both node and edge attributes are available. We conduct numerous experiments on real-world data, and show that our method leads to up to 16× speedup with more than 90% accuracy. In the future, we will (i) deploy the proposed FIRST algorithms in an online team search and optimization system (http://team-net-work.org/system.html), and (ii) generalize them to handle dynamic attributed data networks and deploy them. Acknowledgements: This work is supported by National Science Foundation under Grant No. IIS-1651203, DTRA under the grant number HDTRA1-16-0017, Army Research Office under the contract number W911NF-16-1-0168, National Institutes of Health under the grant number R01LM011986, Region II University Transportation Center under the project number 49997-33 25, National Natural Science Foundation of China under Grant No. 61602306, IBM 2016 SUR Award, and a Baidu gift. We would also like to genuinely thank Dr. Yutao Zhang and Dr. Jie Tang for sharing the dataset, and all the reviewers for providing helpful criticism and valuable comments. Publisher Copyright: © 2017 ACM.

PY - 2017/8/13

Y1 - 2017/8/13

N2 - Attributed subgraph matching is a powerful tool for explorative mining of large attributed networks. In many applications (e.g., network science of teams, intelligence analysis, finance informatics), the user might not know exactly what s/he is looking for, and thus needs to constantly revise the initial query graph based on what s/he finds in the current matching results. A major bottleneck in such an interactive matching scenario is efficiency, as simply rerunning the matching algorithm on the revised query graph is computationally prohibitive. In this paper, we propose a family of effective and efficient algorithms (FIRST) to support interactive attributed subgraph matching. There are two key ideas behind the proposed methods. The first is to recast the attributed subgraph matching problem as a cross-network node similarity problem, whose major computation lies in solving a Sylvester equation for the query graph and the underlying data graph. The second key idea is to explore the smoothness between the initial and revised queries, which allows us to solve the new/updated Sylvester equation incrementally, without re-solving it from scratch. Experimental results show that our method (1) achieves up to 16x speed-up on networks with 6M+ nodes; (2) preserves more than 90% accuracy compared with existing methods; and (3) scales linearly with respect to the size of the data graph.

KW - Cross-network similarity

KW - Inexact matching

KW - Interactive attributed subgraph matching

UR - http://www.scopus.com/inward/record.url?scp=85029046778&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029046778&partnerID=8YFLogxK

U2 - 10.1145/3097983.3098040

DO - 10.1145/3097983.3098040

M3 - Conference contribution

AN - SCOPUS:85029046778

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 1447

EP - 1456

BT - KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

Y2 - 13 August 2017 through 17 August 2017

ER -
