AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering

K. Selçuk Candan; Wang Pin Hsiung; Songting Chen; Junichi Tatemura; Divyakant Agrawal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The XML message filtering problem involves searching for instances of a given, potentially large, set of patterns in a continuous stream of XML messages. Since the messages arrive continuously, it is essential that the filtering rate matches the data arrival rate. Therefore, the given set of filter patterns needs to be indexed appropriately to enable real-time processing of the streaming XML data. In this paper, we propose AFilter, an adaptable, and thus scalable, path expression filtering approach. AFilter has a base memory requirement linear in filter expression and data size. Furthermore, when additional memory is available, AFilter can exploit prefix commonalities in the set of filter expressions using a loosely-coupled prefix caching mechanism, as opposed to the tightly-coupled active state representation of alternative approaches. Unlike existing systems, AFilter can also exploit suffix commonalities across filter expressions, while simultaneously leveraging the prefix commonalities through the cache. Finally, AFilter uses a triggering mechanism to prevent excessive consumption of resources by delaying processing until a trigger condition is observed. Experimental results show that AFilter provides significantly better scalability and runtime performance when compared to state-of-the-art filtering systems.

Original language	English (US)
Title of host publication	VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
Publisher	Association for Computing Machinery
Pages	559-570
Number of pages	12
ISBN (Print)	1595933859, 9781595933850
State	Published - 2006
Externally published	Yes
Event	32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of
Duration	Sep 12 2006 → Sep 15 2006

Publication series

Name	VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases

Other

Other	32nd International Conference on Very Large Data Bases, VLDB 2006
Country/Territory	Korea, Republic of
City	Seoul
Period	9/12/06 → 9/15/06

ASJC Scopus subject areas

Information Systems and Management
Hardware and Architecture
Information Systems
Software

Cite this

Candan, K. S., Hsiung, W. P., Chen, S., Tatemura, J., & Agrawal, D. (2006). AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering. In VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases (pp. 559-570). (VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases). Association for Computing Machinery.

AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering. / Candan, K. Selçuk; Hsiung, Wang Pin; Chen, Songting et al.
VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. Association for Computing Machinery, 2006. p. 559-570 (VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases).

Candan, KS, Hsiung, WP, Chen, S, Tatemura, J & Agrawal, D 2006, AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering. in VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases, Association for Computing Machinery, pp. 559-570, 32nd International Conference on Very Large Data Bases, VLDB 2006, Seoul, Korea, Republic of, 9/12/06.

Candan, K. Selçuk; Hsiung, Wang Pin; Chen, Songting et al. / AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering. VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. Association for Computing Machinery, 2006. pp. 559-570 (VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases).

@inproceedings{ad86941988c8402a9c82d87c15bb65f1,
title = "AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering",
abstract = "The XML message filtering problem involves searching for instances of a given, potentially large, set of patterns in a continuous stream of XML messages. Since the messages arrive continuously, it is essential that the filtering rate matches the data arrival rate. Therefore, the given set of filter patterns needs to be indexed appropriately to enable real-time processing of the streaming XML data. In this paper, we propose AFilter, an adaptable, and thus scalable, path expression filtering approach. AFilter has a base memory requirement linear in filter expression and data size. Furthermore, when additional memory is available, AFilter can exploit prefix commonalities in the set of filter expressions using a loosely-coupled prefix caching mechanism, as opposed to the tightly-coupled active state representation of alternative approaches. Unlike existing systems, AFilter can also exploit suffix commonalities across filter expressions, while simultaneously leveraging the prefix commonalities through the cache. Finally, AFilter uses a triggering mechanism to prevent excessive consumption of resources by delaying processing until a trigger condition is observed. Experimental results show that AFilter provides significantly better scalability and runtime performance when compared to state-of-the-art filtering systems.",
author = "Candan, {K. Sel{\c c}uk} and Hsiung, {Wang Pin} and Songting Chen and Junichi Tatemura and Divyakant Agrawal",
year = "2006",
language = "English (US)",
isbn = "1595933859",
series = "VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases",
publisher = "Association for Computing Machinery",
pages = "559--570",
booktitle = "VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases",
note = "32nd International Conference on Very Large Data Bases, VLDB 2006 ; Conference date: 12-09-2006 Through 15-09-2006",
}

TY - GEN
T1 - AFilter: Adaptable XML filtering with prefix-caching and suffix-clustering
T2 - 32nd International Conference on Very Large Data Bases, VLDB 2006
AU - Candan, K. Selçuk
AU - Hsiung, Wang Pin
AU - Chen, Songting
AU - Tatemura, Junichi
AU - Agrawal, Divyakant
PY - 2006
Y1 - 2006
N2 - The XML message filtering problem involves searching for instances of a given, potentially large, set of patterns in a continuous stream of XML messages. Since the messages arrive continuously, it is essential that the filtering rate matches the data arrival rate. Therefore, the given set of filter patterns needs to be indexed appropriately to enable real-time processing of the streaming XML data. In this paper, we propose AFilter, an adaptable, and thus scalable, path expression filtering approach. AFilter has a base memory requirement linear in filter expression and data size. Furthermore, when additional memory is available, AFilter can exploit prefix commonalities in the set of filter expressions using a loosely-coupled prefix caching mechanism, as opposed to the tightly-coupled active state representation of alternative approaches. Unlike existing systems, AFilter can also exploit suffix commonalities across filter expressions, while simultaneously leveraging the prefix commonalities through the cache. Finally, AFilter uses a triggering mechanism to prevent excessive consumption of resources by delaying processing until a trigger condition is observed. Experimental results show that AFilter provides significantly better scalability and runtime performance when compared to state-of-the-art filtering systems.
AB - The XML message filtering problem involves searching for instances of a given, potentially large, set of patterns in a continuous stream of XML messages. Since the messages arrive continuously, it is essential that the filtering rate matches the data arrival rate. Therefore, the given set of filter patterns needs to be indexed appropriately to enable real-time processing of the streaming XML data. In this paper, we propose AFilter, an adaptable, and thus scalable, path expression filtering approach. AFilter has a base memory requirement linear in filter expression and data size. Furthermore, when additional memory is available, AFilter can exploit prefix commonalities in the set of filter expressions using a loosely-coupled prefix caching mechanism, as opposed to the tightly-coupled active state representation of alternative approaches. Unlike existing systems, AFilter can also exploit suffix commonalities across filter expressions, while simultaneously leveraging the prefix commonalities through the cache. Finally, AFilter uses a triggering mechanism to prevent excessive consumption of resources by delaying processing until a trigger condition is observed. Experimental results show that AFilter provides significantly better scalability and runtime performance when compared to state-of-the-art filtering systems.
UR - http://www.scopus.com/inward/record.url?scp=84893844864&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893844864&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84893844864
SN - 1595933859
SN - 9781595933850
T3 - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
SP - 559
EP - 570
BT - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
PB - Association for Computing Machinery
Y2 - 12 September 2006 through 15 September 2006
ER -
