Abstract
An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems only consider filtering the simple XPath statements. In this paper, we focus on filtering of the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) allow an efficient duplicate-free and merge join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunity across queries by exploiting the suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that GFilter not only achieves significantly better filtering performance than state-of-the-art algorithms but also is capable of efficiently filtering the more complex GTP queries.
Original language | English (US) |
---|---|
Article number | 4509431 |
Pages (from-to) | 1627-1640 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 20 |
Issue number | 12 |
DOIs | |
State | Published - Dec 2008 |
Externally published | Yes |
Keywords
- Generalized-tree-pattern queries
- Result encoding
- XML filtering
- XML streams
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics