TY - GEN
T1 - BibFinder/StatMiner
T2 - 29th International Conference on Very Large Data Bases, VLDB 2003
AU - Nie, Zaiqing
AU - Kambhampati, Subbarao
AU - Hernandez, Thomas
N1 - Funding Information: This research is supported in part by the NSF grant IRI-9801676 and the ASU ET-I3 initiative grant ECR A601. We thank Ullas Nambiar and Sreelakshmi Vaddi for comments as well as help in a previous implementation of StatMiner, and Louiqa Raschid, Huan Liu, and K. Selcuk Candan for many helpful critiques.
PY - 2003
Y1 - 2003
AB - Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. In this paper we present StatMiner, a system for estimating the coverage and overlap statistics while keeping the needed statistics tightly under control. StatMiner uses a hierarchical classification of the queries and threshold-based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We will demonstrate the major functionalities of StatMiner and the effectiveness of the learned statistics in BibFinder, a publicly available computer science bibliography mediator we developed. The sources that BibFinder integrates are autonomous and can have uncontrolled coverage and overlap. An important focus in BibFinder was thus to mine coverage and overlap statistics about these sources and to exploit them to improve query processing.
UR - http://www.scopus.com/inward/record.url?scp=85012159850&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85012159850&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85012159850
T3 - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
SP - 1097
EP - 1100
BT - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
A2 - Selinger, Patricia G.
A2 - Carey, Michael J.
A2 - Freytag, Johann Christoph
A2 - Abiteboul, Serge
A2 - Lockemann, Peter C.
A2 - Heuer, Andreas
PB - Morgan Kaufmann
Y2 - 9 September 2003 through 12 September 2003
ER -