Bibfinder/statminer: Effectively mining and using coverage and overlap statistics in data integration

Zaiqing Nie, Subbarao Kambhampati, Thomas Hernandez

Research output: Chapter in Book/Report/Conference proceedingConference contribution

15 Scopus citations

Abstract

Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition there are no effective approaches for learning the needed statistics. In this paper we present StatMiner, a system for estimating the coverage and overlap statistics while keeping the needed statistics tightly under control. StatMiner uses a hierarchical classification of the queries, and threshold based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We will demonstrate the major functionalities of StatMiner and the effectiveness of the learned statistics in BibFinder, a publicly available computer science bibliography mediator we developed. The sources that BibFinder integrates are autonomous and can have uncontrolled coverage and overlap. An important focus in BibFinder was thus to mine coverage and overlap statistics about these sources and to exploit them to improve query processing.

Original languageEnglish (US)
Title of host publicationProceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
EditorsPatricia G. Selinger, Michael J. Carey, Johann Christoph Freytag, Serge Abiteboul, Peter C. Lockemann, Andreas Heuer
PublisherMorgan Kaufmann
Pages1097-1100
Number of pages4
ISBN (Electronic)0127224424, 9780127224428
StatePublished - 2003
Event29th International Conference on Very Large Data Bases, VLDB 2003 - Berlin, Germany
Duration: Sep 9 2003Sep 12 2003

Publication series

NameProceedings - 29th International Conference on Very Large Data Bases, VLDB 2003

Other

Other29th International Conference on Very Large Data Bases, VLDB 2003
Country/TerritoryGermany
CityBerlin
Period9/9/039/12/03

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Information Systems and Management
  • Computer Science Applications
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Bibfinder/statminer: Effectively mining and using coverage and overlap statistics in data integration'. Together they form a unique fingerprint.

Cite this