Toward multidatabase mining: Identifying relevant databases

Huan Liu, Hongjun Lu, Jun Yao

Research output: Contribution to journal › Article › peer-review

40 Scopus citations


Various tools and systems for knowledge discovery and data mining have been developed and are available for applications. However, when we are immersed in heaps of databases, an immediate question is where we should start mining. More databases are not automatically better for data mining; they help only when the databases involved are relevant to the task at hand. In this paper, breaking away from the conventional data mining assumption that many databases be joined into one, we argue that the first step for multidatabase mining is to identify the databases that are most likely relevant to an application; without doing so, the mining process can be lengthy, aimless, and ineffective. A measure of relevance is thus proposed for mining tasks whose objective is to find patterns or regularities about certain attributes. An efficient algorithm for identifying relevant databases is described. Experiments are conducted to verify the measure's performance and to exemplify its application.

Original language: English (US)
Pages (from-to): 541-553
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Issue number: 4
State: Published - Jul 2001


  • Data mining
  • Multiple databases
  • Query
  • Relevance measure

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics


