A distributed hierarchical clustering system for web mining

Catherine W. Wen, Huan Liu, Wilson X. Wen, Jeffery Zheng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations


This paper proposes a novel method of distributed hierarchical clustering for Web mining. The method is closely related to our earlier work on Self-Generated Neural Networks (SGNN), which is in turn based on both self-organizing neural networks and concept formation. The complexity of the algorithm is at most O(MN log N). With the distributed implementation, the method can be easily scaled up. The method is independent of the order in which the web documents are presented. The method produces a natural conceptual hierarchy rather than a binary tree. The method can include multimedia information in the same cluster hierarchy. A visualization mechanism has been developed for the clustering method, and it shows that the cluster hierarchy generated by the method is of very high quality. The clustering process is fully automatic and requires no human intervention. A clustering system has been built based on the proposed method; it can be used to automatically generate multimedia search engines, web directories, decision-making assistance systems, knowledge management systems, and personalized knowledge portals.

Original language: English (US)
Title of host publication: Advances in Web-Age Information Management - 2nd International Conference, WAIM 2001, Proceedings
Editors: X. Sean Wang, Ge Yu, Hongjun Lu
Publisher: Springer Verlag
Number of pages: 11
ISBN (Print): 9783540477143
State: Published - Jan 1 2001
Event: 2nd International Conference on Web-Age Information Management, WAIM 2001 - Xi’an, China
Duration: Jul 9 2001 to Jul 11 2001

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 2nd International Conference on Web-Age Information Management, WAIM 2001

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

