Organization and tagging of blog and news entries based on content reuse

Jong Wook Kim, Kasim Candan, Junichi Tatemura

Research output: Contribution to journal › Article › peer-review

6 Scopus citations


As Weblogs (blogs) grow in popularity as dynamic platforms for disseminating and sharing information, their use for tracking and commenting on real-world (political, news, entertainment) events is also growing. The success of the blog as a popular medium for information sharing, however, is also its weakest spot: there is little support for finding blog entries beyond keyword-based search. Consequently, there is an impending need for navigational support that can help users relate entries within the large, diverse, and inherently distributed collection that makes up the blogosphere. In this paper, we first note that the large degree of content overlap, in the form of quotation/commentary pairs (as well as content borrowing across media outlets), can be leveraged to track topic development patterns within the blogosphere. Relying on this observation, we propose focus and flow analysis techniques that build on reuse detection to place blog entries into logical organizations. We then show that these implicit or explicit quotations, together with focus analysis, can be leveraged to identify the contexts in which entries occur, resulting in more effective tagging. Accordingly, we propose CDIP (a collection-driven, yet individuality-preserving tagging system), which relies on relationships provided by quotation/reuse detection and semantic-focus analysis to tag blog entries automatically, so that not only do related entries share tags, but the individuality of each entry is also preserved to support discriminating tag-based access.
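The abstract does not say how quotation/reuse detection is performed. As an illustration only, the following is a minimal sketch of one generic approach, word w-shingling scored with an overlap coefficient, that links entries sharing quoted text. It is not the authors' method; the function names (`shingles`, `overlap`, `reuse_links`), the shingle width `w=4`, the `threshold=0.3` cutoff, and the sample entries are all hypothetical assumptions.

```python
import re
from itertools import combinations


def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Set of contiguous w-word shingles from a lowercased, tokenized text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def overlap(a: set, b: set) -> float:
    """Overlap coefficient |a & b| / min(|a|, |b|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def reuse_links(entries: dict[str, str], w: int = 4, threshold: float = 0.3):
    """Hypothetical helper: (entry_a, entry_b, score) for pairs above threshold."""
    sh = {name: shingles(text, w) for name, text in entries.items()}
    links = []
    for a, b in combinations(sorted(sh), 2):
        score = overlap(sh[a], sh[b])
        if score >= threshold:
            links.append((a, b, round(score, 3)))
    return links


# Illustrative data: blog1 quotes part of the news item; blog2 is unrelated.
entries = {
    "news": "The senate passed the budget bill late on Tuesday after a long debate over spending cuts.",
    "blog1": "I cannot believe it: the senate passed the budget bill late on Tuesday. What a mess.",
    "blog2": "My cat learned to open the fridge door this morning.",
}
links = reuse_links(entries)
print(links)  # → [('blog1', 'news', 0.462)]
```

The overlap coefficient normalizes by the smaller shingle set, so a short blog post that quotes a few sentences from a long news article still scores highly. Pairs linked this way are the kind of relationship along which related entries could share tags, although the sketch does not attempt CDIP's focus analysis or its individuality-preserving step.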

Original language: English (US)
Pages (from-to): 407-421
Number of pages: 15
Journal: Journal of Signal Processing Systems
Issue number: 3
State: Published - Mar 2010


Keywords

  • Navigational support
  • Tagging
  • Topic development patterns
  • Weblogs

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Theoretical Computer Science
  • Signal Processing
  • Information Systems
  • Modeling and Simulation
  • Hardware and Architecture

