TY - JOUR
T1 - Reasoning for Web document associations and its applications in site map construction
AU - Candan, K. Selçuk
AU - Li, Wen Syan
N1 - Funding Information:
The authors would like to express their appreciation to the Web site www-db.stanford.edu for its data used in their experiments. We selected this Web site because of (1) our familiarity with the contents of the site, which helps in evaluating the results, and (2) the fact that the pages in this site are not dynamically generated. The experimental results presented in this paper are for the purposes of scientific research only. The authors would like to express their appreciation to Quoc Vu, Okan Kolak, Necip Fazil Ayan, and Hajime Takano for their discussions, comments, and contributions to the work on logical domain identification. Kasım Selçuk Candan is a tenure-track Assistant Professor at the Department of Computer Science and Engineering at Arizona State University. He joined the department in August 1997, after receiving his Ph.D. from the Computer Science Department at the University of Maryland at College Park. His dissertation research concentrated on multimedia document authoring, presentation, and retrieval in distributed collaborative environments. He received the 1997 ACM DC Chapter Samuel N. Alexander Fellowship award for his Ph.D. work. His research interests include the development of formal models, indexing schemes, and retrieval algorithms for multimedia and Web information, and the development of novel query optimization and processing algorithms. He has published various articles in respected journals and conferences in related areas. He received his B.S. degree, ranked first in the department, in computer science from Bilkent University in Turkey in 1993. Wen-Syan Li is a Senior Research Staff Member at Computers & Communications Research Laboratories (CCRL), NEC USA Inc. He received his Ph.D. in Computer Science from Northwestern University in December 1995. He also holds an MBA degree.
His main research interests include content delivery networks, multimedia/hypermedia/document databases, WWW, e-commerce, and information retrieval. He is leading the CachePortal project at NEC USA Venture Development Center and the Content Awareness Network project at NEC CCRL in San Jose. Wen-Syan is the recipient of the first NEC USA Achievement Award for his contributions in technology innovation.
PY - 2002/11
Y1 - 2002/11
AB - Recently, there has been interest in using associations between Web pages to provide users with pages relevant to what they are currently viewing. We believe that, to enable intelligent decisions, we need to answer the question "for a given set of pages, find out why they are associated". We present a framework for reasoning about Web document associations. We start from the observation that the reasons for Web page associations are implicitly embedded in the content of the pages as well as in the links connecting them. The association reasoning scheme we propose is based on a random walk algorithm. This algorithm can take both link structure and content into consideration and allows users to specify a focus. We then show how the proposed algorithm, combined with a logical domain identification technique, can be used for Web site summarization and Web site map construction to help users navigate through complex corporate sites. We see that, to achieve this goal, it is essential to recover the Web authors' intentions and superimpose them on the users' retrieval contexts when summarizing Web sites. Therefore, we present a framework that uses logical neighborhoods, entry pages, and associations of entry pages to create context-sensitive summaries and maps of complex Web sites.
KW - Connectivity
KW - Link analysis
KW - Random walk
KW - Reasoning about associations
KW - Topic distillation
KW - WWW
UR - http://www.scopus.com/inward/record.url?scp=0036835935&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036835935&partnerID=8YFLogxK
U2 - 10.1016/S0169-023X(02)00053-8
DO - 10.1016/S0169-023X(02)00053-8
M3 - Article
AN - SCOPUS:0036835935
SN - 0169-023X
VL - 43
SP - 121
EP - 150
JO - Data and Knowledge Engineering
JF - Data and Knowledge Engineering
IS - 2
ER -