Abstract
The advent of e-commerce has brought thousands of catalogs online. Most of these websites are "taxonomy-directed". A Web site is said to be "taxonomy-directed" if it contains at least one taxonomy for organizing its contents and presents the instances belonging to each category in a regular fashion. This paper describes the DataRover system, which can automatically crawl and extract products from taxonomy-directed online catalogs. DataRover uses heuristic rules to discover the structural regularities among taxonomy segments, list-of-product pages, and single-product pages. It then uses these regularities to turn online catalogs into a database of categorized products, without user interaction and without the burden of wrapper maintenance. We provide experimental results that demonstrate the efficacy of DataRover and point out its current limitations.
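The abstract does not spell out DataRover's heuristic rules. The sketch below is a hypothetical illustration of one generic way to detect instances that a page "presents in a regular fashion". It is not the paper's method. It assumes that product records on a list-of-product page appear as sibling HTML elements sharing the same tag-only structure. Each element is reduced to a structural signature (its tag plus its children's signatures, ignoring text), and any signature that repeats at least `min_repeats` times under one parent is flagged. The names `find_repeated_records`, `Node`, `TreeBuilder`, and `min_repeats` are illustrative.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never contain children (hypothetical, non-exhaustive list).
VOID = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def signature(self):
        # Tag-only structural signature: text content is ignored, so two
        # product records with different names/prices still match.
        return self.tag + "(" + ",".join(c.signature() for c in self.children) + ")"


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree from HTML using only the stdlib parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node  # descend into non-void elements

    def handle_endtag(self, tag):
        # Walk up to the matching open element; stray end tags are ignored.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent


def find_repeated_records(html, min_repeats=3):
    """Return (parent_tag, child_signature, count) for repeated sibling structures.

    Hypothetical heuristic: a signature repeated min_repeats or more times
    under one parent is treated as a candidate list of product records.
    """
    builder = TreeBuilder()
    builder.feed(html)
    results = []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        counts = Counter(c.signature() for c in node.children)
        for sig, n in counts.items():
            if n >= min_repeats:
                results.append((node.tag, sig, n))
        stack.extend(node.children)
    return results
```

For example, a list page with three `<li>` items, each holding a link and a price, yields one candidate: `find_repeated_records("<ul><li><a>A</a><span>$1</span></li><li><a>B</a><span>$2</span></li><li><a>C</a><span>$3</span></li></ul>")` returns `[("ul", "li(a(),span())", 3)]`. A real system would also need rules to separate taxonomy (navigation) segments from product lists, which this sketch does not attempt.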
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the International Workshop on Web Information and Data Management |
Editors | R. Chiang, A.H.F. Laender, E.-P. Lim |
Pages | 9-14 |
Number of pages | 6 |
State | Published - 2003 |
Event | WIDM 2003: Proceedings of the Fifth ACM International Workshop on Web Information and Data Management - New Orleans, LA, United States. Duration: Nov 7 2003 → Nov 8 2003 |
Other
Other | WIDM 2003: Proceedings of the Fifth ACM International Workshop on Web Information and Data Management |
---|---|
Country/Territory | United States |
City | New Orleans, LA |
Period | 11/7/03 → 11/8/03 |
Keywords
- Web Annotation
- Web Data Extraction
- Web Data Integration
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems