Abstract
The growing popularity of standards for geospatial interoperability has led to an increasing number of geospatial Web services (GWSs), such as Web Map Services (WMSs), becoming publicly available on the Internet. However, finding these services quickly and precisely remains a challenge. Traditional methods collect services through centralized registries, where services are registered manually, but the metadata of registered services cannot be updated in a timely manner. This paper addresses these challenges by developing an effective crawler to discover and update the services by (1) proposing an accumulated term frequency (ATF)-based conditional probability model for prioritized crawling, (2) utilizing a concurrent multi-threading technique, and (3) adopting an automatic mechanism to update the metadata of identified services. Experiments show that the proposed crawler achieves good performance in both crawling efficiency and the coverage and liveliness of its results. In addition, an interesting finding regarding the distribution pattern of WMSs is discussed. We expect this research to contribute to automatic GWS discovery over the large-scale and dynamic World Wide Web and to the promotion of operational, interoperable, distributed geospatial services.
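As a rough illustration of what prioritized crawling for WMS discovery can look like, the following minimal Python sketch scores candidate URLs by keyword counts and checks whether a response is a WMS capabilities document. It is not the ATF-based conditional probability model of the paper, which the abstract does not define: the keyword list, the `term_score` function, and the `Frontier` class are assumptions made here for illustration. Only the `SERVICE=WMS&REQUEST=GetCapabilities` request and the `WMS_Capabilities` / `WMT_MS_Capabilities` root element names come from the OGC WMS standard.

```python
import heapq
import re
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical keyword list; the term set used in the paper is not given here.
KEYWORDS = ("wms", "getcapabilities", "mapserver", "geoserver")


def term_score(text, keywords=KEYWORDS):
    """Count keyword occurrences in text (a simple stand-in for a term-frequency signal)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(tokens.count(k) for k in keywords)


class Frontier:
    """URL queue that pops the highest-scoring unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url, score):
        if url in self._seen:
            return  # ignore URLs that were already queued
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def capabilities_url(base_url):
    """Build a WMS GetCapabilities request URL, keeping unrelated query parameters."""
    parts = urlsplit(base_url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k.lower() not in ("service", "request")]
    query += [("SERVICE", "WMS"), ("REQUEST", "GetCapabilities")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def is_wms_capabilities(xml_text):
    """Return True if the XML root is a WMS 1.3.0 or 1.1.x capabilities element."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    local_name = root.tag.rsplit("}", 1)[-1]  # strip any XML namespace prefix
    return local_name in ("WMS_Capabilities", "WMT_MS_Capabilities")
```

In a full crawler, several worker threads would pop URLs from a thread-safe version of such a queue, fetch each page, and re-run the capabilities check periodically to keep the stored metadata current; those parts are omitted from this sketch.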
Original language | English (US) |
---|---|
Pages (from-to) | 1127-1147 |
Number of pages | 21 |
Journal | International Journal of Geographical Information Science |
Volume | 24 |
Issue number | 8 |
State | Published - Aug 1 2010 |
Externally published | Yes |
Keywords
- Accumulated term frequency (ATF)
- Clumped distribution
- Conditional probability
- Crawler
- Geospatial Web service (GWS)
- Web Map Service (WMS)
ASJC Scopus subject areas
- Information Systems
- Geography, Planning and Development
- Library and Information Sciences