PolarHub: A large-scale web crawling engine for OGC service discovery in cyberinfrastructure

WenWen Li, Sizhe Wang, Vidit Bhatia

Research output: Contribution to journal › Article › peer-review

19 Scopus citations


The advancement of geospatial interoperability research has fostered the proliferation of geospatial resources that are shared and made publicly available on the Web. However, their increasing availability has made identifying the web signature of voluminous geospatial resources a major challenge. In this paper, we introduce a new cyberinfrastructure platform, PolarHub, that conducts large-scale web crawling to discover distributed geospatial data and service resources efficiently and effectively. PolarHub is built upon a service-oriented architecture (SOA) and adopts a Data Access Object (DAO)-based software design pattern to ensure the extensibility of the software system. The proposed meta-search-based seed selection and pattern-matching-based crawling strategy facilitates rapid resource identification and discovery by constraining the search scope on the Web. In addition, PolarHub introduces an advanced asynchronous communication strategy that combines client-pull and server-push to ensure high efficiency of the crawling system. These design features make PolarHub a high-performance, scalable, sustainable, collaborative, and interactive platform for active geospatial data discovery. Because OGC standards are widely adopted, OGC-compliant web services are the primary search target of PolarHub. Currently, the PolarHub system is up and running and serves various scientific communities that demand geospatial data. We consider PolarHub a significant contribution to the fields of information retrieval and geospatial interoperability.
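To make the pattern-matching idea concrete, here is a minimal Python sketch of how a crawler might recognize candidate OGC service endpoints among harvested links. It is not PolarHub's implementation, which the abstract does not detail. It assumes only the standard OGC key-value convention in which a request URL carries a `service` parameter and a `request=GetCapabilities` call returns the service description. The function names, the list of service types, and the example.org URLs are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Service types treated as search targets in this sketch (an assumption,
# not a list taken from the paper).
OGC_SERVICES = {"WMS", "WFS", "WCS", "WMTS", "CSW", "SOS", "WPS"}


def detect_ogc_service(url):
    """Return the OGC service type named in the URL's query string, or None."""
    query = urlsplit(url).query
    for key, value in parse_qsl(query):
        # OGC key-value parameter names are case-insensitive.
        if key.lower() == "service" and value.upper() in OGC_SERVICES:
            return value.upper()
    return None


def capabilities_url(url):
    """Rewrite a matching link into a GetCapabilities request, or return None."""
    service = detect_ogc_service(url)
    if service is None:
        return None
    parts = urlsplit(url)
    query = urlencode({"service": service, "request": "GetCapabilities"})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def filter_candidates(links):
    """Keep only links that look like OGC endpoints, normalized for probing."""
    candidates = (capabilities_url(link) for link in links)
    return [c for c in candidates if c is not None]


if __name__ == "__main__":
    harvested = [
        "http://example.org/geoserver/ows?SERVICE=WMS&REQUEST=GetMap&LAYERS=a",
        "http://example.org/about.html",
        "http://example.org/wfs?service=wfs&version=1.1.0",
    ]
    for candidate in filter_candidates(harvested):
        print(candidate)
```

Discarding links that lack the expected pattern is one way a crawler can constrain its search scope: only the surviving URLs need to be fetched and checked for a valid capabilities document.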

Original language: English (US)
Pages (from-to): 195-207
Number of pages: 13
Journal: Computers, Environment and Urban Systems
State: Published - Sep 1 2016


Keywords

  • Big data access
  • Cyberinfrastructure
  • Geospatial interoperability
  • PolarHub
  • Scalability

ASJC Scopus subject areas

  • Geography, Planning and Development
  • Ecological Modeling
  • General Environmental Science
  • Urban Studies

