GeoSpark: A cluster computing framework for processing large-scale spatial data

Jia Yu, Jinxuan Wu, Mohamed Elsayed

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

254 Scopus citations

Abstract

This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data. GeoSpark consists of three layers: Apache Spark Layer, Spatial RDD Layer and Spatial Query Processing Layer. Apache Spark Layer provides basic Spark functionalities that include loading/storing data to disk as well as regular RDD operations. Spatial RDD Layer consists of three novel Spatial Resilient Distributed Datasets (SRDDs) which extend regular Apache Spark RDDs to support geometrical and spatial objects. GeoSpark provides a geometrical operations library that accesses Spatial RDDs to perform basic geometrical operations (e.g., Overlap, Intersect). System users can leverage the newly defined SRDDs to effectively develop spatial data processing programs in Spark. The Spatial Query Processing Layer efficiently executes spatial query processing algorithms (e.g., Spatial Range, Join, KNN query) on SRDDs. GeoSpark also allows users to create a spatial index (e.g., R-tree, Quad-tree) that boosts spatial data processing performance in each SRDD partition. Preliminary experiments show that GeoSpark achieves better run time performance than its Hadoop-based counterparts (e.g., SpatialHadoop).
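As a rough illustration of the idea in the abstract (partitioned spatial data, with a per-partition structure that speeds up a spatial range query), here is a minimal sketch in plain Python. It does not use Spark or GeoSpark, and none of the names (`Rect`, `grid_partition`, `cell_bounds`, `range_query`) come from the GeoSpark API; they are hypothetical. A uniform grid stands in for the R-tree or Quad-tree mentioned above, and each grid cell stands in for one SRDD partition. The sketch assumes every point lies inside the given extent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with closed boundaries (hypothetical helper)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, p: Point) -> bool:
        return self.min_x <= p[0] <= self.max_x and self.min_y <= p[1] <= self.max_y

    def intersects(self, other: "Rect") -> bool:
        return not (other.min_x > self.max_x or other.max_x < self.min_x
                    or other.min_y > self.max_y or other.max_y < self.min_y)


def grid_partition(points: List[Point], extent: Rect, cells_per_side: int) -> Dict[Cell, List[Point]]:
    """Assign each point to a uniform grid cell; a cell plays the role of one partition."""
    width = (extent.max_x - extent.min_x) / cells_per_side
    height = (extent.max_y - extent.min_y) / cells_per_side
    partitions: Dict[Cell, List[Point]] = {}
    for p in points:
        # Clamp so points on the upper boundary fall into the last cell.
        col = min(int((p[0] - extent.min_x) / width), cells_per_side - 1)
        row = min(int((p[1] - extent.min_y) / height), cells_per_side - 1)
        partitions.setdefault((col, row), []).append(p)
    return partitions


def cell_bounds(cell: Cell, extent: Rect, cells_per_side: int) -> Rect:
    """Bounding rectangle of a grid cell."""
    width = (extent.max_x - extent.min_x) / cells_per_side
    height = (extent.max_y - extent.min_y) / cells_per_side
    col, row = cell
    return Rect(extent.min_x + col * width, extent.min_y + row * height,
                extent.min_x + (col + 1) * width, extent.min_y + (row + 1) * height)


def range_query(partitions: Dict[Cell, List[Point]], extent: Rect,
                cells_per_side: int, window: Rect) -> List[Point]:
    """Return all points inside the query window, skipping partitions that cannot match."""
    results: List[Point] = []
    for cell, pts in partitions.items():
        if not cell_bounds(cell, extent, cells_per_side).intersects(window):
            continue  # prune: no point in this partition can fall in the window
        results.extend(p for p in pts if window.contains(p))
    return sorted(results)


if __name__ == "__main__":
    extent = Rect(0, 0, 10, 10)
    points = [(1, 1), (2, 8), (5, 5), (9, 9), (7, 2)]
    parts = grid_partition(points, extent, cells_per_side=2)
    print(range_query(parts, extent, 2, Rect(4, 4, 10, 10)))
```

In a real cluster setting the per-partition filtering would run in parallel on the workers, and a tree index inside each partition would replace the linear scan used here.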

Original language: English (US)
Title of host publication: GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
Publisher: Association for Computing Machinery
Volume: 03-06-November-2015
ISBN (Print): 9781450339674
State: Published - Nov 3 2015
Event: 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2015 - Seattle, United States
Duration: Nov 3 2015 to Nov 6 2015

Other

Other: 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2015
Country/Territory: United States
City: Seattle
Period: 11/3/15 to 11/6/15

Keywords

  • Cluster computing
  • Large-scale data
  • Spatial data

ASJC Scopus subject areas

  • Earth-Surface Processes
  • Computer Science Applications
  • Modeling and Simulation
  • Computer Graphics and Computer-Aided Design
  • Information Systems
