GeoSparkViz: a cluster computing system for visualizing massive-scale geospatial data

Research output: Contribution to journal (Article, peer-reviewed)

11 Scopus citations


In the last decade, geospatial data extracted from GPS traces and satellite images has become ubiquitous. GeoVisual analytics (GeoViz) is the science of analytical reasoning assisted by geospatial map interfaces. GeoViz involves two phases: (1) spatial data processing, which loads spatial data and executes spatial queries to return the set of spatial objects to be visualized; and (2) map visualization, which applies a map visualization effect, e.g., a heatmap, to the spatial objects produced in the first phase. Existing GeoViz system architectures decouple these two phases and therefore lose the opportunity to co-optimize data processing and map visualization in the same cluster. To remedy this, the paper presents GeoSparkViz, a full-fledged system that allows the user to load, process, integrate, and execute GeoViz tasks on spatial data at scale. GeoSparkViz extends a state-of-the-art distributed data management system to provide native support for general geospatial map visualization. The system encapsulates the main steps of the map visualization process, e.g., pixelizing spatial objects, aggregating pixels, and rendering map tiles, into a set of massively parallelized map building operators. This allows the system to co-optimize the spatial query operators and the map building operators side by side. GeoSparkViz is also equipped with a GeoViz-aware spatial partitioning operator that balances the GeoViz workload across all nodes in the cluster. Experiments based on an implementation in Spark show that GeoSparkViz achieves up to an order of magnitude less data-to-visualization time than its counterparts when running visual analytics tasks over large-scale spatial data extracted from the NYC taxi dataset and OpenStreetMap.
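To make the map building steps named in the abstract (pixelize, pixel aggregation, tile rendering) more concrete, here is a minimal single-machine Python sketch. It assumes point data, a simple count-based heatmap, and a square grid of map tiles. The function names `pixelize`, `aggregate_pixels`, `split_into_tiles`, and `render_tile` are hypothetical illustrations, not GeoSparkViz's API. In the system itself these steps run as massively parallelized operators on Spark, and the GeoViz-aware partitioning is not shown here.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical types for this sketch only.
Point = Tuple[float, float]               # (x, y) in map coordinates
Pixel = Tuple[int, int]                   # (col, row) in the output image
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def pixelize(points: Iterable[Point], bbox: BBox, width: int, height: int) -> Iterator[Pixel]:
    """Step 1: map each point to the pixel covering it.

    Points outside bbox are dropped. Assumes a non-degenerate bbox
    (max_x > min_x and max_y > min_y). Row 0 corresponds to min_y; a real
    renderer would typically flip rows so that north is up.
    """
    min_x, min_y, max_x, max_y = bbox
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Scale into [0, width) and [0, height); clamp the max edge into the last pixel.
        col = min(int((x - min_x) / (max_x - min_x) * width), width - 1)
        row = min(int((y - min_y) / (max_y - min_y) * height), height - 1)
        yield (col, row)


def aggregate_pixels(pixels: Iterable[Pixel]) -> Dict[Pixel, int]:
    """Step 2: pixel aggregation, here a simple count of objects per pixel."""
    return dict(Counter(pixels))


def split_into_tiles(counts: Dict[Pixel, int], width: int, height: int,
                     tiles_per_side: int) -> Dict[Tuple[int, int], Dict[Pixel, int]]:
    """Step 3a: group aggregated pixels by map tile so each tile can be rendered on its own."""
    tile_w = max(width // tiles_per_side, 1)
    tile_h = max(height // tiles_per_side, 1)
    tiles: Dict[Tuple[int, int], Dict[Pixel, int]] = {}
    for (col, row), count in counts.items():
        # Clamp so leftover pixels (when width is not divisible) join the last tile.
        key = (min(col // tile_w, tiles_per_side - 1), min(row // tile_h, tiles_per_side - 1))
        tiles.setdefault(key, {})[(col, row)] = count
    return tiles


def render_tile(tile: Dict[Pixel, int], max_count: int) -> Dict[Pixel, int]:
    """Step 3b: color each pixel as a grayscale intensity in [0, 255].

    max_count is the global maximum, so all tiles share one color scale.
    """
    if max_count <= 0:
        return {p: 0 for p in tile}
    return {p: int(255 * c / max_count) for p, c in tile.items()}


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.6, 0.4), (9.5, 9.5), (20.0, 1.0)]  # last point lies outside the map
    counts = aggregate_pixels(pixelize(pts, (0.0, 0.0, 10.0, 10.0), 4, 4))
    tiles = split_into_tiles(counts, 4, 4, tiles_per_side=2)
    peak = max(counts.values())
    for key in sorted(tiles):
        print(key, render_tile(tiles[key], peak))
    # prints:
    # (0, 0) {(0, 0): 255}
    # (1, 1) {(3, 3): 127}
```

Because each tile is rendered independently once the global maximum is known, tiles map naturally onto parallel tasks; this is one plausible reading of why the abstract treats tile rendering as a separate operator, though the sketch does not reproduce the paper's actual implementation.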

Original language: English (US)
Pages (from-to): 237-258
Number of pages: 22
Journal: VLDB Journal
Issue number: 2
State: Published - Mar 2021

Keywords

  • Big spatial data
  • Distributed computation
  • Geospatial visualization

ASJC Scopus subject areas

  • Information Systems
  • Hardware and Architecture
