Geospatial data management in Apache Spark: A tutorial

Jia Yu; Mohamed Sarwat

doi:10.1109/ICDE.2019.00239

Computing and Augmented Intelligence, School of (IAFSE-SCAI)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

The volume of spatial data increases at a staggering rate. This tutorial comprehensively studies how existing works extend Apache Spark to support massive-scale spatial data. During this 1.5-hour tutorial, we first provide a background introduction to the characteristics of spatial data and the history of distributed data management systems. A follow-up section presents the common approaches used by practitioners to extend Spark and introduces the vital components of a generic spatial data management system. The third, fourth and fifth sections then discuss the ongoing efforts and experience in spatial-temporal data, spatial data analytics and streaming spatial data, respectively. The sixth part concludes the tutorial to help the audience better grasp the overall content and points out future research directions.

Original language	English (US)
Title of host publication	Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019
Publisher	IEEE Computer Society
Pages	2060-2063
Number of pages	4
ISBN (Electronic)	9781538674741
DOIs	https://doi.org/10.1109/ICDE.2019.00239
State	Published - Apr 2019
Event	35th IEEE International Conference on Data Engineering, ICDE 2019 - Macau, China Duration: Apr 8 2019 → Apr 11 2019

Publication series

Name	Proceedings - International Conference on Data Engineering
Volume	2019-April
ISSN (Print)	1084-4627

Conference

Conference	35th IEEE International Conference on Data Engineering, ICDE 2019
Country/Territory	China
City	Macau
Period	4/8/19 → 4/11/19

Keywords

Apache Spark
Distributed computing
Geospatial data

ASJC Scopus subject areas

Software
Signal Processing
Information Systems

Cite this

Geospatial data management in Apache Spark: A tutorial. / Yu, Jia; Sarwat, Mohamed.
Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019. IEEE Computer Society, 2019. p. 2060-2063 8731372 (Proceedings - International Conference on Data Engineering; Vol. 2019-April).

Yu, J & Sarwat, M 2019, Geospatial data management in Apache Spark: A tutorial. in Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019., 8731372, Proceedings - International Conference on Data Engineering, vol. 2019-April, IEEE Computer Society, pp. 2060-2063, 35th IEEE International Conference on Data Engineering, ICDE 2019, Macau, China, 4/8/19. https://doi.org/10.1109/ICDE.2019.00239

@inproceedings{3199f73246594999bfe0235da4619553,

title = "Geospatial data management in Apache Spark: A tutorial",

abstract = "The volume of spatial data increases at a staggering rate. This tutorial comprehensively studies how existing works extend Apache Spark to support massive-scale spatial data. During this 1.5-hour tutorial, we first provide a background introduction to the characteristics of spatial data and the history of distributed data management systems. A follow-up section presents the common approaches used by practitioners to extend Spark and introduces the vital components of a generic spatial data management system. The third, fourth and fifth sections then discuss the ongoing efforts and experience in spatial-temporal data, spatial data analytics and streaming spatial data, respectively. The sixth part concludes the tutorial to help the audience better grasp the overall content and points out future research directions.",

keywords = "Apache Spark, Distributed computing, Geospatial data",

author = "Jia Yu and Mohamed Sarwat",

note = "Funding Information: Mohamed Sarwat is an Assistant Professor of Computer Science and the director of the Data Systems lab at Arizona State University. Before joining ASU, Mohamed obtained his PhD degree in computer science from University of Minnesota in 2014. His research interest lies in the broad area of data management systems. Mohamed is a recipient of the University of Minnesota doctoral dissertation fellowship. His research work has been recognized by two best research paper awards in MDM 2015 and SSTD 2011 as well as a Best of Conference citation in ICDE 2012. He also received CCC Blue Sky Ideas award for best vision papers (3rd place) in SSTD 2017. Mohamed is an associate editor for the GeoInformatica journal and has served as a PC member for major data management and spatial computing venues. Publisher Copyright: {\textcopyright} 2019 IEEE. Copyright: Copyright 2019 Elsevier B.V., All rights reserved.; 35th IEEE International Conference on Data Engineering, ICDE 2019 ; Conference date: 08-04-2019 Through 11-04-2019",

year = "2019",

month = apr,

doi = "10.1109/ICDE.2019.00239",

language = "English (US)",

series = "Proceedings - International Conference on Data Engineering",

publisher = "IEEE Computer Society",

pages = "2060--2063",

booktitle = "Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019",

}

TY - GEN

T1 - Geospatial data management in Apache Spark

T2 - 35th IEEE International Conference on Data Engineering, ICDE 2019

AU - Yu, Jia

AU - Sarwat, Mohamed

N1 - Funding Information: Mohamed Sarwat is an Assistant Professor of Computer Science and the director of the Data Systems lab at Arizona State University. Before joining ASU, Mohamed obtained his PhD degree in computer science from University of Minnesota in 2014. His research interest lies in the broad area of data management systems. Mohamed is a recipient of the University of Minnesota doctoral dissertation fellowship. His research work has been recognized by two best research paper awards in MDM 2015 and SSTD 2011 as well as a Best of Conference citation in ICDE 2012. He also received CCC Blue Sky Ideas award for best vision papers (3rd place) in SSTD 2017. Mohamed is an associate editor for the GeoInformatica journal and has served as a PC member for major data management and spatial computing venues. Publisher Copyright: © 2019 IEEE. Copyright: Copyright 2019 Elsevier B.V., All rights reserved.

PY - 2019/4

Y1 - 2019/4

AB - The volume of spatial data increases at a staggering rate. This tutorial comprehensively studies how existing works extend Apache Spark to support massive-scale spatial data. During this 1.5-hour tutorial, we first provide a background introduction to the characteristics of spatial data and the history of distributed data management systems. A follow-up section presents the common approaches used by practitioners to extend Spark and introduces the vital components of a generic spatial data management system. The third, fourth and fifth sections then discuss the ongoing efforts and experience in spatial-temporal data, spatial data analytics and streaming spatial data, respectively. The sixth part concludes the tutorial to help the audience better grasp the overall content and points out future research directions.

KW - Apache Spark

KW - Distributed computing

KW - Geospatial data

UR - http://www.scopus.com/inward/record.url?scp=85067923448&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067923448&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2019.00239

DO - 10.1109/ICDE.2019.00239

M3 - Conference contribution

AN - SCOPUS:85067923448

T3 - Proceedings - International Conference on Data Engineering

SP - 2060

EP - 2063

BT - Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019

PB - IEEE Computer Society

Y2 - 8 April 2019 through 11 April 2019

ER -
