Similarity join for big geographic data

Yasin N. Silva; Jason M. Reed; Lisa M. Tsosie; Timothy A. Matti

doi:10.1201/b16871

Mathematical and Natural Sciences, School of (SMNS)

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Similarity Join is one of the most useful data processing and analysis operations for geographic data. It retrieves all data pairs whose distances are smaller than a predefined threshold ε. Multiple application scenarios need to perform this operation over large amounts of data. Internet companies, for instance, collect massive amounts of information on their customers such as their geographic location and interests. They can use similarity queries to provide enhanced services to their customers; for example, a movie theatre website could recommend neighboring theatres and restaurants in the customer’s town. MapReduce, a framework for processing very large datasets using large computer clusters, constitutes an answer to the requirements of processing massive amounts of data in a highly scalable and distributed fashion (Dean and Ghemawat 2004). MapReduce-based systems are composed of large clusters of commodity machines and are often dynamically scalable, i.e., cluster nodes can be added or removed based on the workload. The MapReduce framework quickly processes massive datasets by splitting them into independent chunks that are processed in a highly parallel fashion.
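
To make the operation described in the abstract concrete, the following is a minimal single-machine sketch of a similarity join: it returns every pair of points whose distance is below a threshold. It is an illustration only, not the chapter's MapReduce algorithm. The function name `similarity_join`, the use of planar Euclidean distance, and the sample coordinates are all assumptions made for this example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def similarity_join(points, eps):
    """Return index pairs (i, j), i < j, whose Euclidean distance is below eps.

    Naive all-pairs comparison: quadratic in the number of points, so it only
    illustrates the definition and does not scale to large datasets.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) < eps
    ]


# Hypothetical planar coordinates (assumed for illustration only).
venues = [(0.0, 0.0), (0.3, 0.4), (5.0, 5.0), (5.2, 5.1)]
pairs = similarity_join(venues, eps=1.0)
print(pairs)  # [(0, 1), (2, 3)]
```

Because every pair is compared, the cost grows quadratically with the input size, which is the kind of workload that motivates distributing the computation across a cluster.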

Original language	English (US)
Title of host publication	Geographical Information Systems
Subtitle of host publication	Trends and Technologies
Publisher	CRC Press
Pages	20-49
Number of pages	30
ISBN (Electronic)	9781466596955
ISBN (Print)	9781466596931
DOIs	https://doi.org/10.1201/b16871
State	Published - Jan 1 2014

ASJC Scopus subject areas

General Computer Science
General Earth and Planetary Sciences
General Engineering

Cite this

@inbook{64f13ff369d14a62a10bc24337b848e3,

title = "Similarity join for big geographic data",

abstract = "Similarity Join is one of the most useful data processing and analysis operations for geographic data. It retrieves all data pairs whose distances are smaller than a predefined threshold $\epsilon$. Multiple application scenarios need to perform this operation over large amounts of data. Internet companies, for instance, collect massive amounts of information on their customers such as their geographic location and interests. They can use similarity queries to provide enhanced services to their customers; for example, a movie theatre website could recommend neighboring theatres and restaurants in the customer{\textquoteright}s town. MapReduce, a framework for processing very large datasets using large computer clusters, constitutes an answer to the requirements of processing massive amounts of data in a highly scalable and distributed fashion (Dean and Ghemawat 2004). MapReduce-based systems are composed of large clusters of commodity machines and are often dynamically scalable, i.e., cluster nodes can be added or removed based on the workload. The MapReduce framework quickly processes massive datasets by splitting them into independent chunks that are processed in a highly parallel fashion.",

author = "Silva, {Yasin N.} and Reed, {Jason M.} and Tsosie, {Lisa M.} and Matti, {Timothy A.}",

note = "Publisher Copyright: {\textcopyright} 2014 by Taylor \& Francis Group, LLC.",

year = "2014",

month = jan,

day = "1",

doi = "10.1201/b16871",

language = "English (US)",

isbn = "9781466596931",

pages = "20--49",

booktitle = "Geographical Information Systems",

publisher = "CRC Press",

}

TY - CHAP

T1 - Similarity join for big geographic data

AU - Silva, Yasin N.

AU - Reed, Jason M.

AU - Tsosie, Lisa M.

AU - Matti, Timothy A.

PY - 2014/1/1

Y1 - 2014/1/1

N2 - Similarity Join is one of the most useful data processing and analysis operations for geographic data. It retrieves all data pairs whose distances are smaller than a predefined threshold ε. Multiple application scenarios need to perform this operation over large amounts of data. Internet companies, for instance, collect massive amounts of information on their customers such as their geographic location and interests. They can use similarity queries to provide enhanced services to their customers; for example, a movie theatre website could recommend neighboring theatres and restaurants in the customer’s town. MapReduce, a framework for processing very large datasets using large computer clusters, constitutes an answer to the requirements of processing massive amounts of data in a highly scalable and distributed fashion (Dean and Ghemawat 2004). MapReduce-based systems are composed of large clusters of commodity machines and are often dynamically scalable, i.e., cluster nodes can be added or removed based on the workload. The MapReduce framework quickly processes massive datasets by splitting them into independent chunks that are processed in a highly parallel fashion.

UR - http://www.scopus.com/inward/record.url?scp=85054744067&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054744067&partnerID=8YFLogxK

U2 - 10.1201/b16871

DO - 10.1201/b16871

M3 - Chapter

AN - SCOPUS:85054744067

SN - 9781466596931

SP - 20

EP - 49

BT - Geographical Information Systems

PB - CRC Press

ER -
