Similarity join for big geographic data

Yasin N. Silva, Jason M. Reed, Lisa M. Tsosie, Timothy A. Matti

Research output: Chapter in Book/Report/Conference proceeding › Chapter


The similarity join is one of the most useful data processing and analysis operations for geographic data. It retrieves all data pairs whose distances are smaller than a predefined threshold ε. Many application scenarios need to perform this operation over large amounts of data. Internet companies, for instance, collect massive amounts of information on their customers, such as their geographic location and interests. They can use similarity queries to provide enhanced services to their customers; for example, a movie theatre website could recommend neighboring theatres and restaurants in the customer’s town. MapReduce, a framework for processing very large datasets using large computer clusters, answers the need to process massive amounts of data in a highly scalable and distributed fashion (Dean and Ghemawat 2004). MapReduce-based systems are composed of large clusters of commodity machines and are often dynamically scalable, i.e., cluster nodes can be added or removed based on the workload. The MapReduce framework quickly processes massive datasets by splitting them into independent chunks that are processed in a highly parallel fashion.
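
To make the operation concrete, the following is a minimal single-process sketch of a similarity join organized into map, shuffle, and reduce steps. It is not the algorithm presented in the chapter; it uses a simple grid partitioning chosen here for illustration. It assumes planar (x, y) coordinates with Euclidean distance, and the names `map_phase`, `reduce_phase`, `similarity_join`, and `eps` (the distance threshold) are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def map_phase(points, eps):
    """Emit (cell, record) pairs over a grid whose cells have side length eps.

    Each point is sent to its own cell (is_home=True) and replicated to the
    eight surrounding cells (is_home=False), so any two points closer than
    eps always meet in at least one cell.
    """
    for pid, (x, y) in enumerate(points):
        cx, cy = floor(x / eps), floor(y / eps)
        for dx, dy in product((-1, 0, 1), repeat=2):
            is_home = dx == 0 and dy == 0
            yield (cx + dx, cy + dy), (is_home, pid, (x, y))


def reduce_phase(records, eps):
    """Join the records of one cell: home points against all points in the cell."""
    pairs = set()
    for is_home, i, (xi, yi) in records:
        if not is_home:
            continue
        for _, j, (xj, yj) in records:
            # i < j reports each pair once, in the home cell of the lower id.
            if i < j and hypot(xi - xj, yi - yj) < eps:
                pairs.add((i, j))
    return pairs


def similarity_join(points, eps):
    """Return all index pairs (i, j), i < j, whose distance is below eps."""
    cells = defaultdict(list)
    for cell, record in map_phase(points, eps):  # shuffle: group by cell key
        cells[cell].append(record)
    result = set()
    for records in cells.values():  # each cell is an independent chunk
        result |= reduce_phase(records, eps)
    return result


points = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.2, 3.1), (10.0, 10.0)]
print(sorted(similarity_join(points, eps=1.0)))  # prints [(0, 1), (2, 3)]
```

Because each grid cell is reduced independently, the cells could in principle be handled by different cluster nodes. A real deployment on geographic data would also need a distance function suited to latitude and longitude, which this sketch does not attempt.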

Original language: English (US)
Title of host publication: Geographical Information Systems
Subtitle of host publication: Trends and Technologies
Publisher: CRC Press
Number of pages: 30
ISBN (Electronic): 9781466596955
ISBN (Print): 9781466596931
State: Published - Jan 1 2014

ASJC Scopus subject areas

  • General Computer Science
  • General Earth and Planetary Sciences
  • General Engineering

