Efficient processing of Distance Range Queries (DRQs) is of great importance in spatial databases because of their wide range of applications. This type of spatial query is characterized by a distance range over one or two datasets. The most representative and best-known DRQs are the ε-Distance Range Query (εDRQ) and the ε-Distance Range Join Query (εDRJQ). Given the increasing volume of spatial data, it is difficult to perform a DRQ efficiently on a single centralized machine. Moreover, the εDRJQ is an expensive spatial operation, since it can be considered a combination of the εDRQ and the spatial join query. For this reason, this paper addresses the problem of computing DRQs on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently, and proposes new SpatialHadoop algorithms for performing efficient parallel DRQs on large-scale spatial datasets. We have evaluated the performance of the proposed algorithms in several scenarios with big synthetic and real-world datasets. The experiments demonstrate the efficiency (in terms of total execution time and number of distance computations) and scalability (with respect to ε values, dataset sizes and number of computing nodes) of our proposal.
Distance Range Queries, Distance Join, Spatial Data Processing, SpatialHadoop, MapReduce