Enabling Efficient Distributed Spatial Join on Large Scale Vector-Raster Data Lakes





Published in

Proceedings of the XXVII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2023)

Creative Commons License


Both the growing number of GPS-enabled mobile devices and geographic crowd-sourcing initiatives such as OpenStreetMap drive the large amount of vector spatial data currently being produced. Meanwhile, the automatic generation of raster data by remote sensing devices and environmental modeling processes has always led to very large datasets, and improved sensor observation systems and data processing infrastructures now reach huge data generation rates. As an example, the Sentinel Data Access System of the Copernicus Program of the European Space Agency (ESA) published 38.71 TB of data per day during 2020. This paper shows how adopting a new spatial data model that includes multi-resolution parametric spatial data types enables an efficient implementation of a large-scale distributed spatial analysis system for integrated vector-raster data lakes. In particular, the proposed implementation outperforms state-of-the-art Spark-based spatial analysis systems by more than one order of magnitude in vector-raster spatial join evaluation.


About Villarroya, Sebastián

Keywords

Large-scale Data Analysis, Spatial Analytics, Spatial Data Management, Vector-raster Data Analysis