In many domains, notably intelligence services, the value of information decreases over time. Consequently, time-to-insight is a key metric. Features inherent in the analysis of geospatial data, with which Dstl often works, make large-scale data processing challenging and often computationally expensive.
Efforts have been made to reduce the complexity of working with geospatial data, both through standardized specifications (e.g. the GeoJSON data format) and through a variety of promising technologies that hide superfluous details from the end user. However, there is insufficient comparative data available to understand the relative performance of many of these technologies. In particular, relative query and ingestion times are not well understood.
Reflecting its desire to make evidence-based decisions in this area, Dstl engaged Data Reply to benchmark six prominent Big Data technologies with geospatial processing capabilities. The aim was to help Dstl select the appropriate technology for a given workload and to provide advice on tuning for performance.