Lucene, and the NoSQL stores that leverage it, support storage and searching of polygonal records. However the spatial index implementation traditionally has returned false matches to spatial queries.
We have contributed a new spatial indexing strategy to Lucene Spatial that returns fully accurate results (i.e. exact matches only).
Better still, this new spatial search strategy often enables keeping a smaller index and and faster retrieval of results.
I will illustrate why false matches happen -- this requires a high-level walkthrough of spatial index trees -- and real world cases where it makes a difference.
Our initial workaround was to query Elasticsearch through a separate server layer that post-filters Elasticsearch results against the query shape, removing the false matches.
We've now built a similar approach into Lucene Spatial itself. By virtue of living inside, this new solution can take advantage of numerous efficiencies:
1. it filters away false matches before fetching their document contents;
2. it uses a binary serialization that is far faster than the GeoJSON we used before;
3. it optimizes the tradeoff between work done in the index tree vs. post-filtering, often resulting in a smaller index and faster querying. I will provide benchmark numbers.
I'll illustrate how developers and database administrators can use this improvement in their own databases (it's easy!).