Spatial Lucene 2.0

by Chris Male, December 31, 2009

In a number of blog entries we have spoken about the spatial search functionality that we have been developing here at JTeam. In the last two weeks, I have had a chance to contribute much of this work back to the Apache Lucene project, with the goal of furthering the development of Lucene’s open source spatial search support. If you want to dive straight into the code, jump to LUCENE-2139; if you want more details, read on.

LUCENE-2139

The work we have contributed in LUCENE-2139 builds upon the work already done in Lucene’s spatial contrib. We have not replaced any of the existing algorithms; rather, we have focused on improving their architecture, performance and reliability.

Some of the improvements you can find in LUCENE-2139 are:

  • Multi-threaded distance filtering, considerably improving the overall performance of the filtering process
  • Support for multiple data formats, whether latitude/longitude pairs, geohashes, or a user-defined data format
  • Support for multiple distance calculators, giving the user the ability to decide between accuracy and efficiency
  • More documentation and test coverage, plus a code cleanup, improving the reliability, maintainability and readability of the contrib
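To make the accuracy versus efficiency trade-off between distance calculators a little more concrete, here is a minimal sketch of what pluggable calculators could look like. The names DistanceCalculator, DistanceCalculators, HAVERSINE and EQUIRECTANGULAR are hypothetical and chosen for illustration only; they are not the API contributed in LUCENE-2139. The sketch assumes a spherical Earth with a mean radius of 6371 km.

```java
// Hypothetical names for illustration; not the API contributed in LUCENE-2139.
interface DistanceCalculator {
    /** Distance in kilometres between two points given as degrees of latitude/longitude. */
    double distanceKm(double lat1, double lng1, double lat2, double lng2);
}

final class DistanceCalculators {
    /** Assumed mean Earth radius (spherical model). */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Haversine great-circle distance: more accurate, but needs more trigonometric calls. */
    static final DistanceCalculator HAVERSINE = (lat1, lng1, lat2, lng2) -> {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    };

    /** Equirectangular approximation: cheaper, but less accurate over long distances. */
    static final DistanceCalculator EQUIRECTANGULAR = (lat1, lng1, lat2, lng2) -> {
        // Scale the longitude difference by the cosine of the mean latitude.
        double x = Math.toRadians(lng2 - lng1) * Math.cos(Math.toRadians((lat1 + lat2) / 2));
        double y = Math.toRadians(lat2 - lat1);
        return EARTH_RADIUS_KM * Math.sqrt(x * x + y * y);
    };

    private DistanceCalculators() {}
}
```

A caller would pick whichever calculator suits the search, for example DistanceCalculators.EQUIRECTANGULAR for short city-scale radii where the approximation error is small, and DistanceCalculators.HAVERSINE when results may be far apart.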

Note that, to make these changes easier to review, I have broken our work up into a number of smaller issues, all of which are linked to LUCENE-2139.

The Future

JTeam is committed to helping develop Lucene’s enterprise-ready open source spatial search support. The improvements we have contributed are just the first step on our roadmap and are focused on building a foundation that we can then work from. In the future we intend to help develop the following features:

  • Simple bounding-box support for users who do not need the accuracy of Cartesian tier filtering
  • Support for NumericRangeQuery, an alternative way to efficiently find documents within a certain area
  • More efficient distance calculators, further reducing the overall search time
  • Polygon support
  • 3D search
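As a rough illustration of the simple bounding-box idea from the list above, the sketch below computes an approximate latitude/longitude box around a centre point and tests whether a point falls inside it. The BoundingBox class and its around and contains methods are hypothetical names, not part of the contrib. The sketch assumes a spherical Earth with a radius of 6371 km and small search radii, and it deliberately ignores the poles and the 180° meridian.

```java
// Hypothetical class for illustration; not part of Lucene's spatial contrib.
final class BoundingBox {
    /** Assumed mean Earth radius (spherical model). */
    static final double EARTH_RADIUS_KM = 6371.0;

    final double minLat, maxLat, minLng, maxLng;

    BoundingBox(double minLat, double maxLat, double minLng, double maxLng) {
        this.minLat = minLat;
        this.maxLat = maxLat;
        this.minLng = minLng;
        this.maxLng = maxLng;
    }

    /**
     * Approximate box around a centre point (degrees) for a radius in kilometres.
     * Does not handle boxes that cross a pole or the 180° meridian.
     */
    static BoundingBox around(double lat, double lng, double radiusKm) {
        double dLat = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        // A degree of longitude shrinks with the cosine of the latitude,
        // so the box must span more degrees of longitude away from the equator.
        double dLng = Math.toDegrees(
                radiusKm / (EARTH_RADIUS_KM * Math.cos(Math.toRadians(lat))));
        return new BoundingBox(lat - dLat, lat + dLat, lng - dLng, lng + dLng);
    }

    /** True if the point lies inside the box; this is just two range checks. */
    boolean contains(double lat, double lng) {
        return lat >= minLat && lat <= maxLat && lng >= minLng && lng <= maxLng;
    }
}
```

Because containment reduces to one range check on latitude and one on longitude, a box like this is cheap to evaluate, at the cost of also matching points in the corners of the box that lie outside the true search radius.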

Of course, if you have any ideas about what other improvements can be made to the contrib, feel free to contact me.