Hotspotting Lucene With Query Specialization

by Chris MaleSeptember 12, 2011

Most Java developers probably take the technology behind HotspotTM  for granted and assume it’s best suited to optimize their code.  While they might know that choosing the best implementation of List will have a  big impact on their program’s performance, they probably shy away from worrying about the cost of virtual method calls, believing Hotspot knows best.

With most applications and in most situations this belief is well-founded and what I’m going to discuss is not trying to change the world view on Hotspot or encourage people to ‘optimize prematurely’. Rather, I want to explore the how we might aid Hotspot to produce the most optimal code for executing Lucene Querys and in the process increase query performance by over 100%.

Red Hot FunctionQuery Execution

One of the most popular Querys in Apache Solr and Lucene is FunctionQuery, which applies a function (known as a ValueSource) to every document in the index, returning the result of the function as the document’s score.  Rarely in Lucene’s Querys is logic applied to every document since Querys are responsible for matching (finding a relevant subset of documents) as well as scoring. Consequently, scoring logic is usually only applied to the matched subset. FunctionQuerys are not focused on matching, instead they manipulate the scores of documents in extensible ways.

To understand the problems this might cause, imagine you have an index with 100 million documents containing two fields which you wish to sum together and use as document scores. Even though the logic required to implement this is very simple, the fact that it will be executed 100 million times for every query means that it will glow hot – red hot.

As mentioned, FunctionQuery is designed to be extensible. It accepts implementations of ValueSource of which there are many in Lucene’s codebase. One such abstract implementation is MultiFloatFunction, itself accepting a list of ValueSources. MultiFloatFunction computes its function result by applying the logic implemented in the method func() to each of the results taken from its ValueSources. SumFloatFunction, a concrete implementation of MultiFloatFunction, adds sum logic to func().

The following UML class diagram illustrates the hierarchy of ValueSource classes necessary to add the values of two integer fields together.

Note, FieldValueSourceis a ValueSource implementation that returns the value of a field for a document. IntFieldValueSource assumes the value is an integer.

Although this seems convoluted already, in practice this example is not too bad. The code boils down to two values being read from two different arrays and added together. However, if you wanted to add two field values together and then multiply the result by a third field value, you would need to add another MultiFloatFunction implementation (ProductFloatFunction) and another IntFieldSource.  Suddenly we have a large stack of objects with many layers of virtual calls that must be executed per document per Query just to execute the formula (A + B) * C.

Specialized FunctionQuerys

In a simple benchmark, I indexed 100,000 documents containing random integer values for 3 fields.  I then assembled the ValueSource required to execute A + B and computed the time required to execute 100,000 FunctionQuerys using the ValueSource. The result, on my underpowered laptop, was an average execution time of 3ms – not bad, not bad at all. However when I repeated the same test, this time using the ValueSource for (A + B) * C, the execution time jumped up to 25ms per query – still not bad, especially given the benchmarks were not on a dedicated server.

However, I wondered if part of the increase in cost were the extra levels of virtual method calls that needed to be made. To test this, I created a simple interface called Function (shown below) and created an implementation of FunctionQuery which used Functions instead of
ValueSources.

public interface Function {
float f(int doc);
void setNextReader(IndexReader indexReader) throws IOException;
}

I then created an implement to support the formula (A + B) * C as follows:

public final class ABCFunction {
private final int[] aVals, bVals, cVals;
public float f(int doc) { return (aVals[doc] + bVals[doc]) * cVals[doc]; }
public void setNextReader(IndexReader indexReader) throws IOException {
// load arrays from FieldCache
}
}

Executed against the same index, the time per FunctionQuery dropped to just 13ms – a huge improvement.

Whats the difference? Well the above implementation does not use object composition. The only virtual method call executed is that from FunctionQuery to the Function implementation. Even though the logic is the same as that executed using the ValueSources (3 values loaded from arrays, arithmetic applied), Hotspot is able to execute the simplified code with fewer method invocations much quicker.

Extensibility and Performance

While hopefully you are impressed by the performance improvement, you’re also probably thinking “Well that’s great Chris, but this is not extensible or reusable” and you would be right.  ValueSource’s composition architecture allows us to quickly assemble a wide variety of functions while hiding away much of the complexity (some ValueSources are very complex).  They also lend themselves to be easy created while parsing a query string.

Yet it could be argued that the above code illustrates that if your search application uses a small set of known functions regularly, there is a considerable performance benefit to be gained by providing specialised implementations.  From my experience, most Solr applications use only a couple of functions, primarily composed of the rudimentary arithmetic variety.

ASTs and Executables

While toying with ValueSource and comparing it against my own ABCFunction, I couldn’t help but notice the similarity between ValueSources and ASTs, and ABCFunction and Java Bytecode. When compiling source code, most compilers create an Abstract Syntax Tree (AST) representation of the code.  ASTs are a very close representation of the original source code, but use typed objects and composition, instead of text, to represent code statements. The following is a UML diagram showing the AST javac builds to represent the statement int a = 10:

Having constructed an AST and after applying some optimizations, most compilers then generate the executable form of the source code. For Javac, this is Java Bytecode.

An assembled ValueSource is very much like an AST. It describes the function that is to be executed and as mentioned, can be very easily constructed during query string parsing. But like an AST, it perhaps isn’t the most efficient form for execution.  Instead, it needs to optimized (for a future article), and then converted to an optimal execution format. ABCFunction can be seen as an example of such a format.

So how do we get from ValueSource to our optimal Function implementation?
Well, we could generate source code and use Javac to compile it for us, or we could generate bytecode ourselves, or we could even generate native code. However we do it, we are creating the foundations for a Query Compiler.

Query Compilation and Beyond

Hopefully I’ve tickled your interest in how we might go about using code specialisation to increase the performance of Hotspot’s executions of Query code. In a future article, I will extend my explorations to the execution of all Lucene’s Querys and explain in more detail how we might go about building a Query Compiler for Lucene.