Apache Solr – Grouping update

by Martijn van Groningen, October 3, 2011

Apache Solr’s result grouping feature is now widely used. Its major drawback was that grouping was initially not supported for distributed searches (also known as sharding in Solr). The good news is that distributed grouping has recently been added to Solr! It has been added to both trunk and the stable branch (branch3x), which means that distributed grouping will be included in the upcoming Solr 3.5 and Solr 4.0 releases.

To use distributed grouping you only need to be familiar with Solr’s distributed search and result grouping. No specific configuration or extra request parameters are needed. However, two request parameters behave differently in distributed mode. The first, group.ngroups, returns the number of groups. The second, group.truncate, lets features such as faceting base their results on the grouped result instead of the ungrouped result.

Neither option is guaranteed to give the same results as in a non-sharded environment. For example, the total number of groups counted will most likely be higher than in a non-sharded environment. How large the difference is depends on how the documents are divided between the shards. If you partition the documents in such a way that all documents belonging to a group are in one shard, then the group count will be accurate. If you can’t partition the documents that way, you can still use this feature to compute an upper bound on the group count.
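To see why the group count becomes an upper bound, here is a small Python sketch. It is a simplified model rather than Solr's actual code: it assumes each shard counts its own distinct groups and that the per-shard counts are simply added up. The document layout, the function names and the "group" key are made up for illustration.

```python
import zlib


def summed_ngroups(shards):
    """Add up the distinct group count of every shard (the simplified model)."""
    return sum(len({doc["group"] for doc in shard}) for shard in shards)


def exact_ngroups(shards):
    """Count distinct groups over all documents, as a single index would."""
    return len({doc["group"] for shard in shards for doc in shard})


def round_robin(docs, num_shards):
    """Spread documents over shards without looking at their group."""
    return [docs[i::num_shards] for i in range(num_shards)]


def partition_by_group(docs, num_shards):
    """Send every document of a group to the same shard (stable hash of the group)."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[zlib.crc32(doc["group"].encode("utf-8")) % num_shards].append(doc)
    return shards


# Six hypothetical documents in three groups.
docs = [{"id": i, "group": g} for i, g in enumerate("aabbcc")]

scattered = round_robin(docs, 2)
colocated = partition_by_group(docs, 2)

print(summed_ngroups(scattered), exact_ngroups(scattered))  # 6 3
print(summed_ngroups(colocated), exact_ngroups(colocated))  # 3 3
```

With the round-robin split every group ends up in both shards, so each group is counted twice. Once the documents of a group are co-located, the summed count matches the exact count.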
Currently, only grouping by field and by query work in distributed mode. Support for distributed grouping by function will be added soon, so stay tuned!
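As a rough illustration of what such a request could look like, the Python sketch below builds a URL that combines the regular distributed search parameter (shards) with the regular grouping parameters (group, group.field, group.query, group.ngroups, group.truncate). The host names, the field name "category", the price query and the helper function itself are hypothetical, so adapt them to your own setup.

```python
from urllib.parse import urlencode


def build_grouped_query(base_url, shards, group_field=None, group_query=None,
                        ngroups=False, truncate=False, q="*:*"):
    """Build a distributed, grouped Solr select URL (illustrative helper only)."""
    params = [
        ("q", q),
        ("shards", ",".join(shards)),  # enables distributed search
        ("group", "true"),             # enables result grouping
    ]
    if group_field:
        params.append(("group.field", group_field))
    if group_query:
        params.append(("group.query", group_query))
    if ngroups:
        # In distributed mode this count may only be an upper bound.
        params.append(("group.ngroups", "true"))
    if truncate:
        params.append(("group.truncate", "true"))
    return base_url + "?" + urlencode(params)


# Hypothetical two-shard setup running on one machine.
shard_list = ["localhost:8983/solr", "localhost:7574/solr"]

field_url = build_grouped_query(
    "http://localhost:8983/solr/select", shard_list,
    group_field="category", ngroups=True)

query_url = build_grouped_query(
    "http://localhost:8983/solr/select", shard_list,
    group_query="price:[0 TO 100]")

print(field_url)
print(query_url)
```

Apart from the shards parameter, the request looks the same as a grouped request against a single index.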