{"id":3424,"date":"2011-05-19T15:49:52","date_gmt":"2011-05-19T13:49:52","guid":{"rendered":"http:\/\/blog.jteam.nl\/?p=3424"},"modified":"2011-05-19T15:49:52","modified_gmt":"2011-05-19T13:49:52","slug":"search-result-grouping-field-collapsing-in-lucene-solr","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/","title":{"rendered":"Search Result Grouping \/ Field Collapsing in Lucene \/ Solr"},"content":{"rendered":"<p>Grouping of search results, also known as field collapsing, is often a requirement for search projects. As <a href=\"http:\/\/blog.jteam.nl\/2009\/10\/20\/result-grouping-field-collapsing-with-solr\/\">described earlier<\/a>, this functionality was added to Solr and is one of its most requested features. Recently result grouping was added to Lucene as a contrib in the 3.x branch and as a module in 4.0. Adding the functionality to Lucene makes the feature much more flexible to use. Work is currently under way to integrate the result grouping contrib from the 3.x branch into Solr. See <a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-2524\">SOLR-2524<\/a> for more information. This means that grouping will most likely be available in Solr 3.2!<\/p>\n<p><!--more--><\/p>\n<h2>History<\/h2>\n<p>It all began about 4 years ago when the <a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-236\">SOLR-236<\/a> issue was created. Back then result grouping was known as field collapsing and the functionality was more focused on collapsing documents in the result set that have the same field value. The patch that was attached to this issue expanded over time and more people started using it. Features were added and improvements were made by many people. The field collapse feature stayed as a patch in Jira for more than 3 years. The only option for Solr users who wanted to use it was to patch Solr and run that custom build. 
This is obviously error-prone, and many questions on this subject were sent to the Solr mailing lists. Besides that, there were many other Jira issues and patches related to field collapsing, which confused people even more!<\/p>\n<p>Last September result grouping became available in the trunk (4.0-dev). The field collapse functionality was rewritten as result grouping (<a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-1682\">SOLR-1682<\/a>) and the performance was improved dramatically. Grouping by function was also added, so the feature changed slightly.<\/p>\n<p>More recently, effort was put into <a href=\"https:\/\/issues.apache.org\/jira\/browse\/LUCENE-1421\">LUCENE-1421<\/a>. This Jira issue was created with the intent to expose result grouping in Lucene. The grouping feature in the Solr trunk was rewritten and put into a grouping module in Lucene. It has also been backported to the 3.x branch as a Lucene contrib. Currently the only features it doesn&#8217;t support are grouping by function and by query. <a href=\"https:\/\/issues.apache.org\/jira\/browse\/LUCENE-3099\">LUCENE-3099<\/a> has been created to add these capabilities to Lucene soon.<\/p>\n<h2>Result Grouping in Lucene<\/h2>\n<p>Grouping in Lucene is implemented as a set of collectors that are easy to use, as the following code samples show. The <a href=\"http:\/\/svn.apache.org\/viewvc\/lucene\/dev\/trunk\/modules\/grouping\/src\/java\/org\/apache\/lucene\/search\/grouping\/FirstPassGroupingCollector.java?view=markup\">FirstPassGroupingCollector<\/a> collects the top N groups. 
The <a href=\"http:\/\/svn.apache.org\/viewvc\/lucene\/dev\/trunk\/modules\/grouping\/src\/java\/org\/apache\/lucene\/search\/grouping\/SecondPassGroupingCollector.java?view=markup\">SecondPassGroupingCollector<\/a> gathers documents within the top N groups.<\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\n\/\/ groupSort, docSort, the offsets and searchTerm are assumed to be defined by the caller\nboolean fillFields = true;\n\n\/\/ First pass: collect the top groups\nFirstPassGroupingCollector c1 = new FirstPassGroupingCollector(&quot;author&quot;, groupSort, groupOffset + topNGroups);\nindexSearcher.search(new TermQuery(new Term(&quot;content&quot;, searchTerm)), c1);\n\nCollection&lt;SearchGroup&gt; topGroups = c1.getTopGroups(groupOffset, fillFields);\n\nif (topGroups == null) {\n  \/\/ No groups matched\n  return;\n}\n\n\/\/ Second pass: collect the documents within the top groups\nboolean getScores = true;\nboolean getMaxScores = true;\nSecondPassGroupingCollector c2 = new SecondPassGroupingCollector(&quot;author&quot;, topGroups, groupSort, docSort, docOffset + docsPerGroup, getScores, getMaxScores, fillFields);\nindexSearcher.search(new TermQuery(new Term(&quot;content&quot;, searchTerm)), c2);\n\nTopGroups groupsResult = c2.getTopGroups(docOffset);\n<\/pre>\n<p>If the searches are expensive, consider using the <a href=\"http:\/\/svn.apache.org\/viewvc\/lucene\/dev\/trunk\/lucene\/src\/java\/org\/apache\/lucene\/search\/CachingCollector.java?view=markup\">CachingCollector<\/a>. This collector can cache the document ids and scores from the first pass search and replay them during the second pass search. See the <a href=\"http:\/\/svn.apache.org\/viewvc\/lucene\/dev\/trunk\/modules\/grouping\/src\/java\/org\/apache\/lucene\/search\/grouping\/package.html?view=markup\">grouping documentation<\/a> for its usage.<\/p>\n<p>Another collector, the <a href=\"http:\/\/svn.apache.org\/viewvc\/lucene\/dev\/trunk\/modules\/grouping\/src\/java\/org\/apache\/lucene\/search\/grouping\/AllGroupsCollector.java?view=markup\">AllGroupsCollector<\/a>, collects all groups that match a query. 
This can be used, for example, to get the total number of matching groups.<\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\n\/\/ First pass search has been executed\nboolean getScores = true;\nboolean getMaxScores = true;\nboolean fillFields = true;\nAllGroupsCollector c3 = new AllGroupsCollector(&quot;author&quot;);\nSecondPassGroupingCollector c2 = new SecondPassGroupingCollector(&quot;author&quot;, topGroups, groupSort, docSort, docOffset + docsPerGroup, getScores, getMaxScores, fillFields);\nindexSearcher.search(new TermQuery(new Term(&quot;content&quot;, searchTerm)), MultiCollector.wrap(c2, c3));\n\nTopGroups groupsResult = c2.getTopGroups(docOffset);\ngroupsResult = new TopGroups(groupsResult, c3.getGroupCount());\n<\/pre>\n<p>The <code>AllGroupsCollector<\/code> can be conveniently combined with the <code>SecondPassGroupingCollector<\/code> in the second pass search by using the <code>MultiCollector<\/code>. The <code>AllGroupsCollector<\/code> can also be used independently from other collectors.<\/p>\n<h2>Result Grouping in Solr<\/h2>\n<p>Currently the grouping in the Solr trunk doesn&#8217;t use the Lucene grouping module. It uses its own grouping implementation. The reason Solr is not using the grouping module yet is that grouping by function and query needs to be supported first. Grouping, however, hasn&#8217;t yet been implemented in Solr 3.1. The downside is that Solr users still need to patch and build their own version to be able to group results. Even worse, most users use one of the obsolete patches in <a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-236\">SOLR-236<\/a> that have been adapted to work with Solr 3.1. That is one of the reasons why I created <a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-2524\">SOLR-2524<\/a>.<\/p>\n<p>The SOLR-2524 issue is concerned with integrating the Lucene grouping contrib into the 3.x branch of Solr. 
This issue also serves as a reference for integrating the grouping module into the trunk version of Solr (4.0). Grouping in the 3.x branch of Solr will support the same response formats and request parameters as described on the <a href=\"http:\/\/wiki.apache.org\/solr\/FieldCollapsing\">Solr FieldCollapse wiki page<\/a>. The only parameters it doesn&#8217;t support (yet) are those regarding grouping by function and query.<br \/>\nIf all goes well, this issue will be committed soon and included in the Solr 3.2 release, giving Solr users the grouping feature out of the box!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Grouping of search results or also known as field collapsing is often a requirement for search projects. As described earlier this functionality was added to Solr and happens to be one of the most wanted features in Solr. Recently result grouping was added to Lucene as contrib in Lucene 3.1 and a module in 4.0. [&hellip;]<\/p>\n","protected":false},"author":77,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[15],"tags":[35,33],"class_list":["post-3424","post","type-post","status-publish","format-standard","hentry","category-enterprise-search","tag-lucene","tag-solr"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Search Result Grouping \/ Field Collapsing in Lucene \/ Solr - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Search Result Grouping \/ Field 
Collapsing in Lucene \/ Solr - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"Grouping of search results or also known as field collapsing is often a requirement for search projects. As described earlier this functionality was added to Solr and happens to be one of the most wanted features in Solr. Recently result grouping was added to Lucene as contrib in Lucene 3.1 and a module in 4.0. [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2011-05-19T13:49:52+00:00\" \/>\n<meta name=\"author\" content=\"Martijn van Groningen\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Martijn van Groningen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/\",\"url\":\"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/\",\"name\":\"Search Result Grouping \/ Field Collapsing in Lucene \/ Solr - Trifork Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"datePublished\":\"2011-05-19T13:49:52+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Search Result Grouping \/ Field Collapsing in Lucene \/ Solr\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working 
on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57\",\"name\":\"Martijn van Groningen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g\",\"caption\":\"Martijn van Groningen\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/martijn\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Search Result Grouping \/ Field Collapsing in Lucene \/ Solr - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/","og_locale":"en_US","og_type":"article","og_title":"Search Result Grouping \/ Field Collapsing in Lucene \/ Solr - Trifork Blog","og_description":"Grouping of search results or also known as field collapsing is often a requirement for search projects. As described earlier this functionality was added to Solr and happens to be one of the most wanted features in Solr. Recently result grouping was added to Lucene as contrib in Lucene 3.1 and a module in 4.0. 
[&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/","og_site_name":"Trifork Blog","article_published_time":"2011-05-19T13:49:52+00:00","author":"Martijn van Groningen","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Martijn van Groningen","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/","url":"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/","name":"Search Result Grouping \/ Field Collapsing in Lucene \/ Solr - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"datePublished":"2011-05-19T13:49:52+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/search-result-grouping-field-collapsing-in-lucene-solr\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Search Result Grouping \/ Field Collapsing in Lucene \/ Solr"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working 
on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57","name":"Martijn van Groningen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g","caption":"Martijn van Groningen"},"url":"https:\/\/trifork.nl\/blog\/author\/martijn\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/3424","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/77"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=3424"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/3424\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=3424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=3424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=3424"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}