{"id":6457,"date":"2012-04-10T11:10:49","date_gmt":"2012-04-10T09:10:49","guid":{"rendered":"http:\/\/blog.trifork.nl\/?p=6457"},"modified":"2012-04-10T11:10:49","modified_gmt":"2012-04-10T09:10:49","slug":"faceting-result-grouping","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/","title":{"rendered":"Faceting &amp; result grouping"},"content":{"rendered":"<p>Result grouping and faceting are, in essence, two different search features. Faceting counts, for specific field values, the number of hits that match the current query. Result grouping places documents that share a common property under a group, and these groups are then used as the hits in the search result. The two features are often used together, and the resulting counts are frequently misunderstood.<\/p>\n<p>The main reason is that, when grouping is used, people expect each hit to be represented by a group. Faceting isn\u2019t aware of groups, so the computed counts represent documents, not groups. This difference in behaviour can be very confusing, and many questions on the Solr user mailing list are about exactly this.<\/p>\n<p>When result grouping is used together with faceting, users expect grouped facet counts. What does this mean? When counting the number of matches for a specific field value, grouped faceting only counts a document if the group it belongs to hasn\u2019t already been counted for that value. 
This is best illustrated with some example documents.<\/p>\n<table border=\"1\" align=\"center\">\n<colgroup>\n<col width=\"69\" \/>\n<col width=\"93\" \/>\n<col width=\"152\" \/>\n<col width=\"142\" \/>\n<col width=\"168\" \/> <\/colgroup>\n<tbody>\n<tr>\n<th>item_id<\/th>\n<th>product_id<\/th>\n<th>product_name<\/th>\n<th>product_color<\/th>\n<th>product_size<\/th>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>1<\/td>\n<td>The blue jacket<\/td>\n<td>DarkBlue<\/td>\n<td>S<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>1<\/td>\n<td>The blue jacket<\/td>\n<td>DarkBlue<\/td>\n<td>M<\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>1<\/td>\n<td>The blue jacket<\/td>\n<td>DarkBlue<\/td>\n<td>L<\/td>\n<\/tr>\n<tr>\n<td>4<\/td>\n<td>2<\/td>\n<td>The blue blouse<\/td>\n<td>RegularBlue<\/td>\n<td>S<\/td>\n<\/tr>\n<tr>\n<td>5<\/td>\n<td>2<\/td>\n<td>The blue blouse<\/td>\n<td>RegularBlue<\/td>\n<td>M<\/td>\n<\/tr>\n<tr>\n<td>6<\/td>\n<td>2<\/td>\n<td>The blue blouse<\/td>\n<td>DarkBlue<\/td>\n<td>L<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Let\u2019s say we query for all documents, facet on the product_color field and group by product_id. With plain faceting every matching document is counted, which gives the following facet counts:<\/p>\n<ul>\n<li>DarkBlue &#8211; 4<\/li>\n<li>RegularBlue &#8211; 2<\/li>\n<\/ul>\n<p>With grouped faceting each product is counted at most once per color: DarkBlue occurs in products 1 and 2, RegularBlue only in product 2. This gives the following counts:<\/p>\n<ul>\n<li>DarkBlue &#8211; 2<\/li>\n<li>RegularBlue &#8211; 1<\/li>\n<\/ul>\n<p>The facet counts computed by grouped faceting are what most end users actually expect. The good news is that support for grouped faceting was recently added to Solr and Lucene and will be included in their 4.0 release. 
Unfortunately, grouped facets are more expensive to compute than normal facets, because for each facet value the implementation needs to keep track of which groups have already been counted.<\/p>\n<h2>Grouped facets in Solr<\/h2>\n<p>In Solr, grouped faceting builds on the existing faceting parameters and can be enabled with a single parameter, as described on the <a href=\"http:\/\/wiki.apache.org\/solr\/FieldCollapsing#Request_Parameters\">Solr wiki<\/a>:<br \/>\n<code>group.facet=true<\/code><br \/>\nWhen enabled, all field facets already specified with <code>facet.field<\/code> parameters are computed as grouped facets. Both single-valued and multivalued field facets are supported. Other facet types, such as range facets, aren\u2019t supported yet.<\/p>\n<h2>Grouped facets in Lucene<\/h2>\n<p>Grouped facets are implemented as a Lucene collector in the Lucene grouping module. The following code example shows how to use them:<\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">import java.util.List;\nimport org.apache.lucene.search.grouping.AbstractGroupFacetCollector;\nimport org.apache.lucene.search.grouping.term.TermGroupFacetCollector;\nimport org.apache.lucene.util.BytesRef;\n\n\/\/ groupField, facetField, searcher, query, offset, limit and minCount are assumed to be defined by the surrounding code\nboolean facetFieldMultivalued = false;\nBytesRef facetPrefix = null; \/\/ null: count all facet values that match the query\nAbstractGroupFacetCollector groupedAirportFacetCollector = TermGroupFacetCollector.createTermGroupFacetCollector(groupField, facetField, facetFieldMultivalued, facetPrefix, 128);\nsearcher.search(query, groupedAirportFacetCollector); \/\/ Computing the grouped facet counts\nboolean orderFacetEntriesByCount = true;\n\/\/ Merge the per-segment counts, keeping enough entries (offset + limit) to serve the requested page\nAbstractGroupFacetCollector.GroupedFacetResult airportResult = groupedAirportFacetCollector.mergeSegmentResults(offset + limit, minCount, orderFacetEntriesByCount);\nSystem.out.println(&quot;Total facet hit count: &quot; + airportResult.getTotalCount());\nSystem.out.println(&quot;Total facet hit missing count: &quot; + airportResult.getTotalMissingCount());\nList&lt;AbstractGroupFacetCollector.FacetEntry&gt; facetEntries = airportResult.getFacetEntries(offset, limit);\nfor (AbstractGroupFacetCollector.FacetEntry facetEntry : facetEntries) {\n  \/\/ render facet entries\n}\n<\/pre>\n<p>As the code sample shows, a number of options can be specified:<\/p>\n<ul>\n<li><strong>groupField<\/strong> &#8211; The field to group by.<\/li>\n<li><strong>facetField<\/strong> &#8211; The field to count grouped facets for.<\/li>\n<li><strong>facetFieldMultivalued<\/strong> &#8211; Whether facetField can have more than one value per document. Computing facet counts for fields with at most one value per document is faster than for fields with multiple values per document.<\/li>\n<li><strong>facetPrefix<\/strong> &#8211; Count only values that start with this prefix. If the prefix is null, all values matching the query are counted.<\/li>\n<li><strong>offset<\/strong> &#8211; The offset at which to start including facet entries.<\/li>\n<li><strong>limit<\/strong> &#8211; The maximum number of facet entries to include, starting from the offset.<\/li>\n<li><strong>minCount<\/strong> &#8211; The minimum count a facet entry must have to be included.<\/li>\n<li><strong>orderFacetEntriesByCount<\/strong> &#8211; Whether to order the facet entries by count.<\/li>\n<\/ul>\n<p>Not all options are required; for example, facetPrefix can be null. The grouping module also contains a doc values based implementation of grouped facets, which Solr doesn\u2019t use.<\/p>\n<p>As you can see, grouped faceting is quite easy to use from both Solr and Lucene. Have you tried out this new feature? If so, let us know how grouped faceting works in your Lucene app or Solr setup by posting a comment!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Result grouping and faceting are in essence two different search features. Faceting counts the number of hits for specific field values matching the current query. Result grouping groups documents together with a common property and places these documents under a group. These groups are used as the hits in the search result. 
Usually result grouping [&hellip;]<\/p>\n","protected":false},"author":77,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[15,65,10],"tags":[35,33,296,295,269],"class_list":["post-6457","post","type-post","status-publish","format-standard","hentry","category-enterprise-search","category-big_data_search","category-development","tag-lucene","tag-solr","tag-faceting","tag-grouping","tag-result-grouping"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Faceting &amp; result grouping - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Faceting &amp; result grouping - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"Result grouping and faceting are in essence two different search features. Faceting counts the number of hits for specific field values matching the current query. Result grouping groups documents together with a common property and places these documents under a group. These groups are used as the hits in the search result. 
Usually result grouping [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2012-04-10T09:10:49+00:00\" \/>\n<meta name=\"author\" content=\"Martijn van Groningen\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Martijn van Groningen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/\",\"url\":\"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/\",\"name\":\"Faceting &amp; result grouping - Trifork Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"datePublished\":\"2012-04-10T09:10:49+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Faceting &amp; result grouping\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working 
on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57\",\"name\":\"Martijn van Groningen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g\",\"caption\":\"Martijn van Groningen\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/martijn\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Faceting &amp; result grouping - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/","og_locale":"en_US","og_type":"article","og_title":"Faceting &amp; result grouping - Trifork Blog","og_description":"Result grouping and faceting are in essence two different search features. Faceting counts the number of hits for specific field values matching the current query. Result grouping groups documents together with a common property and places these documents under a group. These groups are used as the hits in the search result. 
Usually result grouping [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/","og_site_name":"Trifork Blog","article_published_time":"2012-04-10T09:10:49+00:00","author":"Martijn van Groningen","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Martijn van Groningen","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/","url":"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/","name":"Faceting &amp; result grouping - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"datePublished":"2012-04-10T09:10:49+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/faceting-result-grouping\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/faceting-result-grouping\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Faceting &amp; result grouping"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57","name":"Martijn van 
Groningen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g","caption":"Martijn van Groningen"},"url":"https:\/\/trifork.nl\/blog\/author\/martijn\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/6457","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/77"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=6457"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/6457\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=6457"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=6457"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=6457"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}