{"id":6419,"date":"2011-12-14T14:35:19","date_gmt":"2011-12-14T13:35:19","guid":{"rendered":"http:\/\/blog.trifork.nl\/?p=6419"},"modified":"2011-12-14T14:35:19","modified_gmt":"2011-12-14T13:35:19","slug":"apache-lucene-solr-3-5-0","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/","title":{"rendered":"Apache Lucene &amp; Solr 3.5.0"},"content":{"rendered":"<p>Just a little over two weeks ago <a href=\"http:\/\/lucene.apache.org\/java\/docs\/index.html\">Apache Lucene<\/a> and <a href=\"http:\/\/lucene.apache.org\/solr\/\">Solr<\/a> 3.5.0 were released. \u00a0The released artifacts can be found <a href=\"http:\/\/www.apache.org\/dyn\/closer.cgi\/lucene\/java\/\">here<\/a> and <a href=\"http:\/\/www.apache.org\/dyn\/closer.cgi\/lucene\/solr\">here<\/a> respectively. \u00a0As part of the Lucene project\u2019s effort to do regular releases, 3.5.0 is another solid release providing a handful of new features and bug fixes. \u00a0The following is a review of the release, focusing on the changes I found particularly interesting.<\/p>\n<h2>Apache Lucene 3.5.0<\/h2>\n<p>Lucene 3.5.0 has a number of very important fixes and changes to both its core index management and userland APIs:<\/p>\n<ul>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/LUCENE-2205\">LUCENE-2205<\/a>: Internally, Lucene manages a dictionary of terms in its index which is heavily optimized for quick access. However, the dictionary can consume a lot of memory, especially when the index holds millions or billions of unique terms. LUCENE-2205 considerably reduces (3-5x) this memory consumption through a rewrite of the data structures and classes used to maintain and interact with the dictionary.<\/li>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/LUCENE-2215\">LUCENE-2215<\/a>: Strangely enough, despite deep paging being a common use case, Lucene has never provided an easy and efficient API for it.
Instead, users have had to use the existing <code>TopDocs<\/code>-driven API, which is very inefficient when used with large offsets, or have had to roll their own <code>Collector<\/code>. LUCENE-2215 addresses this limitation by adding <code>searchAfter<\/code> methods to <code>IndexSearcher<\/code> which efficiently find results that come \u2018after\u2019 a provided document in result sets.<\/li>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/LUCENE-3454\">LUCENE-3454<\/a>: As discussed here, Lucene\u2019s <code>optimize<\/code> index management operation has been renamed to <code>forceMerge<\/code> to dispel the common misunderstanding that the operation is vital. Some users had considered it so vital that they optimized after each document was added. Since 3.5.0 is a minor release, <code>IndexWriter.optimize()<\/code> has only been deprecated. However, it has been removed from Lucene\u2019s trunk, so it is recommended that users move over to <code>forceMerge<\/code> where appropriate.<\/li>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/LUCENE-3445\">LUCENE-3445<\/a>, <a href=\"https:\/\/issues.apache.org\/jira\/browse\/LUCENE-3486\">LUCENE-3486<\/a>: As part of the effort to provide userland classes with easy-to-use APIs for managing and interacting with Lucene indexes, LUCENE-3445 adds a <code>SearcherManager<\/code> which handles the boilerplate code so often written to manage <code>IndexSearchers<\/code> across threads and reopens of underlying <code>IndexReaders<\/code>. LUCENE-3486 goes one step further by adding a <code>SearcherLifetimeManager<\/code> which provides an easy-to-use API for ensuring that users use the same <code>IndexSearcher<\/code> as they <em>\u2018drill-down\u2019<\/em> or page through results.
Interacting with a new <code>IndexSearcher<\/code> during paging can mean the order of results will change, resulting in a confusing user experience.<\/li>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/LUCENE-3426\">LUCENE-3426<\/a>: When using NGrams (for the term \u201cABCD\u201d, the NGrams could be \u201cAB\u201d, \u201cBC\u201d, \u201cCD\u201d) and <code>PhraseQuery<\/code>s, the queries can be optimized by removing any redundant terms (the <code>PhraseQuery<\/code> \u201cAB BC CD\u201d can be reduced to \u201cAB CD\u201d). LUCENE-3426 provides a new <code>NGramPhraseQuery<\/code> which does such optimizations, where possible, on Query rewrite. The benefit, a 30-50% performance improvement in some cases, is especially valuable for CJK users, where NGrams are prevalent.<\/li>\n<\/ul>\n<p>Lucene 3.5.0 of course contains many smaller changes and bug fixes. \u00a0See <a href=\"http:\/\/lucene.apache.org\/java\/3_5_0\/changes\/Changes.html\">here<\/a> for full information about the release.<\/p>\n<h2>Apache Solr 3.5.0<\/h2>\n<p>Benefiting considerably from Lucene 3.5.0, Solr 3.5.0 also contains a handful of useful changes:<\/p>\n<ul>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-2066\">SOLR-2066<\/a>: Result grouping continues to be one of Solr\u2019s most sought-after features, and SOLR-2066 extends its power and flexibility by adding distributed grouping support. Although coming at a cost of 3 round trips to each shard, SOLR-2066 all but closes the book on what was once considered an extremely difficult feature to add to Solr and sets Solr apart from alternative search systems.<\/li>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-1979\">SOLR-1979<\/a>: When creating a multi-lingual search system, it is often useful to be able to identify the language of a document as it comes into the system.
SOLR-1979 adds out-of-the-box support for this to Solr by adding a <em>langid<\/em> Solr module containing a <code>LanguageIdentifierUpdateProcessor<\/code> which leverages Apache Tika\u2019s language detection abilities. In addition to being able to identify which language a document is in, the UpdateProcessor can map data into language-specific fields, a common way of supporting documents of different languages in a multi-lingual search system.<\/li>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-2881\">SOLR-2881<\/a>: Now that sorting Documents with missing values in a field (known as sortMissingLast) has been improved in Lucene&#8217;s trunk and 3x branch, support for using sortMissingLast with Solr&#8217;s Trie fields has been added. Consequently, it is now possible to control whether those Documents with no value in a Trie field appear first or last when sorted.<\/li>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-2769\">SOLR-2769<\/a>: Solr users are now able to use Hunspell for Lucene through the <code>HunspellStemFilterFactory<\/code>. The factory allows the affix file and multiple dictionary files to be specified, allowing Solr users to use some of the over 100 <a href=\"http:\/\/wiki.services.openoffice.org\/wiki\/Dictionaries\">Hunspell dictionaries<\/a> used in projects like OpenOffice and Mozilla Firefox in their analysis chain. This is very useful for users who have to support rarely used languages.<\/li>\n<\/ul>\n<p>Solr 3.5.0 also contains many smaller fixes and changes. \u00a0See the CHANGES.txt for full information about the release.<\/p>\n<h2>Lucene &amp; Solr 3.6.0?<\/h2>\n<p>With changes still being made to the 3x branch of both Lucene and Solr, and the release of Lucene and Solr 4, it is very likely that 3.6.0 will be released in a couple of months\u2019 time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Just a little over two weeks ago Apache Lucene and Solr 3.5.0 were released.
\u00a0The released artifacts can be found here and here respectively. \u00a0As part of the Lucene project\u2019s effort to do regular releases, 3.5.0 is another solid release providing a handful of new features and bug fixes. \u00a0The following is a review of the [&hellip;]<\/p>\n","protected":false},"author":24,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[15,65],"tags":[35,33,277],"class_list":["post-6419","post","type-post","status-publish","format-standard","hentry","category-enterprise-search","category-big_data_search","tag-lucene","tag-solr","tag-release"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Apache Lucene &amp; Solr 3.5.0 - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Lucene &amp; Solr 3.5.0 - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"Just a little over two weeks ago Apache Lucene and Solr 3.5.0 were released. \u00a0The released artifacts can be found here and here respectively. \u00a0As part of the Lucene project\u2019s effort to do regular releases, 3.5.0 is another solid release providing a handful of new features and bug fixes.
\u00a0The following is a review of the [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2011-12-14T13:35:19+00:00\" \/>\n<meta name=\"author\" content=\"Chris Male\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Chris Male\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/\",\"url\":\"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/\",\"name\":\"Apache Lucene &amp; Solr 3.5.0 - Trifork Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"datePublished\":\"2011-12-14T13:35:19+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/63ca100399079ec6e98e2b4365298806\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Apache Lucene &amp; Solr 3.5.0\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working 
on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/63ca100399079ec6e98e2b4365298806\",\"name\":\"Chris Male\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/8bbf7706dc08d42eaf11f2b18add0721?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/8bbf7706dc08d42eaf11f2b18add0721?s=96&d=mm&r=g\",\"caption\":\"Chris Male\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/chris\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache Lucene &amp; Solr 3.5.0 - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/","og_locale":"en_US","og_type":"article","og_title":"Apache Lucene &amp; Solr 3.5.0 - Trifork Blog","og_description":"Just a little over two weeks ago Apache Lucene and Solr 3.5.0 were released. \u00a0The released artifacts can be found here and here respectively. \u00a0As part of the Lucene project\u2019s effort to do regular releases, 3.5.0 is another solid release providing a handful of new features and bug fixes. \u00a0The following is a review of the [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/","og_site_name":"Trifork Blog","article_published_time":"2011-12-14T13:35:19+00:00","author":"Chris Male","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Chris Male","Est.
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/","url":"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/","name":"Apache Lucene &amp; Solr 3.5.0 - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"datePublished":"2011-12-14T13:35:19+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/63ca100399079ec6e98e2b4365298806"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/apache-lucene-solr-3-5-0\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Apache Lucene &amp; Solr 3.5.0"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/63ca100399079ec6e98e2b4365298806","name":"Chris Male","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/8bbf7706dc08d42eaf11f2b18add0721?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/8bbf7706dc08d42eaf11f2b18add0721?s=96&d=mm&r=g","caption":"Chris 
Male"},"url":"https:\/\/trifork.nl\/blog\/author\/chris\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/6419","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/24"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=6419"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/6419\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=6419"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=6419"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=6419"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}