{"id":3244,"date":"2011-05-05T16:38:22","date_gmt":"2011-05-05T14:38:22","guid":{"rendered":"http:\/\/blog.jteam.nl\/?p=3244"},"modified":"2011-05-05T16:38:22","modified_gmt":"2011-05-05T14:38:22","slug":"indexing-your-samba-windows-network-shares-using-solr","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/","title":{"rendered":"Indexing your Samba\/Windows network shares using Solr"},"content":{"rendered":"<p>Many of JTeam&#8217;s clients want to search the content of their existing network shares as part of their Enterprise Search infrastructure. Over the last couple of years, more and more people have been switching to <a href=\"http:\/\/lucene.apache.org\/solr\/\">Apache Lucene \/ Solr<\/a> as their preferred open source search solution. However, many still have the misconception that Solr cannot index the content of other enterprise content systems, like Microsoft SharePoint and Samba \/ Windows shares. This blog entry will show you how to index the content of your network shares by reusing some of the components JTeam built for this purpose.<\/p>\n<p><!--more--><br \/>\nIn some present-day product selections, Apache Solr is dismissed as a serious option, partly because it does not provide out-of-the-box support for indexing content from non-standard repositories such as Microsoft SharePoint and Samba \/ Windows shares. Also, support for scheduling of tasks (needed to periodically index the content from these repositories) is not built into Solr. However, there are several open source solutions available that do exactly this and are easily combined with a typical Solr installation. 
The main two open source solutions that can be used to index content from Sharepoint and\/or network shares are <a title=\"Manifold connector framework\" href=\"http:\/\/incubator.apache.org\/connectors\/\" target=\"_blank\" rel=\"noopener\">Manifold Connector Framework<\/a> and <a title=\"Google Enterprise Connector Manager\" href=\"http:\/\/code.google.com\/p\/google-enterprise-connector-manager\/\" target=\"_blank\" rel=\"noopener\">Google Enterprise Connector Manager<\/a>. In this blog I will show you how you can index your files from network shares using Solr and Google enterprise connector manager.<\/p>\n<h2>Connector manager<\/h2>\n<p>The connector manager is the central part of the connector framework for the Google Search Appliance (GSA) and enables searching documents that are stored in non-Web repositories, such as enterprise content management systems, like Microsoft Sharepoint, Samba \/ Windows network shares, but it also supports JDBC databases.<\/p>\n<p>The connector manager is a web application that runs inside a Java Servlet container. The connector manager is the entry point for the creation, instantiation, scheduling and monitoring of connectors that supply content and authentication and authorization services over that content. Interacting with the connector manager is done over HTTP using an XML interface. There is no graphical user interface to do this.<\/p>\n<p>By default, the connector manager is used in conjunction with the Google Search Appliance (GSA). However, we want to use it to index documents to Apache Solr. This is easy to do, by means of a connector manager concept called <code>Pusher<\/code>. A <code>Pusher<\/code> is responsible for sending a crawled document to an external system. By default, the only implementation sends crawled documents to GSA, but implementing your own <code>Pusher<\/code> is not difficult. At JTeam we already created a <code>Pusher<\/code> that sends documents to Solr using the SolrJ client library. 
This <code>SolrDocPusher<\/code> integrates the connector manager with Solr, by sending every crawled document to Solr. The default implementation of the <code>SolrDocPusher<\/code> sends the documents to the <a href=\"http:\/\/wiki.apache.org\/solr\/ExtractingRequestHandler\">ExtractingRequestHandler<\/a> (a.k.a. Solr Cell) which is part of Solr. The <code>ExtractingRequestHandler<\/code> uses <a href=\"http:\/\/tika.apache.org\/\">Apache Tika<\/a> to extract text content from documents in various file formats.<\/p>\n<h2>Connector<\/h2>\n<p>A connector is the bridge between the repository and the connector manager. A connector knows the details for retrieving content (incl. authentication and authorization) from a specific repository. It implements a connector <a href=\"http:\/\/google-enterprise-connector-manager.googlecode.com\/svn\/docs\/javadoc\/2.0.0\/index.html\" target=\"_blank\" rel=\"noopener\">SPI interface<\/a>, which is the only thing the connector manager knows about.<\/p>\n<p>Currently, there are connectors available for all repositories that are supported by GSA, which includes Microsoft SharePoint, file systems, Samba \/ Windows shares and relational databases, as well as experimental support for Salesforce. A full list of available connectors can be found <a title=\"More connectors\" href=\"http:\/\/code.google.com\/p\/googlesearchapplianceconnectors\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<h2>Traversals<\/h2>\n<p>Traversal is an important concept when working with the Google Enterprise connector framework. A traversal is the process that makes sure that all new and changed documents are added to, and deleted documents are removed from, the indexer (in our case Solr). Each connector uses its own heuristics to determine when a document is new, changed or deleted. A traversal is executed in batches and starts depending on the configured time intervals. 
After each batch a checkpoint is saved that records the progress of a specific traversal.<\/p>\n<p>In order for a traversal to start and run correctly, the following steps must have been completed:<\/p>\n<ol>\n<li>Connector has to be installed. The jar containing the connector and any dependencies should be on the classpath of the connector manager.<\/li>\n<li>Connector instance has to be configured. Each connector has its own options that have to be specified before crawling can happen. For example, the <code>DataBaseConnector<\/code> needs a query and JDBC details and the <code>FileSystemConnector<\/code> needs a path and optionally account details when a Samba share has to be crawled. You can query the connector manager for the options of a specific connector. In the last section of this blog entry I will show how to do this.<\/li>\n<li>Connector instance has to be scheduled. Scheduling a connector is a separate step. In this step you\u2019ll need to specify the time intervals, load and retry delay time.<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3259\" title=\"gcf-overview\" alt=\"\" src=\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/04\/gcf-overview.png\" width=\"727\" height=\"318\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2011\/04\/gcf-overview.png 727w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2011\/04\/gcf-overview-300x131.png 300w\" sizes=\"auto, (max-width: 727px) 100vw, 727px\" \/><\/p>\n<p>The above image shows an overview of the Google connector manager and different traversals. In our case the retrieved documents are pushed to Solr and the configuration \/ scheduling happens over HTTP via the command line using <code>curl<\/code>.<\/p>\n<p>When scheduling a connector instance there are two options that need some more explanation: load and retry delay time. The connector manager has a notion of load per connector instance. 
This determines the number of documents that can be processed per minute. The retry delay time option is simply the time a connector instance should wait during a traversal when an error has occurred.<\/p>\n<h2>Authenticating and Authorization<\/h2>\n<p>Besides indexing data the connector manager can also assist you with authenticating users and authorizing documents. The connector manager provides web interfaces to perform authentication and authorization. As shown in the image below the connector manager delegates the authentication and authorization requests to one or more connectors.<\/p>\n<p><a href=\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/05\/authorization-authentication.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3305\" title=\"authorization-authentication\" alt=\"\" src=\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/05\/authorization-authentication.png\" width=\"878\" height=\"370\"><\/a><\/p>\n<p>Please note that not all connectors support authentication and authorization features. In this blog I will focus on indexing, but this shows that connector manager is designed with authentication and authorization in mind.<\/p>\n<h2>Using the connector manager with Solr<\/h2>\n<p>The following paragraph shows how to use the connector manager with the filesystem connector. We\u2019ll try to crawl a Samba share. Most of the steps are similar when using different connectors. In this small tutorial I will use Tomcat as Servlet container for running the connector manager, but any other Servlet container should work.<\/p>\n<p>The connector manager\u2019s binary distribution can be downloaded from the <a href=\"http:\/\/code.google.com\/p\/google-enterprise-connector-manager\/downloads\/list\" target=\"_blank\" rel=\"noopener\">Google\u2019s download page<\/a>. The binary distribution contains the connector manager as a war file. 
However, this distribution doesn\u2019t contain any connectors and it relies on the GSA instead of Solr. Adding connectors isn\u2019t a big deal (just put the connector jars on the connector manager classpath). There is even <a href=\"http:\/\/code.google.com\/p\/googlesearchapplianceconnectors\/\">an installer<\/a> for the connector manager with many connectors.<\/p>\n<p>We want the connector manager to work with Apache Solr. This requires some changes to the connector manager\u2019s configuration, which is packaged inside the war file, and the connector manager needs the <code>SolrDocPusher<\/code>. Obviously the file system connector is also required. To make it easy we have <a href=\"http:\/\/info.trifork.nl\/TriforkConnectormanager.html\" target=\"_blank\" rel=\"noopener\">provided an archive<\/a> with a Tomcat 6 distribution containing Solr version 3.1.0 and the connector manager version 2.6.6 with the required changes.<\/p>\n<p>In this setup Solr is configured under the <a href=\"http:\/\/localhost:8080\/solr\">solr context<\/a> and the connector manager under the <a href=\"http:\/\/localhost:8080\/connector-manager\">connector-manager context<\/a>. The connector manager is configured to send documents to this local Solr instance. In this setup logging is also properly configured. Solr will log to the solr-info file and solr-error file in the Tomcat log directory. The connector manager will log to the gcf-info file and the gcf-error file. 
The error log files will only contain error messages and the info log files contain both the info messages and the error messages.<\/p>\n<p>Follow these steps to index data from your network shares or local disk:<\/p>\n<ol>\n<li>Unpack <a href=\"http:\/\/info.trifork.nl\/TriforkConnectormanager.html\">the archive <\/a>into a convenient location on your local system.<\/li>\n<li>Start Tomcat by running the start script in the bin directory and check via the log files if Solr and the connector manager are running without any errors. Browse to the <a href=\"http:\/\/localhost:8080\/connector-manager\/\" target=\"_blank\" rel=\"noopener\">connector manager<\/a> context and see if the XML element statusId contains the value 0. If that is the case then everything is fine. Any other value means that there is an error (this shouldn&#8217;t happen with the provided distribution). Browse to the <a href=\"http:\/\/localhost:8080\/solr\/\" target=\"_blank\" rel=\"noopener\">Solr<\/a> context and execute the example query in the Solr admin menu. You should see an empty result.<\/li>\n<li>Check if the file system connector is available in the connector manager by going to the following URL: http:\/\/localhost:8080\/connector-manager\/getConnectorList<br \/>\nYou should see a response like this in your browser:<br \/>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">&lt;p&gt;&lt;\/p&gt;\n&lt;p&gt;  Google Search Appliance Connector Manager 2.6.6 (build 2658  December 7 2010); Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM 1.6.0_22; Linux 2.6.35-28-generic (amd64)&lt;br&gt;\n  0&lt;\/p&gt;\n&lt;p&gt;    FileConnectorType&lt;\/p&gt;\n&lt;p&gt;<\/pre><\/p><\/li>\n<li>To configure a connector instance you\u2019ll need to send an XML snippet with the connector configuration to the connector manager. 
Take a look at the following connector instance configuration:<br \/>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">&lt;p&gt;&lt;\/p&gt;\n&lt;p&gt;    en&lt;br&gt;\n    test&lt;br&gt;\n    FileConnectorType&lt;br&gt;\n    false&lt;\/p&gt;\n&lt;p&gt;<\/pre><\/p>\n<p>The above configuration tells the connector manager to crawl certain paths on a Samba \/ Windows share. The start options tell the file system connector what paths to crawl recursively. Replace [YOUR-PATH] by some path in your network shares and replace [YOUR-HOST] by the hostname or IP of your share server. Also replace [YOUR-USER] and [YOUR-PASSWORD] with a username \/ password combination of a user that is allowed to index files in the specified path(s). The username should be specified without a domain (not like this: domain@username) when crawling a Windows share. Finally if you&#8217;re crawling a Windows share replace [YOUR-DOMAIN] with the domain the share is located in.<\/p>\n<p>The include options and exclude options specify what files should be included for traversal or excluded from traversal. <a href=\"http:\/\/code.google.com\/apis\/searchappliance\/documentation\/60\/admin\/URL_patterns.html\">Url patterns<\/a> are used as syntax to define includes and excludes. The first include option in the above XML snippet specifies to include all files it encounters in the start path(s).<\/p>\n<p>You can also crawl a path on your local file-system. Omit the <code>smb:\/\/<\/code> prefix from the paths in the start configuration option and replace it with a path from your local file system. When crawling files from your local file-system you can omit the credentials and domain configuration options.<\/p>\n<p>Save the above xml snippet as a file on your disk. For example: <code>setConnectorConfig.xml<\/code>. 
To create the connector instance you&#8217;ll need to send an HTTP POST to the following URL: http:\/\/localhost:8080\/connector-manager\/setConnectorConfig<\/p>\n<p>A command line utility like curl will do the job:<\/p>\n<p><pre class=\"brush: bash; title: ; notranslate\" title=\"\">curl http:\/\/localhost:8080\/connector-manager\/setConnectorConfig -X POST -d @setConnectorConfig.xml<\/pre><\/p>\n<p>Check if the connector instance is properly configured at the following URL: http:\/\/localhost:8080\/connector-manager\/getConnectorInstanceList<br \/>\nYou should see the following XML snippet in your browser:<\/p>\n<p><pre class=\"brush: xml; title: ; notranslate\" title=\"\">&lt;\/p&gt;\n&lt;p&gt;Google Search Appliance Connector Manager 2.6.6 (build 2658  December 7 2010); Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM 1.6.0_22; Linux 2.6.35-28-generic (amd64)&lt;br&gt;\n0&lt;\/p&gt;\n&lt;p&gt;test&lt;br&gt;\nFileConnectorType&lt;br&gt;\n2.6.0 September 28 2010&lt;br&gt;\n0&lt;\/p&gt;\n&lt;p&gt;<\/pre><\/p><\/li>\n<li>The next step is to schedule the connector instance. This is also done by sending an HTTP POST. Take a look at the following XML snippet:<br \/>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">&lt;p&gt;&lt;\/p&gt;\n&lt;p&gt;    test&lt;br&gt;\n    1000&lt;br&gt;\n    30000&lt;br&gt;\n    0-23&lt;\/p&gt;\n&lt;p&gt;<\/pre><\/p>\n<p>The above XML snippet tells the connector manager that the connector instance with name test should traverse the paths specified in the previous step at all hours of the day. It also specifies the load and the retry delay time. 
Save the XML snippet to a file (for example: setSchedule.xml) and post it with curl to the following URL: http:\/\/localhost:8080\/connector-manager\/setSchedule<br \/>\nWith curl it would look like this:<\/p>\n<p><pre class=\"brush: bash; title: ; notranslate\" title=\"\">curl http:\/\/localhost:8080\/connector-manager\/setSchedule -X POST -d @setSchedule.xml<\/pre><\/p>\n<p>If the call was successful you should see the following response in your prompt:<\/p>\n<p><pre class=\"brush: xml; title: ; notranslate\" title=\"\">&lt;\/p&gt;\n&lt;p&gt;  0&lt;\/p&gt;\n&lt;p&gt;<\/pre><\/p>\n<p>The connector manager will now begin to crawl the path(s) specified in step 4. You should see documents being added to your Solr instance.<\/p><\/li>\n<\/ol>\n<p>If you want to remove the connector instance you can do so by opening the following URL, for example in your browser:<\/p>\n<p> http:\/\/localhost:8080\/connector-manager\/removeConnector?ConnectorName=test <\/p>\n<p>The result is that the crawling stops and the connector instance loses the state it has maintained. This means that if you reconfigure a connector instance with the same name, it will start crawling from scratch.<\/p>\n<p>The XML configuration snippet in step 4 seems verbose. The reason you need to send empty properties is that all connector properties must be specified when configuring a connector, otherwise an exception occurs. Sending many empty configuration options seems overkill, but that is just how it works. To find out what options you need to specify for configuring a connector instance, you&#8217;ll need to send an HTTP request to the following URL:<br \/>\nhttp:\/\/localhost:8080\/connector-manager\/getConfigForm?ConnectorType=[FileConnectorType]<br \/>\nReplace [FileConnectorType] by the connector type you want to configure. This returns an XML response with an HTML form inside. 
From this form you can determine the fields to send, or you can render it in a front-end application and let an end-user supply the configuration options.<\/p>\n<p>The connector manager can easily be changed to index documents to a different Solr instance. You can do so by changing the following file in the connector manager exploded directory: <strong>\/WEB-INF\/applicationContext.properties<\/strong><br \/>\nYou must then change the <strong>solr.baseUrl<\/strong> and <strong>solr.core<\/strong> properties to point to a different Solr instance.<\/p>\n<p>In Solr you&#8217;ll see that the documents have a number of fields with the <strong>google:<\/strong> prefix. The <strong>google:aclgroups<\/strong> field defines which user groups are allowed to read a specific document. An important note is that the file system connector does not retrieve nested groups. The <strong>google:aclusers<\/strong> field defines which users are allowed to read a document. Usually these are users who don&#8217;t belong to one of the groups in the previous field and have direct read privileges. The <strong>google:ispublic<\/strong> field defines if the document can be viewed by public users. The <strong>content<\/strong> field contains the parsed content of a file. The binary content of a file is parsed by Tika and put in the content field. The <code>ExtractingRequestHandler<\/code> on the Solr side puts all metadata Tika can collect during parsing in fields with the prefix <strong>metadata_<\/strong>. Not all files can be parsed by Tika, for several reasons: for example, Tika doesn&#8217;t have a parser for that specific content type, or an error occurred during parsing. The latter does happen often and the behavior of the <code>SolrDocPusher<\/code> in that case is to send the documents to Solr without parsing of the content with Tika. 
This means that all the metadata fields and the content fields are empty.<\/p>\n<h2>Conclusion<\/h2>\n<p>Hopefully, this blog entry has shown you that it&#8217;s fairly easy to index your data sources using Apache Solr, with little custom development and more help from Google&#8217;s connector framework. The connector manager and its connectors are a good solution to crawl many data storages. Especially the connectors are typically good pieces of software that are easy to reuse. This blog also shows that open source products from different communities (Apache &amp; Google) can be integrated together with little effort. A GUI for managing and configuring the connectors and the manager would be nice though, but that is one of the features the GSA provides. If you want to have more background information about the connector manager you can checkout my presentation. I gave this presentation at JTeam last February. Or you can always <a href=\"http:\/\/www.trifork.nl\/en\/home\/contact-us.html\" target=\"_blank\" rel=\"noopener\">contact us<\/a> to help you index your data!<\/p>\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/bit.ly\/3BAo305\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"256\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2022\/02\/Blog-Banner-1-1024x256.png\" alt=\"\" class=\"wp-image-20303\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2022\/02\/Blog-Banner-1-1024x256.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2022\/02\/Blog-Banner-1-300x75.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2022\/02\/Blog-Banner-1-768x192.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2022\/02\/Blog-Banner-1-1536x384.png 1536w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2022\/02\/Blog-Banner-1-2048x512.png 2048w, 
https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2022\/02\/Blog-Banner-1-1920x480.png 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Many of JTeam&#8217;s clients want to search the content of their existing network shares as part of their Enterprise Search infrastructure. Over the last couple of years, more and more people are switching to Apache Lucene \/ Solr as their preferred, open source search solution. However, many still have the misconception that it is not [&hellip;]<\/p>\n","protected":false},"author":77,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[15],"tags":[35,33,16,254,255,9,256,257],"class_list":["post-3244","post","type-post","status-publish","format-standard","hentry","category-enterprise-search","tag-lucene","tag-solr","tag-enterprise-search","tag-google-connector-framework","tag-microsoft-sharepoint","tag-open-source","tag-samba-share","tag-windows-share"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Indexing your Samba\/Windows network shares using Solr - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Indexing your Samba\/Windows network shares using Solr - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"Many of JTeam&#8217;s clients want to search the content of their existing network shares as part of their Enterprise Search infrastructure. 
Over the last couple of years, more and more people are switching to Apache Lucene \/ Solr as their preferred, open source search solution. However, many still have the misconception that it is not [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2011-05-05T14:38:22+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/04\/gcf-overview.png\" \/>\n<meta name=\"author\" content=\"Martijn van Groningen\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Martijn van Groningen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/\",\"url\":\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/\",\"name\":\"Indexing your Samba\/Windows network shares using Solr - Trifork 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/04\/gcf-overview.png\",\"datePublished\":\"2011-05-05T14:38:22+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#primaryimage\",\"url\":\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/04\/gcf-overview.png\",\"contentUrl\":\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/04\/gcf-overview.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Indexing your Samba\/Windows network shares using Solr\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working 
on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57\",\"name\":\"Martijn van Groningen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g\",\"caption\":\"Martijn van Groningen\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/martijn\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Indexing your Samba\/Windows network shares using Solr - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/","og_locale":"en_US","og_type":"article","og_title":"Indexing your Samba\/Windows network shares using Solr - Trifork Blog","og_description":"Many of JTeam&#8217;s clients want to search the content of their existing network shares as part of their Enterprise Search infrastructure. Over the last couple of years, more and more people are switching to Apache Lucene \/ Solr as their preferred, open source search solution. 
However, many still have the misconception that it is not [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/","og_site_name":"Trifork Blog","article_published_time":"2011-05-05T14:38:22+00:00","og_image":[{"url":"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/04\/gcf-overview.png","type":"","width":"","height":""}],"author":"Martijn van Groningen","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Martijn van Groningen","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/","url":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/","name":"Indexing your Samba\/Windows network shares using Solr - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#primaryimage"},"image":{"@id":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#primaryimage"},"thumbnailUrl":"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/04\/gcf-overview.png","datePublished":"2011-05-05T14:38:22+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#primaryimage","url":"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/04\/gcf-overview.png","contentUrl":"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2011\/04\/gcf-overview.png"},{
"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/indexing-your-samba-windows-network-shares-using-solr\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Indexing your Samba\/Windows network shares using Solr"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/72d3e6a70910facfdef86dd93ced0e57","name":"Martijn van Groningen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/505caa844fb66f275a027798c993c363?s=96&d=mm&r=g","caption":"Martijn van 
Groningen"},"url":"https:\/\/trifork.nl\/blog\/author\/martijn\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/3244","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/77"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=3244"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/3244\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=3244"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=3244"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=3244"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}