{"id":6421,"date":"2011-12-22T14:00:54","date_gmt":"2011-12-22T13:00:54","guid":{"rendered":"http:\/\/blog.trifork.nl\/?p=6421"},"modified":"2011-12-22T14:00:54","modified_gmt":"2011-12-22T13:00:54","slug":"apache-whirr-includes-mahout-support","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/","title":{"rendered":"Apache Whirr includes Mahout support"},"content":{"rendered":"<p>In a <a href=\"http:\/\/blog.trifork.nl\/2011\/06\/21\/running-mahout-in-the-cloud-using-apache-whirr\">previous blog<\/a> I showed you how to use Apache Whirr to launch a Hadoop cluster in order to run Mahout jobs. This blog shows you how to use the Mahout service from the brand new <a href=\"http:\/\/www.cloudsoftcorp.com\/news\/apache-whirr-0-7-0-released\/\">Whirr 0.7.0 release<\/a> to automatically install Hadoop and the Mahout binary distribution on a cloud provider such as Amazon.<\/p>\n<h2>Introduction<\/h2>\n<p>If you are new to Apache Whirr, check out my previous <a href=\"http:\/\/blog.trifork.nl\/2011\/06\/21\/running-mahout-in-the-cloud-using-apache-whirr\">blog<\/a>, which covers Whirr 0.4.0. A lot has changed since then. After several new services, bug fixes and improvements, Whirr became a top-level Apache project, and its new version 0.7.0 was released yesterday! Over the last few weeks I worked on an <a href=\"http:\/\/mahout.apache.org\">Apache Mahout<\/a> service for Whirr, which is included in this release. (Thanks to the Whirr community, and Andrei Savu in particular, for reviewing the code and helping to ship this cool feature!)<\/p>\n<h2>How to use the Mahout service<\/h2>\n<p>The Mahout service in Whirr defines the <strong>mahout-client<\/strong> role. This role installs the Mahout binary distribution on a given node. 
To use this feature, check out the sources from <strong>https:\/\/svn.apache.org\/repos\/asf\/whirr\/trunk<\/strong> or <strong>http:\/\/svn.apache.org\/repos\/asf\/whirr\/tags\/release-0.7.0\/<\/strong>, or clone the project with Git from <strong>http:\/\/git.apache.org\/whirr.git<\/strong>, and build it with <em>mvn clean install<\/em>. Let me walk you through an example of how to use it on Amazon EC2.<\/p>\n<h3>Step 1 Create a node template<\/h3>\n<p>Create a file called <em>mahout-cluster.properties<\/em> and add the following:<\/p>\n<p><code><span class=\"na\">whirr.instance-templates<\/span><span class=\"o\">=<\/span><span class=\"s\">1 hadoop-jobtracker+hadoop-namenode+mahout-client,2 hadoop-datanode+hadoop-tasktracker<\/span><\/code><\/p>\n<div class=\"line\" id=\"LC25\"><code><span class=\"na\">whirr.provider<\/span><span class=\"o\">=<\/span><span class=\"s\">aws-ec2<\/span><\/code><\/div>\n<div class=\"line\" id=\"LC26\"><code><span class=\"na\">whirr.identity<\/span><span class=\"o\">=<\/span><span class=\"s\">TOP_SECRET<\/span><\/code><\/div>\n<div class=\"line\" id=\"LC27\"><code><span class=\"na\">whirr.credential<\/span><span class=\"o\">=<\/span><span class=\"s\">TOP_SECRET<\/span><\/code><\/div>\n<div class=\"line\"><\/div>\n<div class=\"line\">\n<p>Replace the <em>TOP_SECRET<\/em> placeholders with your AWS access key ID and secret access key. This setup configures two Hadoop datanode \/ tasktracker nodes and one Hadoop namenode \/ jobtracker \/ mahout-client node. 
For the <strong>mahout-client<\/strong> role, Whirr will:<\/p>\n<ul>\n<li>Download the binary distribution from Apache and install it under <strong>\/usr\/local\/mahout<\/strong><\/li>\n<li>Set <strong>MAHOUT_HOME<\/strong> to <strong>\/usr\/local\/mahout<\/strong><\/li>\n<li>Add <strong>$MAHOUT_HOME\/bin<\/strong> to the <strong>PATH<\/strong><\/li>\n<\/ul>\n<h3>(Optional) Configure the Mahout version and \/ or distribution URL<\/h3>\n<\/div>\n<div class=\"line\">By default, Whirr will download the Mahout distribution from:<\/div>\n<div class=\"line\"><\/div>\n<div class=\"line\"><code><span class=\"s\">http:\/\/archive.apache.org\/dist\/mahout\/0.5\/mahout-distribution-0.5.tar.gz<\/span><\/code><\/div>\n<div class=\"line\"><\/div>\n<div class=\"line\">You can override the version by adding the following line to <em>mahout-cluster.properties<\/em>:<\/div>\n<div class=\"line\"><\/div>\n<div class=\"line\">\n<div class=\"line\" id=\"LC22\"><code><span class=\"s\">whirr.mahout.version=VERSION<\/span><\/code><\/div>\n<div class=\"line\"><\/div>\n<\/div>\n<p>You can also change the download URL entirely, which is useful if you want to test your own build of Mahout. 
To do so, first build a Mahout binary distribution: go to the distribution folder in your Mahout source checkout and run:<\/p>\n<p><code>$ mvn clean install -Dskip.mahout.distribution=false<\/code><\/p>\n<p>Next, put the resulting tarball on a server that the cluster nodes can reach, and add the following line to your <em>mahout-cluster.properties<\/em>:<\/p>\n<p><code><span class=\"na\">whirr.mahout.tarball.url<\/span><span class=\"o\">=MAHOUT_TARBALL_URL<\/span><\/code><\/p>\n<h3>Step 2 Launch the cluster<\/h3>\n<p>You can now launch the cluster the regular way by running:<\/p>\n<p><code>$ whirr launch-cluster --config mahout-cluster.properties<\/code><\/p>\n<h3>Step 3 Log in &amp; run<\/h3>\n<p>When the cluster is up, run the Hadoop proxy, upload some data and SSH into the node. Voil\u00e0: you can now run Mahout jobs by invoking the <em>mahout<\/em> command-line script as you normally would, for example:<\/p>\n<p><code>$ mahout seqdirectory --input input --output output<\/code><\/p>\n<p>Enjoy!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a previous blog I showed you how to use Apache Whirr to launch a Hadoop cluster in order to run Mahout jobs. 
This blog shows you how to use the Mahout service from the brand new Whirr 0.7.0 release to automatically install Hadoop and the Mahout binary distribution on a cloud provider such as [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[40,15,10],"tags":[156,11,253],"class_list":["post-6421","post","type-post","status-publish","format-standard","hentry","category-mahout","category-enterprise-search","category-development","tag-mahout","tag-java","tag-whirr"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Apache Whirr includes Mahout support - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Whirr includes Mahout support - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"In a previous blog I showed you how to use Apache Whirr to launch a Hadoop cluster in order to run Mahout jobs. 
This blog shows you how to use the Mahout service from the brand new Whirr 0.7.0 release to automatically install Hadoop and the Mahout binary distribution on a cloud provider such as [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2011-12-22T13:00:54+00:00\" \/>\n<meta name=\"author\" content=\"frank\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"frank\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/\",\"url\":\"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/\",\"name\":\"Apache Whirr includes Mahout support - Trifork Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"datePublished\":\"2011-12-22T13:00:54+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/00fad6c5829f6770345f23ccace2e54f\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Apache Whirr includes Mahout 
support\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/00fad6c5829f6770345f23ccace2e54f\",\"name\":\"frank\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5c39a948f2b70fa900b25dc79cde8643?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5c39a948f2b70fa900b25dc79cde8643?s=96&d=mm&r=g\",\"caption\":\"frank\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/frank\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache Whirr includes Mahout support - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/","og_locale":"en_US","og_type":"article","og_title":"Apache Whirr includes Mahout support - Trifork Blog","og_description":"In a previous blog I showed you how to use Apache Whirr to launch a Hadoop cluster in order to run Mahout jobs. 
This blog shows you how to use the Mahout service from the brand new Whirr 0.7.0 release to automatically install Hadoop and the Mahout binary distribution on a cloud provider such as [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/","og_site_name":"Trifork Blog","article_published_time":"2011-12-22T13:00:54+00:00","author":"frank","twitter_card":"summary_large_image","twitter_misc":{"Written by":"frank","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/","url":"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/","name":"Apache Whirr includes Mahout support - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"datePublished":"2011-12-22T13:00:54+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/00fad6c5829f6770345f23ccace2e54f"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/apache-whirr-includes-mahout-support\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Apache Whirr includes Mahout support"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working 
on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/00fad6c5829f6770345f23ccace2e54f","name":"frank","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5c39a948f2b70fa900b25dc79cde8643?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5c39a948f2b70fa900b25dc79cde8643?s=96&d=mm&r=g","caption":"frank"},"url":"https:\/\/trifork.nl\/blog\/author\/frank\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/6421","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=6421"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/6421\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=6421"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=6421"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=6421"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}