{"id":2957,"date":"2011-02-10T16:08:20","date_gmt":"2011-02-10T15:08:20","guid":{"rendered":"http:\/\/blog.jteam.nl\/?p=2957"},"modified":"2011-02-10T16:08:20","modified_gmt":"2011-02-10T15:08:20","slug":"mahout-at-fosdem-2011-datadevroom","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/","title":{"rendered":"Mahout at FOSDEM 2011 DataDevRoom"},"content":{"rendered":"<p>Last Saturday, February 5th, FOSDEM 2011 hosted the DataDevRoom, where talks were given on topics surrounding data analysis with free and open source software. I was there and gave an introductory talk on clustering with Apache Mahout. In case you missed the conference, read on to learn about some of the talks or check out the slides or demo code from my Mahout talk.<\/p>\n<p><!--more--><\/p>\n<h2>Highlights<\/h2>\n<p>The DataDevRoom was packed! There were 60 chairs available, but I estimate that around 100 people were in the room for most of the day. Everyone had really interesting stuff to talk about, with subjects ranging from NoSQL database benchmarks, to case studies, to introductions to all sorts of free or open source tools for processing, analyzing and visualizing data.<\/p>\n<p>I found the <a href=\"http:\/\/www.seeks-project.info\/site\/\">Seeks<\/a> talk by Emmanuel Benazera and the <a href=\"http:\/\/s4.io\/\">S4<\/a> talk by Micha&euml;l Figui&egrave;re particularly interesting. Seeks is a social search engine where users&#8217; queries and search results are shared to improve the search experience. It also has nice features such as clustering and recommendation of search results. The Lucene and S4 talk discussed how to create a search engine by combining Lucene with Yahoo&#8217;s recently released distributed stream processing platform S4. 
S4&#8217;s stream processing was presented as an alternative to large-scale batch processing and looks promising.<\/p>\n<h2>Links<\/h2>\n<p>Below are links to the DataDevRoom program as well as my slides and demo code. In my demo I used Apache Mahout to cluster the transcripts of Seinfeld episodes.<\/p>\n<ul>\n<li><a href=\"http:\/\/datadevroom.couch.it\/Data_Devroom_Program\">http:\/\/datadevroom.couch.it\/Data_Devroom_Program<\/a><\/li>\n<li><a href=\"http:\/\/info.jteam.nl\/FOSDEM2011-Introduction-to-clustering-with-Mahout-ContentRequest.html\">Slides<\/a><\/li>\n<li>Check out the <a href=\"http:\/\/github.com\/frankscholten\/mahout\/tree\/seinfeld_demo\">seinfeld_demo<\/a> branch in my <a href=\"http:\/\/github.com\/frankscholten\/mahout\">GitHub<\/a> repository.<\/li>\n<\/ul>\n<p>To wrap up, I want to thank the organizers of the DataDevRoom, Olivier Grisel, Nicolas Maillot and Isabel Drost, for arranging everything. I had a good time and learned a lot!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last Saturday, February 5th, FOSDEM 2011 hosted the DataDevRoom, where talks were given on topics surrounding data analysis with free and open source software. I was there and gave an introductory talk on clustering with Apache Mahout. 
In case you missed the conference, read on to learn about some of the talks or check out the [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[40,12,31,10],"tags":[156,14,226,11,9],"class_list":["post-2957","post","type-post","status-publish","format-standard","hentry","category-mahout","category-conference","category-java","category-development","tag-mahout","tag-conference","tag-data","tag-java","tag-open-source"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Mahout at FOSDEM 2011 DataDevRoom - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Mahout at FOSDEM 2011 DataDevRoom - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"Last Saturday, February 5th, FOSDEM 2011 hosted the DataDevRoom, where talks were given on topics surrounding data analysis with free and open source software. I was there and gave an introductory talk on clustering with Apache Mahout. 
In case you missed the conference, read on to learn about some of the talks or check out the [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2011-02-10T15:08:20+00:00\" \/>\n<meta name=\"author\" content=\"frank\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"frank\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/\",\"url\":\"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/\",\"name\":\"Mahout at FOSDEM 2011 DataDevRoom - Trifork Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"datePublished\":\"2011-02-10T15:08:20+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/00fad6c5829f6770345f23ccace2e54f\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Mahout at FOSDEM 2011 
DataDevRoom\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/00fad6c5829f6770345f23ccace2e54f\",\"name\":\"frank\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5c39a948f2b70fa900b25dc79cde8643?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5c39a948f2b70fa900b25dc79cde8643?s=96&d=mm&r=g\",\"caption\":\"frank\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/frank\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Mahout at FOSDEM 2011 DataDevRoom - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/","og_locale":"en_US","og_type":"article","og_title":"Mahout at FOSDEM 2011 DataDevRoom - Trifork Blog","og_description":"Last Saturday, February 5th, FOSDEM 2011 hosted the DataDevRoom, where talks were given on topics surrounding data analysis with free and open source software. I was there and gave an introductory talk on clustering with Apache Mahout. 
In case you missed the conference, read on to learn about some of the talks or check out the [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/","og_site_name":"Trifork Blog","article_published_time":"2011-02-10T15:08:20+00:00","author":"frank","twitter_card":"summary_large_image","twitter_misc":{"Written by":"frank","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/","url":"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/","name":"Mahout at FOSDEM 2011 DataDevRoom - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"datePublished":"2011-02-10T15:08:20+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/00fad6c5829f6770345f23ccace2e54f"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/mahout-at-fosdem-2011-datadevroom\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Mahout at FOSDEM 2011 DataDevRoom"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working 
on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/00fad6c5829f6770345f23ccace2e54f","name":"frank","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5c39a948f2b70fa900b25dc79cde8643?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5c39a948f2b70fa900b25dc79cde8643?s=96&d=mm&r=g","caption":"frank"},"url":"https:\/\/trifork.nl\/blog\/author\/frank\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/2957","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=2957"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/2957\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=2957"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=2957"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=2957"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}