{"id":14983,"date":"2017-02-16T11:06:28","date_gmt":"2017-02-16T10:06:28","guid":{"rendered":"https:\/\/blog.trifork.com\/?p=14983"},"modified":"2017-02-16T11:06:28","modified_gmt":"2017-02-16T10:06:28","slug":"machine-learning-predicting-house-prices","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/","title":{"rendered":"Machine Learning: Predicting house prices"},"content":{"rendered":"<p>I recently followed an online course on <a href=\"http:\/\/www.coursera.org\/learn\/machine-learning\" target=\"_blank\" rel=\"nofollow noopener\">machine learning<\/a> to better understand the current hype. As with any subject, though, only practice makes perfect, so I was looking for a way to apply this new knowledge.<\/p>\n<p>While looking to sell my house I found a nice opportunity: checking whether the prices that real estate agents estimate are in line with what the data suggests.<\/p>\n<p>Linear regression should be a good fit here. This algorithm tries to find the best linear prediction (y = a + b*x1 + c*x2, where y is the prediction and x1, x2 are the variables). So, for example, it can estimate a price per square meter of floor space or a price per square meter of garden. For a more detailed explanation, check out the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Linear_regression\" target=\"_blank\" rel=\"nofollow noopener\">Wikipedia<\/a> page.<\/p>\n<p>In the Netherlands, <a href=\"http:\/\/www.funda.nl\/\" target=\"_blank\" rel=\"nofollow noopener\">funda<\/a> is the main website for selling your house, so I started by collecting data there on the 50 houses closest to mine. I excluded apartments to limit the data to properties similar to my house. 
For each house I\u00a0collected the advertised price, usable floor space, lot size, number of (bed)rooms, type of house (row-house, corner-house, or detached) and year of construction (..-1930, 1931-1940, 1941-1950, 1950-1960, etc). These are the (easily available) variables I expected to influence the house price the most. Type of house is a categorical variable; to use it in regression I\u00a0modelled it as several binary (0\/1) variables.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"236\" class=\"wp-image-14984 aligncenter\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA-300x92.png 300w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/p>\n<p>As preparation, I checked for relations between the variables using correlation. This showed me that much of the collected data does not seem to affect the price: only the floor space, lot size and number of rooms showed a significant correlation with house price.<\/p>\n<p>For the regression analysis, I\u00a0only used the variables that had a significant correlation. 
Variables without correlation would not produce meaningful results anyway.<\/p>\n<p><!--more-->I&#8217;m using Python with the sklearn library to apply the linear regression model to my data:<\/p>\n<p><code><br \/>\nimport pandas<br \/>\nimport sklearn.linear_model<br \/>\n# Load the collected house data<br \/>\ndata = pandas.read_csv('houses.csv')<br \/>\nmodel = sklearn.linear_model.LinearRegression()<br \/>\n# Fit price as a linear function of floor space<br \/>\nmodel.fit(data[['FloorSpace']].values, data['Price'].values)<br \/>\nprint('Price of a 90 sq meter house: %f' % model.predict([[90]])[0])<br \/>\n<\/code><\/p>\n<p>With the sklearn library you can also compute cross validation scores. These tell you how well your model approximates the data, can help you decide between different models, and can help you choose which variables to use for your prediction model. The code below trains and evaluates the model 10 times with a 90\/10 split of the data and prints the average error (the square root of the mean squared error across the folds).<\/p>\n<p><code><br \/>\nimport math<br \/>\nimport numpy as np<br \/>\nimport pandas<br \/>\nfrom sklearn.linear_model import LinearRegression<br \/>\nfrom sklearn.model_selection import cross_val_score<br \/>\ndata = pandas.read_csv('houses.csv')<br \/>\n# Scores are negated mean squared errors, one per fold<br \/>\nvalidation_scores = cross_val_score(LinearRegression(),<br \/>\ndata[['FloorSpace', 'Rooms', 'LotSize']].values,<br \/>\ndata['Price'].values,<br \/>\ncv=10,<br \/>\nscoring='neg_mean_squared_error')<br \/>\nprint('Average error: %f' % math.sqrt(-np.mean(validation_scores)))<br \/>\n<\/code><\/p>\n<p>With my data, a model trained using floor space, rooms and lot size approximated the data the best. The root mean squared error on the validation sets (10% of 50 = 5 houses each) was around 45,000 euros.<\/p>\n<p>For comparison, the estimates from two actual real estate agents were around 15,000 euros apart, and both were higher than my models predicted. This is probably intentional though, to get my business.<\/p>\n<p>There are different models (algorithms) that can be used to predict prices, and the sklearn library offers many. Each algorithm can perform differently on a given problem. 
To choose between different models, we can compare their cross validation scores and pick the best performing one.<\/p>\n<p><code><br \/>\nimport math<br \/>\nimport numpy as np<br \/>\nimport pandas<br \/>\nfrom sklearn import ensemble, linear_model, svm<br \/>\nfrom sklearn.model_selection import cross_val_score<br \/>\n<br \/>\ndata = pandas.read_csv('houses.csv')<br \/>\n<br \/>\n# Candidate models to compare<br \/>\nmodels = {<br \/>\n    'linear_regression': linear_model.LinearRegression(),<br \/>\n    'elastic_net': linear_model.ElasticNet(),<br \/>\n    'svr': svm.SVR(kernel='linear'),<br \/>\n    'random_forest': ensemble.RandomForestRegressor()<br \/>\n}<br \/>\n<br \/>\n# Cross-validate each model on the same features and report its error<br \/>\nfor name, model in models.items():<br \/>\n    validation_scores = cross_val_score(model,<br \/>\n        data[['FloorSpace', 'Rooms', 'LotSize']].values,<br \/>\n        data['Price'].values,<br \/>\n        cv=10,<br \/>\n        scoring='neg_mean_squared_error')<br \/>\n    print('Model %s had average error: %f' %<br \/>\n        (name, math.sqrt(-np.mean(validation_scores))))<br \/>\n<\/code><\/p>\n<p>The chart below compares the errors per algorithm:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"630\" height=\"467\" class=\"aligncenter wp-image-14985\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAA2SAAAAJGE5ODNiNzYwLWVlZGQtNDZhYi1hYzJiLWMxNjdmZmZlNTBkNA.png\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAA2SAAAAJGE5ODNiNzYwLWVlZGQtNDZhYi1hYzJiLWMxNjdmZmZlNTBkNA.png 630w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAA2SAAAAJGE5ODNiNzYwLWVlZGQtNDZhYi1hYzJiLWMxNjdmZmZlNTBkNA-300x222.png 300w\" sizes=\"auto, (max-width: 630px) 100vw, 630px\" \/><\/p>\n<p>In this case, the ElasticNet model (linear regression with regularization) had the best score, closely followed by plain linear regression.<\/p>\n<p>I hope you enjoyed this exercise and can apply some of the ideas to your own problems. 
If you have suggestions, feedback or questions, let me know!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently I have followed an online course on machine learning to understand the current hype better. As with any subject though, only practice makes perfect, so i was looking to apply this new knowledge. While looking to sell my house I found that would be a nice opportunity: Check if the prices a real estate [&hellip;]<\/p>\n","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[112,3,97,10],"tags":[417,434,435,436],"class_list":["post-14983","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence-machine-learning","category-business","category-general","category-development","tag-elastic","tag-elasticnet","tag-predicting","tag-sklearn"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Machine Learning: Predicting house prices - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Machine Learning: Predicting house prices - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"Recently I have followed an online course on machine learning to understand the current hype better. As with any subject though, only practice makes perfect, so i was looking to apply this new knowledge. 
While looking to sell my house I found that would be a nice opportunity: Check if the prices a real estate [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2017-02-16T10:06:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png\" \/>\n<meta name=\"author\" content=\"Eike Dehling\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eike Dehling\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/\",\"url\":\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/\",\"name\":\"Machine Learning: Predicting house prices - Trifork 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png\",\"datePublished\":\"2017-02-16T10:06:28+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/9ae4ae9422897fa224623ac55fbf4dd1\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#primaryimage\",\"url\":\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png\",\"contentUrl\":\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Machine Learning: Predicting house prices\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working 
on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/9ae4ae9422897fa224623ac55fbf4dd1\",\"name\":\"Eike Dehling\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/ca41eff3f24ff02ae45e1d793c7f5c6c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/ca41eff3f24ff02ae45e1d793c7f5c6c?s=96&d=mm&r=g\",\"caption\":\"Eike Dehling\"},\"sameAs\":[\"http:\/\/trifork.com\"],\"url\":\"https:\/\/trifork.nl\/blog\/author\/eiked\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Machine Learning: Predicting house prices - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/","og_locale":"en_US","og_type":"article","og_title":"Machine Learning: Predicting house prices - Trifork Blog","og_description":"Recently I have followed an online course on machine learning to understand the current hype better. As with any subject though, only practice makes perfect, so i was looking to apply this new knowledge. 
While looking to sell my house I found that would be a nice opportunity: Check if the prices a real estate [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/","og_site_name":"Trifork Blog","article_published_time":"2017-02-16T10:06:28+00:00","og_image":[{"url":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png","type":"","width":"","height":""}],"author":"Eike Dehling","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Eike Dehling","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/","url":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/","name":"Machine Learning: Predicting house prices - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#primaryimage"},"image":{"@id":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#primaryimage"},"thumbnailUrl":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png","datePublished":"2017-02-16T10:06:28+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/9ae4ae9422897fa224623ac55fbf4dd1"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#primaryimage","url":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMz
YmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png","contentUrl":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2017\/02\/AAEAAQAAAAAAAAngAAAAJGMzYmZkYjEzLWY4MmUtNGIwMS05NTU3LTdhNmZiNDMzMmEyMA.png"},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/machine-learning-predicting-house-prices\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Machine Learning: Predicting house prices"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/9ae4ae9422897fa224623ac55fbf4dd1","name":"Eike Dehling","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/ca41eff3f24ff02ae45e1d793c7f5c6c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ca41eff3f24ff02ae45e1d793c7f5c6c?s=96&d=mm&r=g","caption":"Eike 
Dehling"},"sameAs":["http:\/\/trifork.com"],"url":"https:\/\/trifork.nl\/blog\/author\/eiked\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/14983","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=14983"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/14983\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=14983"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=14983"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=14983"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}