{"id":15154,"date":"2018-01-03T13:11:15","date_gmt":"2018-01-03T12:11:15","guid":{"rendered":"https:\/\/blog.trifork.com\/?p=15154"},"modified":"2018-01-03T13:11:15","modified_gmt":"2018-01-03T12:11:15","slug":"deep-learning-for-natural-language-processing-part-i","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/","title":{"rendered":"Deep Learning for Natural Language Processing \u2013 Part I"},"content":{"rendered":"<p><em>Author &#8211; Wilder Rodrigues<\/em><\/p>\n<p>Nowadays, the task of natural language processing has been made easy with the advancements in neural networks. In the past 30 years, after the last AI Winter, amongst the many papers have been published, some have been in the area of NLP, focusing on a distributed word to vector representations.<\/p>\n<p>The papers in question are listed below (including the famous back-propagation paper that brought life to Neural Networks as we know them):<br \/>\n<!--more--><\/p>\n<ul>\n<li>Learning representations by back-propagating errors.<\/li>\n<li>David E. Rumelhart, Geoffrey E. Hinton &amp; Ronald J. Williams, 1986.<\/li>\n<li>A Neural Probabilistic Language Model<\/li>\n<li>Yoshua Bengio, R\u00e9jean Ducharme, Pascal Vincent, Christian Jauvin, 2003.<\/li>\n<li>A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning<\/li>\n<li>Ronan Collobert, Jason Weston, 2008.<\/li>\n<li>Efficient Estimation of Word Representations in Vector Space.<\/li>\n<li>Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. 2013<\/li>\n<li>GloVe: Global Vectors for Word Representation.<\/li>\n<li>Jeffrey Pennington, Richard Socher, Christopher D. Manning, 2014.<\/li>\n<\/ul>\n<p>The first paper on the list, by Hinton et al, was of extreme importance for the development of Neural Networks, it made all possible. 
The other papers targeted NLP, bringing improvements to the area and creating a gap between Traditional Machine Learning and Deep Learning methods.<\/p>\n<p>If we compare Traditional Machine Learning approaches with what is done now in Deep Learning, we can see that most of the work is spent on modelling the problem rather than on engineering features.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15156 aligncenter\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl-300x145.png\" alt=\"\" width=\"402\" height=\"194\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl-300x145.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl.png 662w\" sizes=\"auto, (max-width: 402px) 100vw, 402px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>In order to understand how the Traditional Machine Learning approach involves more feature engineering than its Deep Learning counterpart, let\u2019s look at the table below:<\/p>\n<table style=\"height: 273px\" width=\"481\">\n<tbody>\n<tr>\n<td colspan=\"3\">\n<p style=\"text-align: center\"><b>Representations of Language<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td><strong>Element<\/strong><\/td>\n<td><strong>TML<\/strong><\/td>\n<td><strong>DL<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Phonology<\/td>\n<td>All phonemes<\/td>\n<td>Vector<\/td>\n<\/tr>\n<tr>\n<td>Morphology<\/td>\n<td>All morphemes<\/td>\n<td>Vector<\/td>\n<\/tr>\n<tr>\n<td>Words<\/td>\n<td>One-hot encoding<\/td>\n<td>Vector<\/td>\n<\/tr>\n<tr>\n<td>Syntax<\/td>\n<td>Phrase rules<\/td>\n<td>Vector<\/td>\n<\/tr>\n<tr>\n<td>Semantics<\/td>\n<td>Lambda calculus<\/td>\n<td>Vector<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>The words in a corpus are distributed in a vector space, where the Euclidean distance between words is used to measure their similarity. This approach helps to capture relationships such as gender or geographic location from the context in which a word appears. 
The image below depicts the Vector Representations of Words:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15157 aligncenter\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/n-dim-space-282x300.png\" alt=\"\" width=\"300\" height=\"319\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/n-dim-space-282x300.png 282w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/n-dim-space.png 690w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p><em>Source: Jon Krohn @untapt \u2013 Safari Live Lessons<\/em><\/p>\n<p>Those associations are made automatically thanks to Unsupervised Learning, a technique that dispenses with the need for labeled data. All a Word to Vector approach needs is a corpus of natural language, and it learns by clustering the words.<\/p>\n<p>When it comes to Traditional Machine Learning methods, instead of a vector representation, we have one-hot encoding. This technique works and has been used for a long time. However, it is vastly inferior to vector representations. 
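To make the contrast concrete, here is a toy sketch (not from the original post; the four-word vocabulary is made up) of why one-hot vectors carry no notion of similarity:

```python
import numpy as np

vocab = ["king", "queen", "man", "woman"]  # toy vocabulary

# One-hot encoding: a sparse vector as long as the vocabulary,
# with a single 1 at the word's index.
one_hot = {word: np.eye(len(vocab))[i] for i, word in enumerate(vocab)}

# Every pair of distinct one-hot vectors is equally far apart,
# so "king" is no closer to "queen" than it is to "woman".
d_king_queen = np.linalg.norm(one_hot["king"] - one_hot["queen"])
d_king_woman = np.linalg.norm(one_hot["king"] - one_hot["woman"])
print(d_king_queen == d_king_woman)  # True
```

Dense word vectors, by contrast, place related words close together, which is exactly what the distances in the figure above exploit.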
The image below depicts how One-hot encoding works:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15158 aligncenter\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot-300x226.png\" alt=\"\" width=\"371\" height=\"280\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot-300x226.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot-768x579.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot.png 900w\" sizes=\"auto, (max-width: 371px) 100vw, 371px\" \/><br \/>\n<em>Source: Jon Krohn @untapt \u2013 Safari Live Lessons<\/em><\/p>\n<p>&nbsp;<\/p>\n<p>The table below summarises how the two methods compare:<br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"wp-image-15159 aligncenter\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot_vs_vector-300x112.png\" alt=\"\" width=\"502\" height=\"187\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot_vs_vector-300x112.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot_vs_vector-1024x384.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot_vs_vector-768x288.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot_vs_vector-1536x575.png 1536w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot_vs_vector-2048x767.png 2048w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/one-hot_vs_vector-1920x719.png 1920w\" sizes=\"auto, (max-width: 502px) 100vw, 502px\" \/><br \/>\n<em>Source: Jon Krohn @untapt \u2013 Safari Live Lessons<\/em><\/p>\n<p>&nbsp;<\/p>\n<p>Another important feature of Vector Representations of Words is word-vector arithmetic. One can simply subtract word vectors from, and add them to, a given word\u2019s vector to get its counterpart. 
For instance, let\u2019s say that we do the following:<\/p>\n<ul>\n<li>King \u2013 man + woman = Queen.<\/li>\n<\/ul>\n<p>We will demonstrate how it\u2019s done with some code further in the article.<\/p>\n<p>Although not in great detail, there are some other important terms and architectural details that we have to touch on. For instance, how do the algorithms get to understand words from a given context in a corpus, or vice versa?<\/p>\n<p>To start with, let\u2019s look at some terms:<\/p>\n<ul>\n<li>Target word:\n<ul>\n<li>The word to be predicted from a source of context words.<\/li>\n<\/ul>\n<\/li>\n<li>Context words:\n<ul>\n<li>The words surrounding a target word.<\/li>\n<\/ul>\n<\/li>\n<li>Padding:\n<ul>\n<li>The number of characters to the left of the first context word and to the right of the last context word.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>The image below depicts how the algorithms work with target and context words:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-15160 aligncenter\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/conext_target_words-300x93.png\" alt=\"\" width=\"432\" height=\"134\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/conext_target_words-300x93.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/conext_target_words-1024x319.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/conext_target_words-768x239.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/conext_target_words-1536x478.png 1536w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/conext_target_words.png 1772w\" sizes=\"auto, (max-width: 432px) 100vw, 432px\" \/><\/p>\n<p><em>John Rupert Firth, 1957<\/em><\/p>\n<p><em>Source: Jon Krohn @untapt \u2013 Safari Live Lessons<\/em><\/p>\n<p>The word2vec implementation of the 2013 paper by Mikolov et al. comes in two 
flavours: Skip-Gram and Continuous Bag-of-Words (CBOW).<\/p>\n<ul>\n<li>Skip-Gram:\n<ul>\n<li>It predicts the context words from the target word.<\/li>\n<li>Its Cost Function maximises the log probability of any possible context word given the target word.<\/li>\n<\/ul>\n<\/li>\n<li>CBOW:\n<ul>\n<li>It predicts the target word from the bag of all context words.<\/li>\n<li>It maximises the log probability of any possible target word given the context words.<\/li>\n<li>The target word is predicted from the average of the context words considered jointly.<\/li>\n<li>Why continuous? Because Word2Vec goes over all the words in the corpus, continuously creating bags of words. The order is irrelevant because it looks at semantics.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>As shown above, the two implementations are exact inverses of each other. Although the choice might seem arbitrary, statistically speaking it has the effect that CBOW smoothes over a lot of the distributional information (by treating an entire context as one observation), whilst its counterpart, Skip-Gram, treats each context-target pair as a new observation, which tends to do better when we have larger datasets.<\/p>\n<p><strong>HOW TO GET IT WORKING?<\/strong><\/p>\n<p>Now that we have seen some theory about NLP and Vector Representations of Words, let\u2019s dive into some implementations using Keras, word2vec (the implementation of the 2013 paper), deep fully connected networks and Convolutional Networks.<\/p>\n<p>As an example, we will use the Project Gutenberg corpus. It might look overdone, since everybody uses it, but having a well-curated dataset is important to start on the right foot. 
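To make the Skip-Gram vs. CBOW contrast concrete before diving in, here is a toy sketch (hypothetical helper functions, not part of the word2vec implementation itself) of the training observations each flavour derives from a sentence with a window of 2:

```python
def skipgram_pairs(tokens, window=2):
    # Skip-Gram: each (target, context word) pair is its own observation.
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    # CBOW: the whole bag of context words is one observation
    # from which the target word is predicted.
    pairs = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, target))
    return pairs

toks = "the quick brown fox".split()
print(skipgram_pairs(toks)[:3])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the')]
print(cbow_pairs(toks)[0])       # (['quick', 'brown'], 'the')
```

Note that Skip-Gram yields more observations than CBOW from the same sentence, which illustrates why it tends to do better on larger datasets while CBOW smoothes the distributional information.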
Our first example will just demonstrate the use of word2vec.<\/p>\n<p><strong>Import Dependencies<\/strong><\/p>\n<pre><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15164\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.48.56-300x119.png\" alt=\"\" width=\"463\" height=\"184\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.48.56-300x119.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.48.56-1024x407.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.48.56-768x305.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.48.56.png 1238w\" sizes=\"auto, (max-width: 463px) 100vw, 463px\" \/><\/pre>\n<p>What to keep in mind for further research?<\/p>\n<ol>\n<li>NLTK<\/li>\n<li>Pandas<\/li>\n<li>ScikitLearn<\/li>\n<li>Gensim<\/li>\n<\/ol>\n<p><strong>Load Model and Data<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15165\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.50.35-300x109.png\" alt=\"\" width=\"463\" height=\"168\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.50.35-300x109.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.50.35-1024x371.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.50.35-768x278.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.50.35.png 1230w\" sizes=\"auto, (max-width: 463px) 100vw, 463px\" \/><\/p>\n<p> If you want to have a look at the books available in the Gutenberg dataset, please execute the 
line below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15166\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.51.20-300x48.png\" alt=\"\" width=\"468\" height=\"75\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.51.20-300x48.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.51.20-1024x163.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.51.20-768x122.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.51.20.png 1260w\" sizes=\"auto, (max-width: 468px) 100vw, 468px\" \/><\/p>\n<p><strong>Load Sentences<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15167\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.53.21-300x130.png\" alt=\"\" width=\"466\" height=\"202\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.53.21-300x130.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.53.21-1024x443.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.53.21-768x332.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.53.21.png 1230w\" sizes=\"auto, (max-width: 466px) 100vw, 466px\" \/><\/p>\n<p>If you want to know how many words are in the set we loaded, please execute the line below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15168\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.13-300x69.png\" alt=\"\" width=\"465\" height=\"107\" 
srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.13-300x69.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.13-1024x236.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.13-768x177.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.13.png 1242w\" sizes=\"auto, (max-width: 465px) 100vw, 465px\" \/><\/p>\n<p><b>Run the Word2Vec Model<b><br \/>\n<\/b><\/b><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15169\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.59-300x225.png\" alt=\"\" width=\"467\" height=\"350\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.59-300x225.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.59-1024x767.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.59-768x575.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.54.59.png 1250w\" sizes=\"auto, (max-width: 467px) 100vw, 467px\" \/><\/p>\n<p><strong>Similarities<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15170\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.55.50-300x59.png\" alt=\"\" width=\"467\" height=\"92\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.55.50-300x59.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.55.50-1024x202.png 1024w, 
https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.55.50-768x152.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.55.50.png 1256w\" sizes=\"auto, (max-width: 467px) 100vw, 467px\" \/><\/p>\n<p><strong>Arithmetic<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15171\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.56.24-300x130.png\" alt=\"\" width=\"466\" height=\"202\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.56.24-300x130.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.56.24-1024x445.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.56.24-768x334.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.56.24.png 1234w\" sizes=\"auto, (max-width: 466px) 100vw, 466px\" \/><\/p>\n<p><strong>REDUCE WORD VECTOR DIMENSIONALITY WITH T-SNE<\/strong><\/p>\n<p>t-Distributed Stochastic Neighbour Embedding (t-SNE) is a technique for dimensionality reduction that is particularly well suited for the visualisation of high-dimensional datasets.<\/p>\n<p>Although our vector space representation has only 64 dimensions, that is still enough to confuse humans if we try to plot the 8667 words from our vocabulary in a graph. Now imagine how it would look with 10 million words and a thousand dimensions! The technique briefly described above can help us with that. 
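As a minimal sketch of this step (assuming scikit-learn's TSNE, and random stand-in vectors in place of the notebook's trained word2vec vocabulary):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for trained word vectors: 100 "words" in a 64-dimensional space.
# In the notebook these come from the word2vec model's vocabulary.
rng = np.random.default_rng(42)
word_vectors = rng.normal(size=(100, 64))

# t-SNE compresses the 64 dimensions down to 2 so the words can be plotted.
coords = TSNE(n_components=2, perplexity=10, init="random",
              random_state=42).fit_transform(word_vectors)
print(coords.shape)  # (100, 2)
```

Each row of `coords` is then an (x, y) position for one word on the scatter plot.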
Let\u2019s get to the code and plotting.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Applying t-SNE<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15172\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.57.42-300x148.png\" alt=\"\" width=\"450\" height=\"222\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.57.42-300x148.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.57.42-1024x504.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.57.42-768x378.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.57.42.png 1252w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/p>\n<p> <img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15161 aligncenter\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_1-300x283.png\" alt=\"\" width=\"386\" height=\"365\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_1-300x283.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_1-1024x966.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_1-768x725.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_1.png 1060w\" sizes=\"auto, (max-width: 386px) 100vw, 386px\" \/><\/p>\n<p>Doesn\u2019t help to see it like this. 
Let\u2019s try something else instead:<\/p>\n<p>&nbsp;<\/p>\n<p><b>BokehJS<\/b><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15173\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.58.48-300x105.png\" alt=\"\" width=\"486\" height=\"170\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.58.48-300x105.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.58.48-1024x357.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.58.48-768x268.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-03-at-11.58.48.png 1256w\" sizes=\"auto, (max-width: 486px) 100vw, 486px\" \/><\/p>\n<p><b><b><br \/>\n <img loading=\"lazy\" decoding=\"async\" class=\"wp-image-15162 aligncenter\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_2-298x300.png\" alt=\"\" width=\"366\" height=\"368\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_2-298x300.png 298w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_2-150x150.png 150w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_2-768x773.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/graph_2.png 1000w\" sizes=\"auto, (max-width: 366px) 100vw, 366px\" \/> <\/b><\/b><\/p>\n<p> You can use the Bokeh controls to zoom in and move around the graph.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Acknowledgements<\/strong><\/p>\n<p>Thanks for taking the time to read this article and do not hesitate to give your feedback.<\/p>\n<p>The source code is available via Github: <a href=\"https:\/\/github.com\/ekholabs\/DLinK\">https:\/\/github.com\/ekholabs\/DLinK<\/a><\/p>\n<p>Interested in applying Machine Learning at 
your company? See how experts at Trifork can help you. More info <a href=\"http:\/\/trifork.com\/machine-learning\/\">here<\/a>.<\/p>\n<p>Author &#8211; Wilder Rodrigues<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author &#8211; Wilder Rodrigues Nowadays, the task of natural language processing has been made easy with the advancements in neural networks. In the past 30 years, after the last AI Winter, amongst the many papers have been published, some have been in the area of NLP, focusing on a distributed word to vector representations. The [&hellip;]<\/p>\n","protected":false},"author":84,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[112,113],"tags":[446,447,448,449,450,451,452,453,454],"class_list":["post-15154","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence-machine-learning","category-axon","tag-bokehjs","tag-convolutional-networks","tag-deep-learning","tag-deep-networks","tag-keras","tag-natural-language-processing","tag-t-sne","tag-vector-representations-of-words","tag-word2vec"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Deep Learning for Natural Language Processing \u2013 Part I - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Deep Learning for Natural Language Processing \u2013 Part I - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"Author &#8211; Wilder Rodrigues Nowadays, the task of natural 
language processing has been made easy with the advancements in neural networks. In the past 30 years, after the last AI Winter, amongst the many papers have been published, some have been in the area of NLP, focusing on a distributed word to vector representations. The [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-03T12:11:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl-300x145.png\" \/>\n<meta name=\"author\" content=\"Monika Kauliute\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Monika Kauliute\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/\",\"url\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/\",\"name\":\"Deep Learning for Natural Language Processing \u2013 Part I - Trifork 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl-300x145.png\",\"datePublished\":\"2018-01-03T12:11:15+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/17980baec3b95a025b2bba1e49c57c60\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#primaryimage\",\"url\":\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl-300x145.png\",\"contentUrl\":\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl-300x145.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Deep Learning for Natural Language Processing \u2013 Part I\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working 
on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/17980baec3b95a025b2bba1e49c57c60\",\"name\":\"Monika Kauliute\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/ce4a38609336315c7ac02e93999aa25b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/ce4a38609336315c7ac02e93999aa25b?s=96&d=mm&r=g\",\"caption\":\"Monika Kauliute\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/monika\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Deep Learning for Natural Language Processing \u2013 Part I - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/","og_locale":"en_US","og_type":"article","og_title":"Deep Learning for Natural Language Processing \u2013 Part I - Trifork Blog","og_description":"Author &#8211; Wilder Rodrigues Nowadays, the task of natural language processing has been made easy with the advancements in neural networks. In the past 30 years, after the last AI Winter, amongst the many papers have been published, some have been in the area of NLP, focusing on a distributed word to vector representations. 
The [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/","og_site_name":"Trifork Blog","article_published_time":"2018-01-03T12:11:15+00:00","og_image":[{"url":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl-300x145.png","type":"","width":"","height":""}],"author":"Monika Kauliute","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Monika Kauliute","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/","url":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/","name":"Deep Learning for Natural Language Processing \u2013 Part I - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#primaryimage"},"image":{"@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#primaryimage"},"thumbnailUrl":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl-300x145.png","datePublished":"2018-01-03T12:11:15+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/17980baec3b95a025b2bba1e49c57c60"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#primaryimage","url":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/tml_vs_dl-300x145.png","contentUrl":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/tml
_vs_dl-300x145.png"},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-i\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Deep Learning for Natural Language Processing \u2013 Part I"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/17980baec3b95a025b2bba1e49c57c60","name":"Monika Kauliute","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/ce4a38609336315c7ac02e93999aa25b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ce4a38609336315c7ac02e93999aa25b?s=96&d=mm&r=g","caption":"Monika 
Kauliute"},"url":"https:\/\/trifork.nl\/blog\/author\/monika\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/15154","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/84"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=15154"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/15154\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=15154"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=15154"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=15154"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}