{"id":15185,"date":"2018-01-15T17:36:14","date_gmt":"2018-01-15T16:36:14","guid":{"rendered":"https:\/\/blog.trifork.com\/?p=15185"},"modified":"2018-01-15T17:36:14","modified_gmt":"2018-01-15T16:36:14","slug":"deep-learning-for-natural-language-processing-part-ii","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/","title":{"rendered":"Deep Learning for Natural Language Processing \u2013 Part II"},"content":{"rendered":"<p><em>Author \u2013 Wilder Rodrigues<\/em><\/p>\n<p>Wilder continues his series about NLP.&nbsp; This time he would like to bring you to the Deep Learning realm, exploring Deep Neural Networks for sentiment analysis.<\/p>\n<p>If you are already familiar with those types of network and know why certain choices are made, you can skip the first section and go straight to the next one.<\/p>\n<p><em>I promise the decisions I made in terms of <\/em>train \/ validation \/ test<em> split won\u2019t disappoint you. As a matter of fact, training the same models with different sets got me a better result than those achieved by Dr. Jon Krohn, from untapt, in his Live Lessons.<\/em><\/p>\n<p><em>From what I have seen in the last 2 years, I think we all have already been through a lot of explanations about shallow, intermediate and deep neural networks. So, to save us some time, I will avoid revisiting them here. We will dive straight into all the arsenal we will be using throughout this story. However, we won\u2019t just follow a list of things, but instead, we will understand why those things are being used.<\/em><br \/>\n<!--more--><strong><br \/>\nWhat do we have at our disposal?<\/strong><\/p>\n<p>There is a famous image that does not appear only on several blog posts about Machine Learning, but also on <a href=\"https:\/\/medium.com\/@meetup\">Meetup<\/a> meetings, conferences and company-closed presentations. 
Although I wouldn\u2019t add it here just for the sake of it, I think it\u2019s interesting to get the audience to see some of the network architectures we currently have.<\/p>\n<p>The people behind the @asimovinstitute came up with a quite interesting compilation. See for yourself:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15186\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-200x300.png\" alt=\"\" width=\"326\" height=\"489\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-200x300.png 200w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-683x1024.png 683w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-768x1152.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-1024x1536.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-1365x2048.png 1365w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-853x1280.png 853w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1.png 1600w\" sizes=\"auto, (max-width: 326px) 100vw, 326px\" \/><\/p>\n<p>The Neural Network Zoo, by the Asimov Institute \u2014 <a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/www.asimovinstitute.org\/\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/www.asimovinstitute.org\/\">https:\/\/www.asimovinstitute.org\/<\/a><\/p>\n<p>&nbsp;<\/p>\n<p id=\"53bc\" class=\"graf graf--p graf-after--figure\">There are many more curiosities and things to learn about the Neural Network Zoo. 
If that\u2019s something that would make you more interested in Neural Networks and their intrinsic details, you can find it all in their own blog post here:&nbsp;<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.asimovinstitute.org\/neural-network-zoo\/\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/www.asimovinstitute.org\/neural-network-zoo\/\">https:\/\/www.asimovinstitute.org\/neural-network-zoo\/<\/a>.<\/p>\n<p id=\"1e76\" class=\"graf graf--p graf-after--p\">For this story, we are going to focus on a Feedforward Neural Network. If you paid attention to the image above, it won&#8217;t be difficult to spot it. Otherwise, please refer to the image below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15187 aligncenter\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/pic2.png\" alt=\"\" width=\"206\" height=\"152\"><\/p>\n<p id=\"9156\" class=\"graf graf--p graf-after--figure\">That is a pretty simple network! But no worries, we will be stretching it a bit, making it deeper. However, before we do that, let\u2019s try to get a couple of things clear:<\/p>\n<ul class=\"postList\">\n<li id=\"ae6e\" class=\"graf graf--li graf-after--p\">Do you know about the Primary Visual Cortex? Since I cannot hear your answer, I will just assume you don\u2019t. So, here goes a fun fact: the Primary Visual Cortex is also known as V1. It\u2019s called V1 because it works like a first layer; it recognises the shapes of what we see. As the complexity of the information flowing through increases, the layers dealing with it get more specialised and have higher numbers associated with them. 
The picture below gives an idea of what I mean:<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15189\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/image3-300x120.png\" alt=\"\" width=\"380\" height=\"152\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image3-300x120.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image3-768x308.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image3.png 838w\" sizes=\"auto, (max-width: 380px) 100vw, 380px\" \/><\/p>\n<ul class=\"postList\">\n<li id=\"e09f\" class=\"graf graf--li graf-after--figure graf--trailing\">You might have seen network architectures drawn from bottom to top, or top to bottom, or sideways, even when those drawings were modelling the same type of network. This behaviour doesn\u2019t seem to be very consistent, and in addition to that, most of the papers or articles I have read depict neural networks going either from bottom to top or from left to right. If I have to draw a network architecture here, I will stick with the left-to-right style.<\/li>\n<\/ul>\n<p id=\"c394\" class=\"graf graf--p graf--leading\">Now moving on to some more technical aspects of our Feedforward Neural Network, it\u2019s time to actually define it. To work with word embedding vectors, as we did in the first part of this series, we have to define a very specific layer in our architecture. 
Along with this mysterious layer, we will also use a couple of other layers that are needed to achieve our goal.<\/p>\n<p id=\"f4cc\" class=\"graf graf--p graf-after--p\">In the list below, I pinpoint the types of layers we will be using for now:<\/p>\n<ul class=\"postList\">\n<li id=\"f8d2\" class=\"graf graf--li graf-after--p\">Embedding Layer: this layer is used to create a vector representation of the words in our document.<\/li>\n<li id=\"7c0d\" class=\"graf graf--li graf-after--li\">Flatten Layer: after we have our embedding vectors, we flatten them into a single 1D vector so they can be fed to the layers that follow.<\/li>\n<li id=\"ecbc\" class=\"graf graf--li graf-after--li\">Dense, or Fully Connected, Layer: a layer whose neurons are connected to all the neurons in the previous layer.<\/li>\n<\/ul>\n<p id=\"2398\" class=\"graf graf--p graf-after--li\">Okay, we got the layers, but now what? There are other things to consider, like regularisation; cost function; activation function; and train\/validation\/test split. How are we going to deal with those details? Before we get to the code, let\u2019s follow a few baby-steps through the sections below.<\/p>\n<h4 id=\"d5ba\" class=\"graf graf--h4 graf-after--p\">Regularisation<\/h4>\n<p id=\"17e6\" class=\"graf graf--p graf-after--h4\">The goal of regularisation is to reduce overfitting. Although I am really excited about the idea of talking about Weight Decay and Dropout and drawing formulas here, I won\u2019t get into those details because, if you want to learn about Deep Learning for NLP, I expect you to already know about regularisation.<\/p>\n<p id=\"921d\" class=\"graf graf--p graf-after--p\">For this problem, we will be using Dropout. 
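The layers listed above, plus the Dropout we just mentioned, are easier to grasp with numbers. Below is a minimal NumPy sketch of what each of them does; the sizes are illustrative, and in the actual exercise we will use Keras layers rather than hand-rolled maths:

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, embed_dim, seq_len = 5000, 64, 100

# Embedding layer: just a lookup table mapping word indices to dense vectors.
embedding = rng.normal(size=(vocab_size, embed_dim))
review = rng.integers(0, vocab_size, size=seq_len)  # one tokenised review
embedded = embedding[review]                        # shape: (100, 64)

# Flatten layer: reshape the 2D embedding output into one long 1D vector.
flat = embedded.reshape(-1)                         # shape: (6400,)

# Dense layer: every output neuron connects to every input value.
W = rng.normal(size=(flat.shape[0], 64))
hidden = np.maximum(0, flat @ W)                    # shape: (64,)

# (Inverted) dropout: randomly zero activations, rescale the survivors.
keep_prob = 0.5
mask = rng.random(hidden.shape) < keep_prob
dropped = hidden * mask / keep_prob

print(embedded.shape, flat.shape, dropped.shape)
```

Nothing above learns anything yet; it only shows how the shapes flow from one layer to the next.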
If you don\u2019t know that much about it, please refer to the paper co-authored by Professor Geoffrey Hinton and published in 2014:&nbsp;<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.cs.toronto.edu\/~hinton\/absps\/JMLRdropout.pdf\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/www.cs.toronto.edu\/~hinton\/absps\/JMLRdropout.pdf\">https:\/\/www.cs.toronto.edu\/~hinton\/absps\/JMLRdropout.pdf<\/a>.<\/p>\n<h4 id=\"5bd3\" class=\"graf graf--h4 graf-after--p\">Cost Function<\/h4>\n<p id=\"2398\" class=\"graf graf--p graf-after--li\">Okay, we got some details about the layers and the regularisation technique we will be using; now let\u2019s try to get some intuition about the Cost Function we chose, why we chose it and what else we had within reach:<\/p>\n<ul class=\"postList\">\n<li id=\"1685\" class=\"graf graf--li graf-after--p\"><strong class=\"markup--strong markup--li-strong\">Mean Squared Error:<\/strong>&nbsp;it is defined as&nbsp;<em class=\"markup--em markup--li-em\">1\/m * sum((Yhat &#8211; Y) ** 2)<\/em>, where&nbsp;<em class=\"markup--em markup--li-em\">Yhat<\/em>&nbsp;is given by the hypothesis function (e.g. \ud835\udf030 + \ud835\udf031*x, etc.). The MSE is mostly used with Linear Regression because its hypothesis is linear and hence generates a bowl-shaped, or convex, cost function. Being convex, it helps Gradient Descent reach the global optimum. But wait, we will be working on a Classification problem and our hypothesis function will be non-linear (more on that later). It means MSE is not my choice for this problem.<\/li>\n<li id=\"de9f\" class=\"graf graf--li graf-after--li\"><strong class=\"markup--strong markup--li-strong\">Cross Entropy:&nbsp;<\/strong>using Cross Entropy with a Classification problem is the way to go. It differs from the MSE in its nature and in the shape of its cost surface. 
Its equation is given by&nbsp;<em class=\"markup--em markup--li-em\">-1\/m * sum(Y * log(Yhat) + (1 &#8211; Y) * log(1 &#8211; Yhat))<\/em>. Just like with MSE, here we want the loss to be as small as possible. Let\u2019s assume that&nbsp;<em class=\"markup--em markup--li-em\">Y<\/em>&nbsp;equals 1. In that case, the second term of the equation,&nbsp;<em class=\"markup--em markup--li-em\">(1 &#8211; Y) * log(1 &#8211; Yhat)<\/em>, equals 0 and is therefore canceled. In the remaining term,&nbsp;<em class=\"markup--em markup--li-em\">Y * log(Yhat)<\/em>, we want&nbsp;<em class=\"markup--em markup--li-em\">Yhat<\/em>&nbsp;to be as big as possible. Since it is calculated by the Sigmoid function, it cannot be bigger than 1. Now, if we look at the other side of the spectrum and have&nbsp;<em class=\"markup--em markup--li-em\">Y<\/em>&nbsp;equal to 0, then the first term is canceled and in that case we want&nbsp;<em class=\"markup--em markup--li-em\">Yhat<\/em>&nbsp;to be small. I think you get the rest.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15190\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/image4-300x134.png\" alt=\"\" width=\"446\" height=\"199\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image4-300x134.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image4-768x342.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image4.png 802w\" sizes=\"auto, (max-width: 446px) 100vw, 446px\" \/><\/p>\n<p>Non-Convex vs. Convex Functions. Pretty hard to find a global optimum in the&nbsp;former.<\/p>\n<h4 id=\"6818\" class=\"graf graf--h4 graf-after--figure\">Gradient Descent Optimisation<\/h4>\n<p id=\"e975\" class=\"graf graf--p graf-after--h4\">We have already been through a lot without writing a single line of code. 
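As a warm-up, the cross entropy equation above can be sanity-checked with a quick NumPy sketch; the numbers below are made up, just to build intuition:

```python
import numpy as np

def binary_cross_entropy(y, yhat):
    # -1/m * sum(Y * log(Yhat) + (1 - Y) * log(1 - Yhat))
    return -np.mean(y * np.log(yhat) + (1 - y) * np.log(1 - yhat))

y = np.array([1.0, 0.0, 1.0])

good = binary_cross_entropy(y, np.array([0.9, 0.1, 0.8]))  # confident and right
bad = binary_cross_entropy(y, np.array([0.1, 0.9, 0.2]))   # confident and wrong

print(good, bad)  # ~0.14 vs ~2.07: confident wrong predictions are punished hard
```

Notice how the loss explodes as Yhat moves away from Y, which is exactly the behaviour we want when training a classifier.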
Please bear with me here.<\/p>\n<p id=\"700b\" class=\"graf graf--p graf-after--p\">The goal of Gradient Descent is to find a global optimum, which is why a non-convex function is bad for such a task. The way it works is that, at every training iteration, it takes a small step towards the global optimum, guided by the loss being minimised. Each step updates the weights by subtracting the learning rate times the derivative of the cost with respect to the weights, which gives the change in the weights per iteration. To put it in a more mathematical way, this is how we update the weights during the gradient descent steps: w = w &#8211; \u237a * dJ(w,b)\/dw. That\u2019s the simplest version of it, but there are others that make learning incredibly faster compared to this one. One last thing: you might have noticed the&nbsp;<em class=\"markup--em markup--p-em\">\u237a<\/em>, or learning rate. That\u2019s a hyper-parameter that we use to tune the learning. When it\u2019s too big, gradient descent might struggle to converge, just bouncing from one side of the bowl to the other; when it\u2019s too small, learning takes much longer.<\/p>\n<p id=\"fc98\" class=\"graf graf--p graf-after--p\">For our exercise, we will be using&nbsp;<em class=\"markup--em markup--p-em\">Adam<\/em>. This optimiser was first published in 2015, by Diederik P. Kingma and Jimmy Lei Ba, from the University of Amsterdam and the University of Toronto, respectively. Adam combines momentum and RMSProp, which are both optimisation algorithms. Due to this combination, Adam has been recognised as one of the best options when it comes to Deep Learning problems. 
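The plain update rule and the effect of the learning rate can be watched in action on a toy convex bowl, J(w) = w**2, whose derivative is 2w (a small sketch of my own, before we hand the job over to Adam):

```python
# Gradient descent on the toy convex cost J(w) = w**2, where dJ/dw = 2*w.
w, alpha = 10.0, 0.1
for _ in range(100):
    w = w - alpha * (2 * w)      # w = w - alpha * dJ/dw
print(w)                         # practically 0: the global optimum

# With a learning rate that is too big, each step overshoots the bottom
# of the bowl and the weight bounces further away instead of settling.
w_big = 10.0
for _ in range(20):
    w_big = w_big - 1.1 * (2 * w_big)
print(abs(w_big))                # keeps growing instead of converging
```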
In the \u201cCS231n: Convolutional Neural Networks for Visual Recognition\u201d course developed by&nbsp;<a class=\"markup--user markup--p-user\" href=\"https:\/\/medium.com\/@karpathy\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/medium.com\/@karpathy\" data-anchor-type=\"2\" data-user-id=\"ac9d9a35533e\" data-action-value=\"ac9d9a35533e\" data-action=\"show-user-card\" data-action-type=\"hover\">Andrej Karpathy<\/a>, et al., at Stanford University, it is also suggested as the default optimisation method.<\/p>\n<p id=\"5e08\" class=\"graf graf--p graf-after--p\">The implementation of Adam, which uses exponentially weighted averages to achieve momentum and the root mean square (RMS, from RMSProp), is quite interesting and lengthy. I will share an image of a handwritten version of it, just to give you an idea of how the weights are updated using Adam.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15191\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-15.54.53-205x300.png\" alt=\"\" width=\"300\" height=\"439\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-15.54.53-205x300.png 205w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-15.54.53.png 636w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>Adam: Adaptive Moment Estimation.<\/p>\n<h4 id=\"1930\" class=\"graf graf--h4 graf--leading\">Activation Function<\/h4>\n<p id=\"f0f0\" class=\"graf graf--p graf-after--h4\">I almost forgot about that! We are going to use 2 different activation functions to work on this sentiment analysis problem. In the hidden layer, we will use the Rectified Linear Unit (ReLU) and as the output activation, we will use the Sigmoid function.<\/p>\n<p id=\"5a18\" class=\"graf graf--p graf-after--p\">Perhaps it is enough to show the graphs of both functions here. 
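Both functions are also one-liners in NumPy, which may already give a good feeling for their shapes (a minimal sketch):

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # keeps positive values, zeroes out negative ones
    return np.maximum(0.0, z)

print(sigmoid(0.0))                      # 0.5, right at the decision boundary
print(relu(np.array([-3.0, 0.0, 2.0])))  # [0. 0. 2.]
```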
If you need some more details, I will add some references at the bottom of the post so you can go straight to the source.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15192\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/image6-300x119.png\" alt=\"\" width=\"439\" height=\"174\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image6-300x119.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image6-1024x407.png 1024w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image6-768x305.png 768w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/image6.png 1434w\" sizes=\"auto, (max-width: 439px) 100vw, 439px\" \/><\/p>\n<p>Sigmoid and ReLU activation functions.<\/p>\n<p id=\"c366\" class=\"graf graf--p graf-after--figure\">Just a few details on the decisions made: we are using sigmoid for our hypothesis because we have a binary classification problem. The output should be 1, if the review is positive, or 0 otherwise. If we were dealing with a multi-class problem, for example estimating the likelihood of a given image being one of the N classes in the MNIST or CIFAR datasets, we would use a Softmax, as it outputs a probability distribution over the output vector (our N classes).<\/p>\n<h4 id=\"c1b4\" class=\"graf graf--h4 graf-after--p\">Getting our Hands&nbsp;Dirty<\/h4>\n<p id=\"2d60\" class=\"graf graf--p graf-after--h4\">That was a lot of information! But after deciding on our network architecture, regularisation, cost function and optimisation, now it\u2019s time to actually get something done and see for ourselves what this can do. 
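To make the sigmoid-versus-softmax remark from the previous section concrete, here is a tiny sketch (softmax below is my own illustrative helper): softmax turns N scores into N probabilities that sum to 1, whereas sigmoid squashes a single score into one probability.

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability, then normalise to sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # made-up logits for a 3-class problem
probs = softmax(scores)
print(probs, probs.sum())           # a probability distribution summing to 1
```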
I will basically add some blocks of code, along with comments to explain why certain things are done the way they are, or at least the way I chose to do them.<\/p>\n<h4 id=\"ef2f\" class=\"graf graf--h4 graf-after--p\">Import Dependencies<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15193\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-15.58.25-300x137.png\" alt=\"\" width=\"438\" height=\"200\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-15.58.25-300x137.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-15.58.25.png 671w\" sizes=\"auto, (max-width: 438px) 100vw, 438px\" \/><\/p>\n<h4 id=\"6cab\" class=\"graf graf--h4 graf-after--pre\">Set Hyper-Parameters<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15194\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-15.59.28-300x141.png\" alt=\"\" width=\"436\" height=\"205\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-15.59.28-300x141.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-15.59.28.png 677w\" sizes=\"auto, (max-width: 436px) 100vw, 436px\" \/><\/p>\n<h4 id=\"1eaa\" class=\"graf graf--h4 graf-after--pre\">Load Data<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15195\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.00.22-300x33.png\" alt=\"\" width=\"436\" height=\"48\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.00.22-300x33.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.00.22.png 684w\" sizes=\"auto, 
(max-width: 436px) 100vw, 436px\" \/><\/p>\n<p id=\"0ef4\" class=\"graf graf--p graf-after--pre\">Okay, we have now defined some hyper-parameters and also loaded the data we will be training with. However, it would be interesting to understand why those things were done before we move on. Let\u2019s take some time to explore those decisions now:<\/p>\n<ol class=\"postList\">\n<li id=\"0796\" class=\"graf graf--li graf-after--p\">Our Embedding space will have 64 dimensions;<\/li>\n<li id=\"1a31\" class=\"graf graf--li graf-after--li\">We keep a vocabulary of only the 5000 most frequent unique words;<\/li>\n<li id=\"2944\" class=\"graf graf--li graf-after--li\">We skip a number of the top most frequent words, since those tend to be uninformative stop words;<\/li>\n<li id=\"6399\" class=\"graf graf--li graf-after--li\">We only look at reviews that are at most 100 words long.<\/li>\n<\/ol>\n<p id=\"a587\" class=\"graf graf--p graf-after--li\">After running the commands above, try to loop through some samples to see the length of the reviews before applying the preprocessing step:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15196\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.01.09-300x32.png\" alt=\"\" width=\"441\" height=\"47\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.01.09-300x32.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.01.09.png 682w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/p>\n<p>It should print something like the block below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15197\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.01.50-300x70.png\" alt=\"\" width=\"441\" height=\"103\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.01.50-300x70.png 300w, 
https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.01.50.png 683w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/p>\n<p id=\"ac25\" class=\"graf graf--p graf-after--pre\">You will see the difference when we pre-process the data.<\/p>\n<h4 id=\"cddd\" class=\"graf graf--h4 graf-after--p\">Restore Words from&nbsp;Index<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15198\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.02.33-300x110.png\" alt=\"\" width=\"436\" height=\"160\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.02.33-300x110.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.02.33.png 682w\" sizes=\"auto, (max-width: 436px) 100vw, 436px\" \/><\/p>\n<p>But why should we care about creating those indexes? Words with the indexes 0, 1 and 2 will be represented as PAD, START and UNK (i.e. unknown). Remember that we are not loading the full vocabulary, but only the 5000 most frequent unique words, and that we are limiting each review to 100 words. It will all make more sense once we pre-process the data. 
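For readers who cannot copy from the screenshots, the restoration boils down to something like the sketch below. I use a tiny made-up word_index here so it runs standalone; in the notebook it would come from keras.datasets.imdb.get_word_index(), and the shift by 3 follows Keras's convention of reserving the first three indices:

```python
# Toy stand-in for keras.datasets.imdb.get_word_index(): word -> frequency rank.
word_index = {"the": 1, "and": 2, "movie": 17, "brilliant": 530}

# Reviews are loaded with all indices shifted by 3, reserving 0, 1 and 2:
index_word = {rank + 3: word for word, rank in word_index.items()}
index_word[0], index_word[1], index_word[2] = "PAD", "START", "UNK"

review = [1, 20, 533, 2]  # a tiny encoded "review"
decoded = " ".join(index_word.get(i, "UNK") for i in review)
print(decoded)  # START movie brilliant UNK
```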
For now, let\u2019s have a look at what we have as content for the first review:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15199\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.03.14-300x34.png\" alt=\"\" width=\"441\" height=\"50\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.03.14-300x34.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.03.14.png 687w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/p>\n<p>The outcome should be something like the block below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15200\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.03.57-300x177.png\" alt=\"\" width=\"432\" height=\"255\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.03.57-300x177.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.03.57.png 680w\" sizes=\"auto, (max-width: 432px) 100vw, 432px\" \/><\/p>\n<p>And if you want to compare it with the full review, try the following:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15201\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.04.41-300x42.png\" alt=\"\" width=\"436\" height=\"61\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.04.41-300x42.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.04.41.png 685w\" sizes=\"auto, (max-width: 436px) 100vw, 436px\" \/><\/p>\n<p>And the output should be:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15202\" 
src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.05.30-300x188.png\" alt=\"\" width=\"436\" height=\"273\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.05.30-300x188.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.05.30.png 678w\" sizes=\"auto, (max-width: 436px) 100vw, 436px\" \/><\/p>\n<h4 id=\"65d0\" class=\"graf graf--h4 graf-after--pre\">Preprocess Data<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15203\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.06.05-300x61.png\" alt=\"\" width=\"438\" height=\"89\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.06.05-300x61.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.06.05.png 685w\" sizes=\"auto, (max-width: 438px) 100vw, 438px\" \/><\/p>\n<p id=\"3f4e\" class=\"graf graf--p graf-after--pre\">Now that we have pre-processed the data, let\u2019s have a look at what happened to the reviews. 
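The padding and truncating behaviour applied in the screenshot (Keras's pad_sequences defaults to 'pre' for both) can be mimicked in a few lines of plain Python; pad_pre below is my own illustrative helper:

```python
def pad_pre(seq, maxlen, value=0):
    # Like pad_sequences with padding='pre' and truncating='pre':
    # left-pad short sequences with `value`, keep the LAST maxlen items of long ones.
    return ([value] * max(0, maxlen - len(seq)) + list(seq))[-maxlen:]

short_review = [5, 25, 100]        # a short review of only 3 tokens
long_review = list(range(1, 151))  # a long review of 150 tokens

padded = pad_pre(short_review, 100)
truncated = pad_pre(long_review, 100)
print(len(padded), padded[-3:])  # 100 [5, 25, 100]: zeros in front, review at the end
print(truncated[0])              # 51: the first 50 tokens were cut off
```

This is why every review ends up exactly 100 words long after this step.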
Just as a remark, during pre-processing we specified the max review length as well as the padding and truncating options.<\/p>\n<p id=\"f5c8\" class=\"graf graf--p graf-after--p\">Let\u2019s start with the length of our reviews:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15204\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.06.44-300x33.png\" alt=\"\" width=\"445\" height=\"49\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.06.44-300x33.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.06.44.png 685w\" sizes=\"auto, (max-width: 445px) 100vw, 445px\" \/><\/p>\n<p id=\"a059\" class=\"graf graf--p graf-after--pre\">Now they are all 100 words long.<\/p>\n<p id=\"9597\" class=\"graf graf--p graf-after--p\">So, the pre-processing step already did something for us. Let\u2019s have a look at what else changed by indexing the words of the first review:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15205\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.07.21-300x32.png\" alt=\"\" width=\"441\" height=\"47\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.07.21-300x32.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.07.21.png 687w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/p>\n<p>And the output should be something like the block below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15206\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.08.02-300x88.png\" alt=\"\" width=\"436\" height=\"128\" 
srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.08.02-300x88.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.08.02.png 686w\" sizes=\"auto, (max-width: 436px) 100vw, 436px\" \/><\/p>\n<p>And what about the 6th review? Remember it had only 43 words? Let\u2019s have a look:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15207\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.08.39-300x32.png\" alt=\"\" width=\"441\" height=\"47\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.08.39-300x32.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.08.39.png 692w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/p>\n<p id=\"159e\" class=\"graf graf--p graf-after--pre\">And the output should be something like the block below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15208\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.09.19-300x78.png\" alt=\"\" width=\"438\" height=\"114\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.09.19-300x78.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.09.19.png 688w\" sizes=\"auto, (max-width: 438px) 100vw, 438px\" \/><\/p>\n<p id=\"2726\" class=\"graf graf--p graf-after--pre\">It\u2019s pretty clear what happened, right?<\/p>\n<h4 id=\"a816\" class=\"graf graf--h4 graf-after--p\">Design the Network Architecture<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15209\" 
src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.09.57-300x99.png\" alt=\"\" width=\"439\" height=\"145\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.09.57-300x99.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.09.57.png 686w\" sizes=\"auto, (max-width: 439px) 100vw, 439px\" \/><\/p>\n<h4 id=\"3dca\" class=\"graf graf--h4 graf-after--pre\">Create a Model Checkpoint<\/h4>\n<p id=\"3489\" class=\"graf graf--p graf-after--h4\">For every epoch we run, we want to save its weights. By doing so, we can later pick the epoch with the best accuracy on the validation set and then use its weights to perform predictions on top of unseen data.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15210\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.10.34-300x62.png\" alt=\"\" width=\"435\" height=\"90\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.10.34-300x62.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.10.34.png 686w\" sizes=\"auto, (max-width: 435px) 100vw, 435px\" \/><\/p>\n<h4 id=\"9d82\" class=\"graf graf--h3 graf-after--pre\">Compile and Run the&nbsp;Model<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15211\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.11.13-300x61.png\" alt=\"\" width=\"438\" height=\"89\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.11.13-300x61.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.11.13.png 684w\" sizes=\"auto, (max-width: 438px) 100vw, 438px\" 
\/><\/p>\n<p id=\"57e0\" class=\"graf graf--p graf-after--pre\">As you can see in the&nbsp;<em class=\"markup--em markup--p-em\">fit<\/em>&nbsp;call above, I\u2019m not using the validation data provided in the IMDb dataset; instead, I\u2019m randomly splitting the train data, taking 20% of it for validation. I decided to do this so that, once the model is trained, I can test it on the real validation data (as a test set), since the model has never seen it.<\/p>\n<h4 id=\"d2ee\" class=\"graf graf--h4 graf-after--p\">Run Prediction<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15212\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.22.31-300x23.png\" alt=\"\" width=\"443\" height=\"34\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.22.31-300x23.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.22.31.png 688w\" sizes=\"auto, (max-width: 443px) 100vw, 443px\" \/><\/p>\n<h4 id=\"1c20\" class=\"graf graf--h4 graf-after--pre\">Calculate the Area Under the Receiver Operating Characteristic (ROC) Curve&nbsp;(AUC)<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15213\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.42.54-300x31.png\" alt=\"\" width=\"435\" height=\"45\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.42.54-300x31.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.42.54.png 687w\" sizes=\"auto, (max-width: 435px) 100vw, 435px\" \/><\/p>\n<p id=\"6088\" class=\"graf graf--p graf-after--pre\"><strong class=\"markup--strong markup--p-strong\">Just a bit of information about the AUC:<\/strong>&nbsp;it gives you a better 
measurement method, as it takes into account both the True Positive Rate (TPR) and the False Positive Rate (FPR). To calculate those rates, it does the following:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15214\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.43.46-300x32.png\" alt=\"\" width=\"431\" height=\"46\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.43.46-300x32.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.43.46.png 686w\" sizes=\"auto, (max-width: 431px) 100vw, 431px\" \/><\/p>\n<p>But for that to work, we also need the True Positive\/Negative and False Positive\/Negative counts. To get them, the AUC computation derives thresholds from the predictions that were made and calculates the TP\/TN\/FP\/FN values per threshold. The number of TPR\/FPR points is given by the number of thresholds used. And what are the thresholds? All the unique predictions in the output. To understand it better, let\u2019s have a look at the table below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15215\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.44.42-221x300.png\" alt=\"\" width=\"323\" height=\"438\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.44.42-221x300.png 221w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.44.42.png 646w\" sizes=\"auto, (max-width: 323px) 100vw, 323px\" \/><\/p>\n<p id=\"3f4f\" class=\"graf graf--p graf-after--figure\">The TPR and FPR points calculated from the 3 thresholds (i.e. T@.3, T@.5 and T@.7) are then used to plot a graph. 
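The per-threshold procedure described above can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the notebook's code; `y_true` stands for the expected labels and `y_score` for the model's predicted probabilities:

```python
import numpy as np

def roc_points(y_true, y_score):
    """Compute one (FPR, TPR) pair per unique predicted score used as threshold."""
    y_true = np.asarray(y_true)
    points = []
    for t in np.unique(y_score):  # the thresholds are the unique predictions
        y_pred = (y_score >= t).astype(int)  # classify as positive at this threshold
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        # TPR = TP / (TP + FN), FPR = FP / (FP + TN)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points
```

In practice you would let `sklearn.metrics.roc_curve` and `roc_auc_score` do this work for you; the sketch just makes the threshold mechanics explicit.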
The accuracy is then measured by the area under that curve (the AUC).<\/p>\n<p id=\"4bf5\" class=\"graf graf--p graf-after--p\">After all that has been explained, feel free to execute the code below and plot your ROC graph:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15216\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.45.27-300x180.png\" alt=\"\" width=\"434\" height=\"261\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.45.27-300x180.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.45.27.png 679w\" sizes=\"auto, (max-width: 434px) 100vw, 434px\" \/><\/p>\n<p>You should see something like the image below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15217\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.46.19-300x217.png\" alt=\"\" width=\"470\" height=\"340\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.46.19-300x217.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.46.19.png 727w\" sizes=\"auto, (max-width: 470px) 100vw, 470px\" \/><\/p>\n<p>If you want to check that it worked, play around with the output (i.e. y_hat), which contains all the predictions; y_valid holds the expected labels. Compare the two to see how far off the predictions are. 
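Such a comparison can be sketched as follows (a hypothetical example: the toy values stand in for the model's predicted probabilities, y_hat, and the expected labels, y_valid):

```python
import numpy as np

# Toy stand-ins for the model's predictions and the true labels.
y_hat = np.array([0.91, 0.12, 0.48, 0.77])
y_valid = np.array([1, 0, 1, 1])

# The absolute difference shows how far off each prediction is.
errors = np.abs(y_valid - y_hat)
for pred, label, err in zip(y_hat, y_valid, errors):
    print(f"predicted={pred:.2f}  expected={label}  off by {err:.2f}")

# Index of the worst prediction in this toy batch.
worst = int(np.argmax(errors))
```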
The code below can help you out a bit:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15218\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.47.02-300x71.png\" alt=\"\" width=\"439\" height=\"104\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.47.02-300x71.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.47.02.png 688w\" sizes=\"auto, (max-width: 439px) 100vw, 439px\" \/><\/p>\n<p>The output should be something similar to the table below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15219\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.47.41-207x300.png\" alt=\"\" width=\"305\" height=\"443\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.47.41-207x300.png 207w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.47.41.png 444w\" sizes=\"auto, (max-width: 305px) 100vw, 305px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15220\" src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.48.15-300x51.png\" alt=\"\" width=\"435\" height=\"74\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.48.15-300x51.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.48.15.png 688w\" sizes=\"auto, (max-width: 435px) 100vw, 435px\" \/><\/p>\n<p>This review looks pretty positive:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15221\" 
src=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.49.02-300x70.png\" alt=\"\" width=\"433\" height=\"101\" srcset=\"https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.49.02-300x70.png 300w, https:\/\/trifork.nl\/blog\/wp-content\/uploads\/sites\/3\/2018\/01\/Screen-Shot-2018-01-15-at-16.49.02.png 683w\" sizes=\"auto, (max-width: 433px) 100vw, 433px\" \/><\/p>\n<h4 id=\"fbf1\" class=\"graf graf--h4 graf-after--pre\">Acknowledgements<\/h4>\n<p id=\"ca66\" class=\"graf graf--p graf-after--h4\">Thanks again for following this deep into the theory and code. I really appreciate those who take the time to read it.<\/p>\n<p id=\"9ecc\" class=\"graf graf--p graf-after--p graf--trailing\">The source code can be found here:&nbsp;<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/ekholabs\/DLinK\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/github.com\/ekholabs\/DLinK\">https:\/\/github.com\/ekholabs\/DLinK<\/a><\/p>\n<p>Interested in applying Machine Learning at your company? See how experts at Trifork can help you. More info&nbsp;<a href=\"http:\/\/trifork.com\/machine-learning\/\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author \u2013 Wilder Rodrigues Wilder continues his series about NLP.&nbsp; This time he would like to bring you to the Deep Learning realm, exploring Deep Neural Networks for sentiment analysis. 
If you are already familiar with those types of network and know why certain choices are made, you can skip the first section and go [&hellip;]<\/p>\n","protected":false},"author":84,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[112,113],"tags":[455,448,116,451,456,457],"class_list":["post-15185","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence-machine-learning","category-axon","tag-artificial-intelligence","tag-deep-learning","tag-machine-learning","tag-natural-language-processing","tag-neural-networks","tag-nlp"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Deep Learning for Natural Language Processing \u2013 Part II - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Deep Learning for Natural Language Processing \u2013 Part II - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"Author \u2013 Wilder Rodrigues Wilder continues his series about NLP.&nbsp; This time he would like to bring you to the Deep Learning realm, exploring Deep Neural Networks for sentiment analysis. 
If you are already familiar with those types of network and know why certain choices are made, you can skip the first section and go [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-15T16:36:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-200x300.png\" \/>\n<meta name=\"author\" content=\"Monika Kauliute\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Monika Kauliute\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/\",\"url\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/\",\"name\":\"Deep Learning for Natural Language Processing \u2013 Part II - Trifork 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-200x300.png\",\"datePublished\":\"2018-01-15T16:36:14+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/17980baec3b95a025b2bba1e49c57c60\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#primaryimage\",\"url\":\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-200x300.png\",\"contentUrl\":\"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-200x300.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Deep Learning for Natural Language Processing \u2013 Part II\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working 
on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/17980baec3b95a025b2bba1e49c57c60\",\"name\":\"Monika Kauliute\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/ce4a38609336315c7ac02e93999aa25b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/ce4a38609336315c7ac02e93999aa25b?s=96&d=mm&r=g\",\"caption\":\"Monika Kauliute\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/monika\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Deep Learning for Natural Language Processing \u2013 Part II - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/","og_locale":"en_US","og_type":"article","og_title":"Deep Learning for Natural Language Processing \u2013 Part II - Trifork Blog","og_description":"Author \u2013 Wilder Rodrigues Wilder continues his series about NLP.&nbsp; This time he would like to bring you to the Deep Learning realm, exploring Deep Neural Networks for sentiment analysis. 
If you are already familiar with those types of network and know why certain choices are made, you can skip the first section and go [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/","og_site_name":"Trifork Blog","article_published_time":"2018-01-15T16:36:14+00:00","og_image":[{"url":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-200x300.png","type":"","width":"","height":""}],"author":"Monika Kauliute","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Monika Kauliute","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/","url":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/","name":"Deep Learning for Natural Language Processing \u2013 Part II - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#primaryimage"},"image":{"@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#primaryimage"},"thumbnailUrl":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-200x300.png","datePublished":"2018-01-15T16:36:14+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/17980baec3b95a025b2bba1e49c57c60"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#primaryimage","url":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/site
s\/3\/2018\/01\/pic1-200x300.png","contentUrl":"https:\/\/trifork.nl\/articles\/wp-content\/uploads\/sites\/3\/2018\/01\/pic1-200x300.png"},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/deep-learning-for-natural-language-processing-part-ii\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Deep Learning for Natural Language Processing \u2013 Part II"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/17980baec3b95a025b2bba1e49c57c60","name":"Monika Kauliute","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/ce4a38609336315c7ac02e93999aa25b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ce4a38609336315c7ac02e93999aa25b?s=96&d=mm&r=g","caption":"Monika 
Kauliute"},"url":"https:\/\/trifork.nl\/blog\/author\/monika\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/15185","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/84"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=15185"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/15185\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=15185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=15185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=15185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}