Machine Learning : Deep Learning - RNN (Part 28)
Let's compare the neural networks we know so far with the brain:
The temporal lobe is compared to the ANN, since it holds long-term memory the way an ANN keeps its learned weights.
The occipital lobe, which handles vision, is compared to the CNN.
The frontal lobe is related to the RNN, since it handles short-term memory.
Till now we have learned this:
Let's squash each layer into a single circle,
then turn that into this,
and then,
to represent the hidden layer, we turn the green circle blue:
now that's a temporal loop.
We have lots of these.
Just to remind you: each of these circles contains a whole layer of neurons that we squashed together, remember?
So it's actually this,
but we will draw it like this,
and this structure gives the network a short-term memory that it can pass on to the neurons at the next time step.
So, that's the whole concept: each neuron remembers what it held at the previous time step, and that step remembers the one before it, and that allows the network to pass information on to itself through time.
Some examples:
One to Many: one input and multiple outputs.
Here an image of a dog was fed to a CNN, and its output was passed to an RNN, which generated the caption "black and white dog jumps over bar."
Many to One: a long sentence is provided and summarized into a single output that says what it means, e.g. positive or negative. That's sentiment analysis.
Many to Many: assume we have a long sentence and we want to translate it into a different language.
In some languages it's important whether the subject is a boy or a girl, because the rest of the sentence changes depending on that.
So we feed in many words and we also generate many outputs.
Subtitling movies is another example of Many to Many:
you need short-term memory to follow the plot. We can't start every subtitle from zero; we have to reference what came earlier.
Vanishing Gradient Problem
Previously, with ANNs, we learned how gradient descent is used to reach the minimum of the cost function: we get an output and adjust the weights based on the cost function, with the goal of minimizing it.
But in an RNN, an input is passed on to the next time step, and the next, and so on. It becomes much more complex.
Information travels through time: information from previous time points keeps flowing through the network. And remember, as always with neural networks, that every single node here is not just a node, it's a representation of a whole layer of nodes.
At each point in time you can calculate your cost function, or error: it compares your output (the red circle) to the desired output, what you should be getting, and this happens during training.
Let's focus on this one specifically, at time t. You've calculated the cost function εt and now you want to propagate it back through the network.
How is this going to look? You need to update the weights: every single neuron that participated in calculating the output associated with this cost function should have its weights updated, so that it can better calculate the output and minimize the error.
This is where the problem lies: you have to propagate the error all the way back through time to those neurons.
So, this is the math behind RNNs
So here we've got W_rec, which stands for the recurrent weight: the weight used to connect the hidden layers to themselves in the unrolled temporal loop.
The thing here is that we're multiplying by the exact same weight (W_rec)
many times, as many times as we need to go back through time.
When you multiply by something small, the value decreases very quickly; the repeated multiplication comes from the product term in the gradient formula.
As you see in the practical tutorials, weights are initialized to random values,
but random values close to zero, and from there the network trains them up
and identifies what they should be. But if you start with a random W_rec close to zero, then because you're multiplying by it many times, the more you multiply the lower the value gets.
So you start off with a certain gradient being backpropagated through your network. Then you move backwards.
Your gradient becomes smaller.
Then your gradient becomes smaller still, and then even smaller.
Why is it bad for the network?
Well, the gradient, as it goes back through the network, is used to update the weights, and we know that already. **The lower the gradient, the harder it is for the network to update the weights.** So the lower the gradient, the slower the weights get updated, and the higher the gradient, the faster they get updated.
So if you train for, say, 1,000 epochs, the weights of the later layers get properly updated over those 1,000 epochs and end up at their final values.
But the earlier weights, because their gradient is so much smaller, are updated more slowly, and by the end of the 1,000 epochs they may still not have reached their final values.
Therefore this part of the network is trained, while this part of the network is not.
And the problem is not just that half your network isn't trained properly: the outputs of these untrained layers are used as inputs for the layers further along.
So the training further along has been happening based on inputs coming from untrained neurons, untrained layers.
You're training here and you think you're getting good results, but because the gradient back there is so small, those layers train so slowly that the outputs they produce are not their final outputs, and therefore you're training on non-final outputs.
In short: if W_rec is small, you have a vanishing gradient problem; if W_rec is large, you have an exploding gradient problem.
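To see the effect numerically, here is a toy sketch (the numbers are made up, not from the lecture) that multiplies a gradient by the same W_rec at every step back through time:
def propagated_gradient(w_rec, steps=50, grad=1.0):
    #backpropagating through `steps` time points multiplies by the same recurrent weight each time
    for _ in range(steps):
        grad *= w_rec
    return grad
print(propagated_gradient(0.9)) #about 0.005 -> the gradient vanishes
print(propagated_gradient(1.1)) #about 117 -> the gradient explodes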
So how do you combat the exploding gradient problem?
There are a couple of solutions. For the exploding gradient you can use truncated backpropagation: you stop backpropagating after a certain point. As you can imagine that's probably not optimal, because then you're not updating all the weights, but if you don't stop at all you end up with a completely irrelevant network, so it's still better than the naive approach.
Then you can have penalties: the gradient gets penalized and artificially reduced.
You can have gradient clipping: you set a maximum limit for the gradient, and if it goes over that value it simply stays at that value as it propagates further back through the network.
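As a rough illustration of gradient clipping in practice, Keras optimizers accept clipvalue/clipnorm arguments; this is my own example, the lecture only describes the idea:
from keras.optimizers import SGD
optimizer = SGD(clipvalue = 1.0) #each gradient component is capped to the range [-1, 1]
# or clip by the norm of the whole gradient vector instead:
# optimizer = SGD(clipnorm = 1.0)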
For the vanishing gradient problem there are other solutions.
You have weight initialization, where you're smart about how you initialize your weights to minimize the potential for vanishing gradients.
There's also a type of network called echo state networks; we're not going to talk about them, but they are designed to address the vanishing gradient problem.
And then there's a different type of network called long short-term memory networks, or LSTMs for short, which are extremely popular and considered the go-to architecture for implementing recurrent neural networks.
Long Short Term Memory (LSTM)
This is a network type with which we can solve the vanishing gradient problem.
In simple terms: if W_rec is less than one, we have a vanishing gradient; if W_rec is greater than one, we have an exploding gradient. What's the first thing that comes to mind to solve this problem?
Well, the first thing that comes to mind is to make W_rec equal to one,
This is what it looks like if you dig inside the recurrent neural network.
Simply put, as outputs come into a module, this operation is applied, then the result goes into the next module where the operation is applied again, and so on. So when you backpropagate, the gradient goes through all of these operations, and that's where the weights, where W_rec, sit.
This is the standard RNN.
and this is what the LSTM version looks like.
LSTMs have what's called a memory cell, which I call the memory pipeline, and it runs through time;
information can flow along it very freely. Sometimes something is removed and erased, sometimes something is added, but that's pretty much it. Otherwise it flows through time freely, and therefore when you backpropagate through an LSTM you don't have the vanishing gradient problem.
Now let's dig in in a bit more detail.
So Ct stands for the memory, the memory cell.
ht is your output: you can see the output going out into the world, and also going into the next module, the next block.
And here you've got your input, xt.
Remember that all of these are layers; we'll reference them as vectors, because that's pretty much the same thing.
And let's go through the legend:
We've got vector transfers: any line here is a vector being transferred, moving around in this architecture.
Concatenation means you're combining two vectors, stacking the two X's on top of each other,
but I think it's even easier to understand if you just think of it as two pipes running in parallel to each other.
Then you've got pointwise operations. Now we get to the interesting stuff: there are a couple of pointwise operations here.
The X's are valves, and they all have names: the forget valve, the memory valve, and the output valve.
A valve looks like this, and we're going to think of it that way as well: you've got water, or something, flowing through, and you can either close the valve or open it.
The forget valve is controlled by this layer operation:
based on the decision made here, the valve is either closed or open. If it's open, memory flows through freely; if it's closed, the memory is cut off, it's not transferred further, and new memory will probably be added here instead, based on the result of this operation.
Then you've got the memory valve, which is again controlled by a sigma. Sigma stands for the sigmoid activation function; that's the activation function used here.
And as the decision is made here,
this value, which comes from another layer operation,
is either added to the memory or not,
depending on the value decided in this valve.
As you remember, we're using sigmoids because their output is between zero and one: zero means nothing goes through, one means everything goes through. And then here you've got the output valve.
Then we've got the T-shaped pipe junction,
which is over here,
where the memory flows through and additional memories can be added in.
You've got the tanh operation here; there's more math behind why you want the values to be between minus one and one.
And you've got the neural network layer operations over here, that's their representation: you've got a layer coming in
and then you've got a layer coming out.
So we're ready to look into this step by step:
We've got a new value coming in and a value coming from the previous module, and together they decide whether this valve should be closed or open, or somewhat closed or open.
Lots of neurons here (xt), lots of neurons here (ht), and all of that goes into one layer operation that decides what value is going to pass through, or not,
and to what extent.
Then we've got the memory flowing through (-->): the forget valve, closed or open; the memory valve, closed or open; and possibly some new memory being added in if we want to update.
And finally these two values are combined to decide what part of the memory pipeline is going to become the output of this module, whether it goes out fully as the output.
So that's how it all works.
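For reference, here is a minimal NumPy sketch of a single LSTM step matching the valves described above; the weight matrices Wf, Wi, WC, Wo and the biases are hypothetical parameters you would have to supply, and the naming follows the usual convention (f = forget valve, i = memory valve, o = output valve, C = memory pipeline):
import numpy as np
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
def lstm_step(x_t, h_prev, C_prev, Wf, Wi, WC, Wo, bf, bi, bC, bo):
    z = np.concatenate([h_prev, x_t]) #concatenation: the two pipes running in parallel
    f_t = sigmoid(Wf @ z + bf) #forget valve: 0 = erase the memory, 1 = let it flow through
    i_t = sigmoid(Wi @ z + bi) #memory valve: how much new information to let in
    C_tilde = np.tanh(WC @ z + bC) #candidate memory, squashed between -1 and 1
    C_t = f_t * C_prev + i_t * C_tilde #memory pipeline: erase and/or add at the T-junction
    o_t = sigmoid(Wo @ z + bo) #output valve: what part of the memory to extract
    h_t = o_t * np.tanh(C_t) #output passed out into the world and on to the next module
    return h_t, C_t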
Let's have a look at a specific example to understand this a bit better. The example we talked about: "I am a boy who likes to learn,"
translated into Czech.
If it were "girl", then not only would that word change in the Czech sentence, it would also affect two other words, which is different from English: in Czech those words are affected by the gender of the subject. Therefore, in our LSTM we might want to store the subject, "boy" in this case, in memory.
So, for instance, let's say "boy" is stored here and then just flows through freely, with nothing changing.
If the new information doesn't tell us that there's a new subject, "boy" just keeps flowing through like that.
If, for instance, we get a new subject, say "girl" or a name like Amanda, and we understand that there's a new subject, we'll close this valve and destroy the memory that we had,
then open this valve and put in the new memory, the new subject.
So that's how this valve and this valve work. And what this valve does is extract certain information from what you have in memory.
So this valve will facilitate the extraction of the gender, and that will go as an input into the next module and help it.
That's how the long short-term memory works!
Read this paper and check this blog and this one
Let's see a real-world example of how it works.
Here the green color means activated: we have 'w' as the first input, and the model predicts that the next character should be 'w', shown in red below it.
Again, the second green 'w' predicts that the next character will be 'w' (given in red below),
and the last green 'w' predicts that the next character will be a dot (in red below).
They were all correct.
But after that we see the '.' predicts that the next character will be 'b', which is wrong. This time the red is also not as intense as before, which means the model is not as confident about this guess.
The same happened for 'y' and 'n' and so on.
So here the color shows which cells are active and which predictions are made with the most confidence.
Let's code this down:
Problem Statement
We're going to tackle a very challenging problem in this part: predicting the stock price of none other than Google. It's pretty challenging, especially if you have some notions of financial engineering: the Brownian motion model implies that the future variations of a stock price are independent of the past, so it's actually impossible to predict the exact future stock price (otherwise we would all become billionaires), but it is possible to predict some trends.
How are we going to approach this problem?
Well, we're going to train our LSTM model on five years of the Google stock price, from the beginning of 2012 to the end of 2016, and then, based on the correlations identified or captured by the LSTM, we will try to predict the first month of 2017, i.e. January 2017. Again, we're not going to try to predict the exact stock price, we're going to try to predict the trend, the upward or downward movement of the Google stock price.
So, the training csv file is this.
Here we have stock prices from 2012 to 2016.
Let's see a graph of just the Open column from 2012 to 2016:
we select the Date and Open columns, then go to Insert and then Recommended Chart.
We can see some upward trends and downward trends.
The test set contains the data for January 2017, the first month of the year.
So if we create another chart with Date and Open, we see this:
there is some upward trend towards the end of January.
Importing the libraries
import numpy as np #for mathematical functions
import matplotlib.pyplot as plt #for graphs etc
import pandas as pd #to work with dataset
Importing the training set
Our goal is to predict what we will get in the Open column in January 2017.
So we will take all rows, but only the Open column.
But we can't use iloc[:,1] to select the Open column, even though index 1 is the right column.
Reason?
We cannot input the single index 1, because we want to create a NumPy array (Keras needs an array, not a simple vector, as input). The trick is to take not the single index 1, but the range from 1 to 2:
dataset_train = pd.read_csv('Google_Stock_Price_Train.csv') #reading the training csv file mentioned above
training_set=dataset_train.iloc[:,1:2]
This also means the Open column with all rows included, but ranges in Python exclude their upper bound. So the 2 here is excluded, and even by taking the range from 1 to 2 we only get index 1; this makes sure we eventually get a NumPy array of one column.
Now we have selected the Open column, but we still need to turn this into a NumPy array.
For that, just add .values
at the end:
training_set=dataset_train.iloc[:,1:2].values
If we print this, we can see that dataset_train has all the values of the csv file, while training_set has only that one column, and it's a 2D array!
But look: if we hadn't used iloc[:,1:2]
to select the Open column and had used iloc[:,1]
instead,
we would have gotten a 1D array.
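A quick way to see the difference in shapes (assuming the full 2012-2016 file with 1258 rows, which matches the index 1257 used later):
print(dataset_train.iloc[:, 1:2].values.shape) #(1258, 1) -> 2D array, which is what Keras needs
print(dataset_train.iloc[:, 1].values.shape) #(1258,) -> just a 1D vector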
Feature Scaling
We know two feature scaling techniques so far: standardisation and normalisation.
This time it's recommended to use normalisation (recommended for RNNs, and whenever sigmoid activations are involved, as in the LSTM).
importing the MinMaxScaler class
from sklearn.preprocessing import MinMaxScaler #importing MinMaxScaler
Then the object creation
sc= MinMaxScaler(feature_range=(0,1)) #with the normalization term, all value will be between 0 and 1
Fit means the scaler computes the minimum and maximum of the data, i.e. the minimum and maximum stock price, which it needs to apply the formula. Then the transform method computes, for each stock price of the training set, the scaled value according to that formula.
training_set_scaled = sc.fit_transform(training_set) #fit_transform will fit (get min(x) and max(x) for the formula) and then transform (scale) the data
Now, you can see that our training_set has been scaled between 0 and 1
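Under the hood, MinMaxScaler with feature_range=(0,1) applies x_scaled = (x - min(x)) / (max(x) - min(x)). A quick sanity check of that claim (just an illustration, not part of the course code):
manual_scaled = (training_set - training_set.min()) / (training_set.max() - training_set.min())
print(np.allclose(manual_scaled, training_set_scaled)) #True: same result as sc.fit_transform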
Now, we are going to take 60 time steps and one output
First, let me explain what it means. 60 timesteps means that at each time T, the RNN looks at the 60 stock prices before time T, that is the stock prices between 60 days before time T and time T, and based on the trends it captures during these 60 previous timesteps, it tries to predict the next output: the stock price at time T plus one. So 60 timesteps of past information from which our RNN learns and understands correlations or trends, and based on that understanding it predicts the next output. That's very important to understand.
These 60 timesteps are purely experimental; you can try a different value.
So basically, for each observation, that is for each financial day, X_train will contain the 60 previous stock prices before that financial day, and y_train will contain the stock price of the next financial day.
Now we will loop and get access to the values.
I need to start the for loop at 60, because then for each i, which is the index of a stock price observation, I take the range from i minus 60 to i, which contains exactly the 60 previous stock prices before the stock price at time T.
And what do we want to append to X_train?
We want to append the 60 previous stock prices before the stock price at index i,
i.e. the values from i-60 to i (exclusive):
X_train.append(training_set_scaled[i-60:i,0])
How does it work?
For example, we start with i=60, so
X_train gets the values from index 60-60=0 up to 59 (the first 60 values),
since Python does not include the value after the colon.
For example, I've just run this snippet to show what happens when i=60,
or we can use this code to compare the training_set_scaled data with the X_train data.
You can see 60 values added here. This is not part of the final code, it's just to show how it works. Let's get back to the code.
y_train gets the next index, index 60, because with X_train we are trying to predict what the value at index 60 will be.
So we keep the actual value at index 60 in y_train.
We will then check whether we were able to predict it correctly from X_train.
Ok?
So from here we can see that y_train.append(training_set_scaled[i,0])
(when i=60)
gives us the value at index 60 of training_set_scaled.
Let's go back to the main code now.
Then we convert X_train and y_train to NumPy arrays:
X_train,y_train=np.array(X_train),np.array(y_train)
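Putting the pieces of this step together, a minimal sketch of the loop (assuming the training set has 1258 rows, consistent with the index ending at 1257 mentioned below):
X_train = [] #for each day, the 60 previous scaled stock prices
y_train = [] #the scaled stock price of the day we want to predict
for i in range(60, 1258):
    X_train.append(training_set_scaled[i-60:i, 0]) #indices i-60 .. i-1
    y_train.append(training_set_scaled[i, 0]) #index i, the value to predict
X_train, y_train = np.array(X_train), np.array(y_train)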
If I run this whole code in Spyder, we can visualize it much better.
Let me show you
This is our X_train
So, we have values from index 0 to 59
Now, why this format? Keep in mind that i takes the values 60, 61, 62, ... and ends at 1257.
So here the data is organized in such a way that the value at index 1 sits below the value at index 0. Isn't this exactly what we talked about in the theory part?
We mentioned there that the green words try to predict what the next word will be, and their prediction sits below them (in the red box).
So it's the same format.
Again, y_train has the values starting from index 60 to the end.
This is the X_train data from index 60 onwards,
and again y_train has the corresponding values.
Now, you may ask: why 60? Why not just 1?
As I mentioned, you can experiment with that number, but with 60 we get better results on this dataset with the RNN model.
Reshaping
Right now we have one indicator, the Open Google stock price, and we take the 60 previous prices to predict the next one.
But thanks to this new data structure, this new dimensionality, you will be able to add more indicators if you want.
The first argument of the reshape is what we want to reshape, which is X_train,
and the second argument specifies the new structure, the new shape we want for our NumPy array X_train.
So first, let's understand what we are doing:
we have X_train as a 2D array.
We will add one more dimension and make it 3D.
How do we do that?
Let's go to the Keras documentation and look for Recurrent Layers,
then the base RNN class, and there you will see the input shapes.
(A tensor is just a multi-dimensional array.)
The input should be a 3D array containing the following three dimensions.
First, the batch size, which will correspond to the total number of observations we have. That is the total number of stock prices we have from 2012 to the end of 2016.
Then the second dimension is the time steps, the number of time steps. And that is, as you understood, 60.
And now the third dimension, this new one that we're adding is the one corresponding to the indicators, the predictors. So I remind, these can be some new financial indicators that could help predict the Google stock price trends.
So, for example, that could be the Close Google stock price,
or even stock prices of other companies.
So the second argument of reshape will be (batch_size, timesteps, features):
Here batch_size is X_train.shape[0], which gives us 1198 (that's the batch size because X_train has 1198 rows),
timesteps is X_train.shape[1], which gives us 60,
and features is 1, since we still only use the Open column.
So,
X_train=np.reshape(X_train, (X_train.shape[0],X_train.shape[1],1))
You might not immediately see what's going on, so
let's look at it in Spyder.
These are the values along axis 0: 60 rows (indices 0 to 59) and 1 column, holding the values from the first row of the old 2D X_train.
Along axis 1, we have 1198 rows and 1 column.
Along axis 2, we have all the values X_train had before (1198 rows and 60 columns).
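You can confirm the new dimensionality directly (just a quick check, not part of the course code):
print(X_train.shape) #(1198, 60, 1): batch_size, timesteps, features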
Building RNN
from keras.models import Sequential #it will have sequential layers
from keras.layers import Dense #for output layer
from keras.layers import LSTM #for lstm layers
from keras.layers import Dropout #some dropout regularization
For the sequence of layers, we create an object:
regressor = Sequential() #an object of the sequential class
Now we will add the LSTM layers.
We first call the add method, and within it we pass an LSTM object.
We need to input three arguments, and it's very important to understand what they are. The first is the number of units, i.e. the number of LSTM cells or units you want in this LSTM layer. We will choose a relevant number, and then use dropout regularization to avoid overfitting.
The second argument of the LSTM class is return_sequences, which we have to set to True, because we are building a stacked LSTM with several LSTM layers: whenever you add another LSTM layer after the one you are creating, you have to set return_sequences to True.
Once you reach the last LSTM layer, with no further LSTM layer after it, you would set it to False, but you won't even have to, because that's the default value of the return_sequences parameter.
The third and last argument is input_shape, which is exactly the shape of the input X_train that we created in the last step of data preprocessing: a 3D shape corresponding to the observations, the timesteps, and the indicators.
But in this third argument we don't have to include all three dimensions, only the last two, the timesteps and the indicators, because the first one, the observations, is taken into account automatically.
Let's work out the units now. What should this number be?
Even though we are going to stack many layers, we want our model to have high dimensionality. The multiple LSTM layers already give us that, but we can increase the dimensionality even more by including a large number of neurons in each LSTM layer. Capturing the trends of a stock price is pretty complex, so we need this high dimensionality and therefore a large number of neurons in each of the LSTM layers. The number of neurons we'll choose for this first LSTM layer is 50 (a reasonably high value); the next layers will also have 50 neurons each, giving us a model with high dimensionality.
If we had chosen too small a number of neurons in each LSTM layer, for example three to five, that would be too little to capture the upward and downward trends well.
But with 50 neurons we get better results. That's the first argument, so that's done.
Now let's deal with return_sequences,
which we have to set to True, because we're going to add another LSTM layer after this one: we're making a stacked LSTM network, and whenever you add another layer after the current one, you have to set return_sequences to True.
The last argument is input_shape.
As mentioned earlier, we only pass the last two dimensions: the timesteps, X_train.shape[1],
and the indicators/features, 1.
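Putting these three arguments together, the first LSTM layer would look like this (a sketch assembled from the arguments discussed above):
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1))) #first LSTM layer: 50 units, stacked, input shape (timesteps, features)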
Finally, we add the dropout regularization:
regressor.add(Dropout(0.2))
Here 20% is the dropout rate, i.e. the fraction of neurons you want to drop (ignore) in the layer during training to perform this regularization.
Adding the second, third & fourth LSTM layers
In the 2nd, 3rd and 4th LSTM layers we don't need to specify input_shape anymore, as the input shape was already defined in the 1st layer.
In the last LSTM layer we won't set return_sequences=True, because the default is False, meaning there are no more LSTM layers after it. A sketch of these layers is given below.
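Following the same pattern, a sketch of those additional layers (50 units and 20% dropout each, as discussed; only the last one keeps return_sequences at its default):
regressor.add(LSTM(units = 50, return_sequences = True)) #second LSTM layer
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True)) #third LSTM layer
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50)) #fourth and last LSTM layer: return_sequences defaults to False
regressor.add(Dropout(0.2))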
Adding the output layer
regressor.add(Dense(units = 1)) #for the output layer we use the Dense class with 1 neuron
For the full connection to the output layer we use the Dense class, and we expect 1 unit, or neuron, since we predict a single value.
Compiling the RNN
We will compile with the right optimizer and loss function:
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
We use the adam optimizer, which is always a safe choice. For the loss function we use mean_squared_error, since this is a regression problem.
Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
The parameters of fit are the inputs X_train, the targets y_train, the number of epochs (how many times the whole training set is passed forward and backward through the network), and the batch_size (the weights are updated after every 32 stock prices).
We can see that our loss was initially about 19%, and after 100 epochs it's down to about 7%.
Making the predictions and visualising the results
Getting the real stock price of 2017
dataset_test = pd.read_csv('Google_Stock_Price_Test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values
We know that the real data for January 2017 is in the test csv file,
so we import it just like the training csv file.
Getting the predicted stock price of 2017
Now, the first thing to understand is that **we trained our model to predict the stock price at time T+1 based on the 60 previous stock prices**, and therefore, to predict each stock price of each financial day of January 2017, we need the 60 stock prices of the 60 previous days before that day. That's the first key point.
For example, if we want to predict the stock price of 3rd January 2017, we need the data from 4th November 2016 to 2nd January 2017 (60 days of data to predict the trend on 3rd January).
Likewise, for January 4th we need the data from 5th November 2016 to 3rd January 2017.
(This counting ignores that Saturdays and Sundays are not trading days; it's just for the sake of the example.)
The second key point is that, to get the 60 previous stock prices for each day of January 2017, we will need both the training set and the test set: some of those 60 days come from the training set (e.g. December 2016), and some come from the test set (January 2017).
Check my previous example: to get the stock price of 4th January, we need data from November and December (which is in the training set) and also one value from January, which is not in the training set but in the test set.
The idea is therefore to concatenate the training and test datasets, using the raw, unscaled data.
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
The first argument is the pair of datasets we want to join, (dataset_train['Open'], dataset_test['Open']),
meaning we are concatenating the Open columns,
and the second argument is the axis: we stack them vertically, which is axis 0.
[Note: horizontal concatenation would use axis = 1]
The second step is to get the inputs I told you about: for each time T, i.e. for each financial day of January 2017, we need the 60 previous stock prices of the 60 previous financial days, or, if you prefer, the stock prices of the three previous months, because there are about 20 financial days in a month, so 60 financial days make three months.
For the first financial day of 2017, which is January 3rd, we need the 60 financial days before it, so the lower bound of our input range is the index of the first financial day of January 2017 minus 60. What about the upper bound?
To get the upper bound, think about what we need to predict the last financial day of January 2017: the 60 financial days before it, so the last stock price we need in the input is the one just before the last stock price we predict.
We take len(dataset_total) and subtract len(dataset_test). This gives us exactly January 3rd: len(dataset_total) is the total number of rows of the whole dataset (training plus test), and len(dataset_test) is exactly 20, so subtracting 20 brings us to January 3rd 2017, the first financial day of January 2017.
len(dataset_total)-len(dataset_test)
Now, remember we want the stock prices starting from that day minus 60, because that's the lower bound of the inputs we need, so we also subtract 60:
len(dataset_total)-len(dataset_test)-60
The upper bound is much easier: we simply leave it open (using just :) and take everything up to the end of the dataset, because to predict the last stock price of the last financial day we need the 60 previous stock prices, so the last input value we need is the stock price just before that last financial day.
inputs=dataset_total[len(dataset_total)-len(dataset_test)-60:]
Now, let's convert to numpy array
inputs=dataset_total[len(dataset_total)-len(dataset_test)-60:].values
So first let's do a simple reshape to get the right NumPy shape: we pass -1 (let NumPy infer the number of rows)
and +1 (one column).
Then we reshape and rescale:
inputs=inputs.reshape(-1,1)
We use the transform method (not fit_transform) to scale the inputs, so the scaler stays fitted on the training set only:
inputs=sc.transform(inputs)
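From here, the remaining steps before visualising follow the same 60-timestep pattern as the training data; this is a sketch (the range 60 to 80 assumes the 20 financial days of January 2017):
X_test = []
for i in range(60, 80): #20 financial days in January 2017
    X_test.append(inputs[i-60:i, 0]) #the 60 previous scaled prices for each test day
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1)) #same 3D shape as X_train
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price) #back from the 0-1 scale to real prices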
Visualizing the results
Here you can see the red graph, which is based on the real data;
we have already seen that in the test csv file.
The blue graph is the predicted one. They are not exactly the same, but the model was able to predict that, by the end of the month, there would be some upward movement.
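The chart itself can be produced with matplotlib, roughly like this (titles, labels and colors are my own choices):
plt.plot(real_stock_price, color = 'red', label = 'Real Google Stock Price')
plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()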
Practice from here
Here are different ways to improve the RNN model:
Getting more training data: we trained our model on the past 5 years of the Google Stock Price but it would be even better to train it on the past 10 years.
Increasing the number of timesteps: the model remembered the stock prices from the 60 previous financial days to predict the stock price of the next day. That’s because we chose a number of 60 timesteps (3 months). You could try to increase the number of timesteps, by choosing for example 120 timesteps (6 months).
Adding some other indicators: if you have the financial instinct that the stock price of some other companies might be correlated to the one of Google, you could add this other stock price as a new indicator in the training data.
Adding more LSTM layers: we built an RNN with four LSTM layers, but you could try even more.
Adding more neurons in the LSTM layers: we highlighted the fact that we needed a high number of neurons in the LSTM layers to respond to the complexity of the problem, and we chose 50 neurons in each of our 4 LSTM layers. You could try an architecture with even more neurons in each of the 4 (or more) LSTM layers.
Done!