Machine Learning: Mixture Density Network (MDN)-RNN (41)
Make sure that you have already gone through the RNN blog.
Once done, let’s start.
A baseball player gets about 0.39 seconds to hit a ball, whereas human reaction time is roughly 0.25 seconds.
Even with those 0.14 seconds to spare, it’s not easy to react: you have to move your hands and the bat with them, and there are so many things to do at once. To solve this, baseball players start moving the bat earlier, saving time so they can focus on the ball and its direction the moment it appears.
That’s what we can do using an MDN-RNN: predict what’s coming so we can act ahead of time.
Here we can see the RNN receiving the hidden state (h) at each step and passing it on to the next step, and so on.
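To make that hand-off concrete, here is a minimal sketch of one vanilla RNN step in NumPy. All sizes and weight values are made up purely for illustration:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: mix the current input with the previous
    hidden state and squash through tanh to get the new hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh = 0.1 * rng.normal(size=(10, 16))  # input -> hidden weights
W_hh = 0.1 * rng.normal(size=(16, 16))  # hidden -> hidden weights
b_h = np.zeros(16)

h = np.zeros(16)                        # initial hidden state
for x_t in rng.normal(size=(5, 10)):    # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # h is handed to the next step
```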
Let’s play a game
Predict the next word:
Can’t guess? Let’s add more words.
Now you can predict some words here.
Many words could fit, but at least now the sentence makes sense, right?
So, just like that, we need words from the past (“Too cool”) and the present (“for”) to predict the future (“school”).
But why MDN in this architecture?
An MDN helps us predict non-deterministic values, which means we don’t want to fix a single answer. In the example above, suppose you used “school” to fill the gap and your friend said it should be “office”.
Is that wrong? Nope!!
That answer works too! If we fix a deterministic output, we cut off those other valid options.
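Here is a toy way to see “don’t fix the answer” in code, with made-up words and probabilities. Instead of always taking the single most likely word, we keep a whole distribution and sample from it:

```python
import numpy as np

rng = np.random.default_rng(42)
words = ["school", "office", "work"]   # hypothetical candidates
probs = [0.6, 0.3, 0.1]                # hypothetical model probabilities

# Deterministic: always "school" -- the other valid answers are lost
print(words[int(np.argmax(probs))])

# Non-deterministic: usually "school", sometimes "office" or "work"
for _ in range(5):
    print(rng.choice(words, p=probs))
```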
So, where does this fit when we want to use it in a World Model?
The Variational Autoencoder (V) is the spatial representation: it deals with how the environment looks and what the space could look like, and it gives us many different possibilities for the environment through the latent vector and the distribution we mapped it onto.
The MDN-RNN (M), on the other hand, gives us options and variations over time: how the future might look and what different things might happen next.
So together they model space-time: V is space, M is time, and both of them output distributions. They are non-deterministic the way we built them, so they give us different possibilities for space and time, and that is what lets us train our agent inside them.
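Putting the two together, one rollout step of a World Model agent flows roughly like this. This is only a data-flow sketch: the stub functions stand in for the real convolutional VAE (V), LSTM-based MDN-RNN (M), and controller (C), and every size here is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_encode(obs):                 # V: observation -> latent vector z
    return obs[:32]                  # stub: pretend the first 32 dims are z

def mdn_rnn_step(z, action, h):      # M: update the belief about the future
    return np.tanh(0.5 * h + 0.1 * z.sum() + 0.1 * action.sum())

def controller(z, h):                # C: act on space (z) plus time (h)
    return np.tanh(z[:3] * h.mean()) # stub 3-dim action (steer/gas/brake)

h = np.zeros(64)                     # M's hidden state: the "time" part
for _ in range(10):                  # a 10-step rollout
    obs = rng.normal(size=1024)      # fake environment frame
    z = vae_encode(obs)              # space: what the world looks like now
    action = controller(z, h)        # decide using space + time together
    h = mdn_rnn_step(z, action, h)   # time: what might happen next
```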
Let’s see a demo:
If we only use V (the Variational Autoencoder), we can watch a racing car drive.
As you can see, the car drives jerkily, wobbling from left to right, and it misses the sharp corners.
But once we add the MDN-RNN (Model M) on top of Model V, the result is much smoother: it’s not as shaky or wobbly, and at the same time it takes the sharp corners better, because it can predict what’s about to happen in the future.
Let’s understand the theory now:
Firstly, let’s remember how a neural network works:
Let’s assume that we are passing this image of a dog through the neural network.
Now we get the dog’s predicted weight from the neural network.
But it’s not correct!
Then backpropagation happens, and more and more images are used for training.
Gradually, the model predicts more accurate results. In the last image you can see that the model predicted 5.5 lb, and the dog actually weighs around 5.7 lb.
Great!!
This is how a model works.
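In code, that predict → compare → backpropagate loop might look like this minimal PyTorch sketch. The “dog weight” data is fabricated, and a single number stands in for the image:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fabricated toy data: one input feature -> weight in pounds
x = torch.rand(64, 1) * 10
y = 0.5 * x + 1.0 + 0.2 * torch.randn(64, 1)  # noisy "true" weights

model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # how wrong is the prediction?
    loss.backward()              # backpropagation
    optimizer.step()             # nudge the weights a little

print(model(torch.tensor([[5.0]])).item())  # prediction for a new dog
```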
But what if, instead of one exact value, the network gave us a range?
Here the network gives us mu (μ = 5.5) as the weight and sigma (σ = 0.4) to define the range. Then we can say with 68% confidence that the weight of this dog is between 5.1 and 5.9 pounds (μ ± σ). If we go even further and take two standard deviations each way, we can say with 95% confidence that the weight of this dog is between 4.7 and 6.3 pounds (μ ± 2σ).
So, we are asking our neural network to output two values, μ and σ.
Using those two values, we can turn a single prediction into a range of likely results.
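A sketch of that two-output network in PyTorch: one head for μ and one for σ (kept positive by exponentiating), trained with the Gaussian negative log-likelihood instead of plain MSE. All names and shapes are illustrative:

```python
import torch
import torch.nn as nn

class MuSigmaNet(nn.Module):
    def __init__(self, in_dim=1, hidden=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, 1)         # center of the range
        self.log_sigma = nn.Linear(hidden, 1)  # log keeps sigma positive

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), torch.exp(self.log_sigma(h))

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2), up to a constant."""
    return (torch.log(sigma) + 0.5 * ((y - mu) / sigma) ** 2).mean()

net = MuSigmaNet()
mu, sigma = net(torch.rand(8, 1))
loss = gaussian_nll(torch.rand(8, 1) * 10, mu, sigma)  # train on this
# 68% band: mu ± sigma; 95% band: mu ± 2*sigma (e.g. 5.5 ± 0.4 -> 5.1..5.9)
```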
In an MDN, we assume that any general distribution (purple) can be broken down into a mixture (red and blue) of Gaussian (normal) distributions.
Since we can divide one distribution into multiple distributions, here is what we can do to our output.
Earlier we had two output nodes (μ and σ), and now we have two values for each component (μ1, μ2, σ1, σ2). What about alpha1 and alpha2?
They are the weights with which the distributions are added together to get the general distribution: the mixture model is alpha1 times the first distribution plus alpha2 times the second, p(y) = α1·N(y; μ1, σ1) + α2·N(y; μ2, σ2). And we could have many more distributions, each with its own weight.
So, this is how our input image ends up as a mixture density graph.
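Here is a sketch of an MDN head in PyTorch, assuming K = 2 components as in the example: the network outputs α (through a softmax, so the weights sum to 1), one μ, and one σ per Gaussian, and is trained by minimizing the negative log-likelihood of the mixture:

```python
import math
import torch
import torch.nn as nn

K = 2  # number of Gaussians in the mixture

class MDNHead(nn.Module):
    def __init__(self, in_dim=16, k=K):
        super().__init__()
        self.alpha = nn.Linear(in_dim, k)      # mixture weights
        self.mu = nn.Linear(in_dim, k)         # one mean per Gaussian
        self.log_sigma = nn.Linear(in_dim, k)  # one spread per Gaussian

    def forward(self, h):
        alpha = torch.softmax(self.alpha(h), dim=-1)  # weights sum to 1
        return alpha, self.mu(h), torch.exp(self.log_sigma(h))

def mdn_nll(y, alpha, mu, sigma):
    """-log p(y), with p(y) = sum_k alpha_k * N(y; mu_k, sigma_k)."""
    log_n = (-torch.log(sigma)
             - 0.5 * ((y - mu) / sigma) ** 2
             - 0.5 * math.log(2 * math.pi))
    return -torch.logsumexp(torch.log(alpha) + log_n, dim=-1).mean()

head = MDNHead()
alpha, mu, sigma = head(torch.randn(8, 16))  # 8 hidden states of size 16
loss = mdn_nll(torch.randn(8, 1), alpha, mu, sigma)
```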
Read more:
Mixture Density Networks
Probabilistic Forecasting of Household Electrical Load Using Artificial Neural Networks
Let’s draw something
Go to the page
We have several demos here (sketch-rnn demo, interpolation demo, variational autoencoder demo, etc.).
Go here and choose anything, like cat, book, or something else. Draw a small portion, and the AI will try to complete the image.
Yes, it’s not perfect, but it uses an RNN to complete the drawing.
I just chose cat from the dropdown and drew a circle (I know it’s not a perfect circle, haha). Look what the model drew for me!
This page has a paper associated with it as well; it’s worth a read.