TinyML (Part 3): A simple Neural Network explained

Assume that we are given a set of X values and their corresponding Y values, and we want to find a model so that, when we give it a new X, it returns an accurate Y.

This is what we will do using a neural network with a single layer, and here is the code:
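(A minimal sketch of that code, reconstructed from the full listing later in this post, which differs only in how the layer is declared.)

import numpy as np
import tensorflow as tf

# A Sequential model with a single Dense layer containing a single neuron,
# which takes one input value and produces one output value
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')

# The known x's and y's (which happen to follow y = 2x - 1)
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

# Train, then ask for the y at a new x
model.fit(xs, ys, epochs=500)
print(model.predict([10.0]))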

Let's break this down:

The architecture here is a neural network with a single layer, and that layer contains a single neuron. This is indicated by the line of code where we define a Sequential model that has only one entry: one Dense layer with one unit.

So our neural network will really look like this, a single neuron with x in and y out. After training the neuron to learn the relationship between x and y, we can then ask it to predict the y for a new x.

So, for example, what is the y when x equals 10? We know that the relationship between x and y is y = 2x - 1, and the neuron learns something close to that.

When you print the result of model.predict([10.0]), you'll get something close to 19, but not exactly 19. It'll be more like 18.998. To see why, let's look under the hood a little.

Revisiting our neuron with an x in and a y out, we can say that y is a function of x. And that function is y = wx + b, the linear equation.

So we could redraw our architecture a little, thinking in terms of parameters. If y = wx + b, we need to know the values of w and b in order to get y from x. So how can a computer figure these out?

Well, we can start by emulating the neuron. Here we can do so with a class called Model. It has internal values w and b, and we can initialize them to anything we want; say we set them both to 10.0.

The class will also have a call function that will return wx plus b for a given x.
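A minimal sketch of what that class might look like, following the standard low-level TensorFlow pattern (the exact names are illustrative):

import tensorflow as tf

class Model(object):
    def __init__(self):
        # Initialize the weight and bias to arbitrary starting values, here 10.0 each
        self.w = tf.Variable(10.0)
        self.b = tf.Variable(10.0)

    def __call__(self, x):
        # y = wx + b
        return self.w * x + self.b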

So for our known set of x's, let's see what this model will give. As we initialized w and b to be 10, y = wx + b will give us this set: where x is minus 1, y will be 0; where x is 0, y will be 10, and so on. Now this, of course, is very different from the set of y's that we were already given.

When x is minus 1, y is supposed to be minus 3. But we got 0. When x is 0, y is supposed to be minus 1, but we got 10. Indeed, when x is 4, y is supposed to be 7, but we got 50. We're clearly way off.
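You can check this quickly by running the untrained model over the known x's (a small illustrative check using the Model class sketched above):

model = Model()
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

print([model(x).numpy() for x in xs])  # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
print(ys)                              # [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0] -- what we actually want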

So recall the paradigm diagram where we make a guess, measure our accuracy, optimize based on this information, and then repeat.

You can see at this point with w equals 10 and b equals 10, we've made our first guess.

And by comparing our y's with the expected ones, we can see that our accuracy was way off.

If we want to measure that accuracy, we can create a loss function in code where we give it the predicted y (the set we calculated) and the target y (the set we were supplied). From there, we can calculate the mean of the squared differences between them, as explained in an earlier lesson.
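A sketch of that loss function, assuming the same squared-error idea that the Keras model is compiled with later:

def loss(predicted_y, target_y):
    # Mean of the squared differences between what we predicted and what we wanted
    return tf.reduce_mean(tf.square(predicted_y - target_y))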

Next comes our underlying training code.

We'll use a gradient tape to keep track of our differentiable variables. And within it, we can calculate our loss using the loss function that we just saw.

Our current loss is the result of the loss function if we pass it the values returned by the model and our real y's.

The next step is to optimize our guess.

And as we're using a gradient tape, which tracks our variables, we can differentiate the loss with respect to our model's w and b to get the gradients for each.

These gradients then help us get the direction towards the bottom of the loss curve, so we can generate our next guesses by reassigning w and b within our model to a new value in the direction that the gradient gave us.
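Putting those pieces together, the training step might look something like this (a sketch; the learning rate of 0.09 is just an illustrative choice):

def train(model, xs, ys, learning_rate=0.09):
    with tf.GradientTape() as tape:
        # Make a guess with the current w and b, and measure how far off it is
        current_loss = loss(model(xs), ys)
    # Differentiate the loss with respect to w and b to get their gradients
    dw, db = tape.gradient(current_loss, [model.w, model.b])
    # Move w and b a small step in the direction that reduces the loss
    model.w.assign_sub(learning_rate * dw)
    model.b.assign_sub(learning_rate * db)
    return current_loss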

So on our machine learning paradigm diagram, the next step was to repeat. We now have a new w and a new b, so we can make a guess with them and repeat the whole process.

We can see that in simple code like this, where we loop through the training function that we just looked at 50 times. So 50 times, we'll make a guess, measure the loss, differentiate to get the gradients of the variables with respect to the loss, and then use those to tweak the values in the direction that reduces the loss, and so on.
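A sketch of that loop, reusing the Model, loss, and train functions sketched above:

model = Model()
xs = tf.constant([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
ys = tf.constant([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0])

for epoch in range(50):
    current_loss = train(model, xs, ys)
    print('Epoch %2d: w=%.2f b=%.2f loss=%.5f' %
          (epoch, model.w.numpy(), model.b.numpy(), current_loss.numpy()))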

By the time we've done this, our initial w = 10 and b = 10 will have changed to something like w = 1.98 and b = -0.94.

Now, you and I both know what the desired values are based on the x's and y's that we were given: they're supposed to be 2 and -1, respectively. So this process has gotten us pretty close.

And now, if we want to use this neuron to give us the y for any future x, it will calculate y = 1.98x - 0.94.

Try this code

In the previous part we created a function that had two parameters -- w and b, and returned a value f(x) = wx+b. You then saw how to use the machine learning training loop to adjust these parameters so that the correct values could be ‘learned’ over time.

You also saw, earlier, how this works in TensorFlow with machine learning.

import numpy as np
import tensorflow as tf

# Define the single Dense layer separately so we can inspect it later
my_layer = tf.keras.layers.Dense(units=1, input_shape=[1])
model = tf.keras.Sequential([my_layer])
model.compile(optimizer='sgd', loss='mean_squared_error')

# The known x's and y's (y = 2x - 1)
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

model.fit(xs, ys, epochs=500)

I’ve changed the code slightly so that instead of having the layer code within the tf.keras.Sequential(), I declared it as my_layer. Now, if you train this network, and try to predict a value for a given X like this:

print(model.predict([10.0]))

You'll see a value that's close to 19. It did this by learning the internal parameters of the neuron, and you can inspect the neuron by looking at my_layer using its get_weights() method:
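# Inspect the learned parameters of the single neuron
print(my_layer.get_weights())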

You'll see that you get back two arrays. The first contains the w value, which after running for 500 epochs is very close to 2! Similarly, the second contains the b value, which is learned to be very close to -1.

Multiple Neurons

But what would it look like if you used more than just a single neuron? For example, what if you used a network with multiple neurons that looks like this?

Here, x feeds into two neurons in the first layer. Each of those neurons produces an output, and those outputs are combined by a second-layer neuron to produce the output y.

To implement this in code, you’d use 2 layers, the first with 2 neurons, and the second with 1 neuron. It would look like this:

import numpy as np
import tensorflow as tf

# First layer: 2 neurons; second layer: 1 neuron that combines their outputs
my_layer_1 = tf.keras.layers.Dense(units=2, input_shape=[1])
my_layer_2 = tf.keras.layers.Dense(units=1)
model = tf.keras.Sequential([my_layer_1, my_layer_2])

model.compile(optimizer='sgd', loss='mean_squared_error')

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

model.fit(xs, ys, epochs=500)

So, if you look at the parameters in this case, there'll be a little bit of a difference in the second layer. Up to now, we have said that our neuron has an input, x, and an output, y, where y = wx + b and w and b are learned parameters. However, when a neuron has 2 inputs, as you can see above, the formula changes so that there is a separate w for each input.

The neuron in the second layer has 2 inputs, so instead of computing y = w*x + b, it computes y = w1*x1 + w2*x2 + b, where x1 is the output of the first neuron in the previous layer and x2 is the output of the second neuron in the previous layer. Naturally, if there are more than 2 neurons in the previous layer, that many weights will be learned.

If you run the above code to learn the parameters, you can then inspect them:
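# Inspect the learned parameters of each layer in turn
print(my_layer_1.get_weights())
print(my_layer_2.get_weights())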

This will give you the weights and biases for the neurons in the first layer. Note that it isn't a list of weight-and-bias for the first neuron followed by weight-and-bias for the second; the weights come first, then the biases. In the above example, 1.4040651 is the learned weight for the first neuron and -0.7996106 is the learned weight for the second. Similarly, -0.50982034 and 0.20248567 are the learned biases for the first and second neurons respectively.

As mentioned earlier, for the second layer you can see that there are 2 weight values in its array, and a single bias. These weights are applied to the outputs of the previous layer's neurons, and the results are then summed and added to the bias.

You can inspect them manually, and apply the sum yourself like this:
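# A sketch of applying the learned parameters by hand (variable names are
# illustrative; the actual learned values will differ from run to run)
value_to_predict = 10.0

(layer1_weights, layer1_biases) = my_layer_1.get_weights()   # shapes (1, 2) and (2,)
(layer2_weights, layer2_biases) = my_layer_2.get_weights()   # shapes (2, 1) and (1,)

# Outputs of the two neurons in the first layer: y = w*x + b for each
neuron1_output = (layer1_weights[0][0] * value_to_predict) + layer1_biases[0]
neuron2_output = (layer1_weights[0][1] * value_to_predict) + layer1_biases[1]

# The second-layer neuron combines them: y = w1*x1 + w2*x2 + b
neuron3_output = (layer2_weights[0][0] * neuron1_output) + (layer2_weights[1][0] * neuron2_output) + layer2_biases[0]

print(neuron3_output)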

This same process applies to bigger and denser neural networks, and will allow you to build models that learn more sophisticated patterns.

You can check out this code

Done!!