Machine Learning : Deep Learning - ANN (Part 26)
Deep Learning is the most exciting and powerful branch of Machine Learning.
Deep Learning models can be used for a variety of complex tasks:
Artificial Neural Networks for Regression and Classification
Convolutional Neural Networks for Computer Vision
Recurrent Neural Networks for Time Series Analysis
Self Organizing Maps for Feature Extraction
Deep Boltzmann Machines for Recommendation Systems
Auto Encoders for Recommendation Systems
Internet started on 80s but Deep learning has started to evolve now. Why?
because the memory we can use has increased now.
In the near future you can store all of the world's data in DNA storages
Artificial Neural Networks- ANN
The Neuron
Deep learning uses Neural Networks which mimics the Human neuron
our goal is to create something close to human neurons.
Image of two neurons (drawn earlier)
again this is how we define neurons
Dendrites receive info and Axon passes info.
this can be seen as
we have to standardize (Standardizing and normalizing is not same.Read this out) the input values.
1 layer of input and 1 layer of output.
The inputs passes weights and that's how we basically modify the model.
in the neuron, all the weights adds up (Weighted sum)
Then we run an activation function.
Then we pass it to the output layer
Activation function
Let's analyze 4 popular activation function
Threshold function
depending on the value of x, we get 1 or 0
Sigmoid
This is very famous one used in final layer.
Rectifer
this is very famous in ANN.
Hiperbolic Tangent
The activation function decides whether a neuron should be activated or not by calculating the weighted sum and further adding bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron
This is how it works. In the hidden layer (for example), rectifier activation function is used and in output layer, sigmoid activation function is used.
How do Neural Network work?
Let's assume that we are going to valuate the value of a property
Assume we have Area, Bedrooms , Distance to city and Age are our input values
So, this can be the solution. You can basically use most of the ML Models here.
But in Neural network, we have hidden layer which is something special.
Let's see how to use that.
Assume that these values are linked to the first neuron of hidden layer. Also, assume some weights of input are non-zero and some have zero.
Now, we can design that neuron in such a manner that it will activate only when a certain level of input is met.
Assume that, we have made this neuron focused on Area and Distance to city only
Now, we did set the middle neuron to focus on Area, bedroom and age
the last one connected to Age
The second one with Bedroom and Distance to city
The fourth one with all of the input neurons
Finally the output layer is connected to all hidden layer neurons
How to Neural Networks Learn?
In general two ways to learn. One is to hard code and give all of the rules and second one is to design the neural network so that it can learn itself.
what happens in NN is
We get an output value and we have actual value
Here using this Cost function , C,we find the error.
Now, this cost function(C) is passed to the output value and then it goes to the hidden layer and then to the input layer's weight.
The weights get updated
Our goal is to minimize the cost function and thus we modify the weight (Not the input value).
Assume this is a case:
Here the input neurons are Study hr, Sleep Hrs, Quiz and based on that, we got 93% which is actual value.
After training we get a cost function and we modify the weights to lessen the Cost function.
Now we feed the input neurons again with modified weights
We again modify the weights to adjust the cost function.
Again, if we run it and depending on the cost function, we modify the weight.
We run this again,
Again,
Again,
Now, here we go!
The cost function is 0 here (Generally never happens) and Output value is equal to Actual Value.
What about data with multiple rows?
So, every row is trained on the same Neural Network but different times
So, when we run the the first row, we get the output as this.
For the second row,
So, after running all rows, here is the graph
Output value as y cap and y as actual value (Exam)
Now, let's have the cost function, C
Now, we will update the weights to minimize the cost.
So, we will train again and again with modified weights.
Gradient Descent
We learned the weights gets adjusted. but how?
Let's learn it
Let's check a graph where we have plotted 1000 weights
The best weight will be here
Let's check an example of 25 weights
25 weights lead us to 1000 combinations
the faster super computer can operate at a speed of 93 petaflops.
which is 93*10^5 Floating operation per seconds
Let's say hypothetically that it can do one, test one, combination for our neural network in one flop, basically in one floating operation.
That means it will still require 10 to the power of 75 divided by 93 times 10 to the power of 15 seconds to run all of those tests,
So that means one, or approximately 10 to the power of 58 seconds and that is the same as 10 to the power of 50 years. That is a huge number that is longer than the universe has existed
So there we go this is a no no even on the worlds fastest super computer Sunway TaihuLight. So we have to come up with a very different approach how we're going to find the optimal weight.
We just tested for a simpler neural network.
What about if the neural network looks like something like this? Right? Or even greater than that?
Haha!
Let's learn how to make it easier and optimize the weights
So, let's say we start somewhere, you gotta start somewhere, so we start over there and from that point in the top left what we're going to do is we're going to look at the angle of our cost function at that point.
basically you just need to differentiate, find out what the slope is in that specific point and find out if the slope is positive or negative. If the slope is negative like in this case, means that you're going downhill.
So to the right is downhill to the left is uphill.And from there it means you need to go right basically you need to go downhill and that's what we're going to do.
we took a step and now we are here.
Again same thing you calculate the slope this time the slope is positive meaning right is uphill left is downhill and you need to go left.Again same thing you calculate the slope this time the slope is positive meaning right is uphill left is downhill and you need to go left.
now here,
and then
This is the best weight.
so this is what happened
This was an example of 1D
This is gradient descent in 2D
this is gradient descent in 3D
Stochastic gradient descent
We have till now worked with a graph where it was convex and had only a global minima.
but what if the function is not convex?
then we may find the local minima rather than global minima
So, to solve this issue. In Stochastic Gradient descent, when we run the neural network for each row, we adjust the weights at the same time.
Note : In normal Batch gradient descent, we adjust the neural networks at the end of the training of the all rows.
right shows , we update weight every time and in left shows, we update the weights once we run all of the rows.
Stochastic gradient descent is faster.
Read this out to learn more about gradient descent
Back propagation
we do it to fine tune the weights
For back propagation, read this
Let's work with the ANN now
Problem statement: We have 10k customers.
Well, the bank has been seeing unusual churn rates. So churn is when people leave the company and they've seen customers leaving at unusually high rates and they want to understand what the problem is and they want to assess and address that problem. And that's why they've hired you. To look into this dataset for them and give them some insights.
This bank looked in the dataset to see if the customers stayed (Exited = 0) or left (Exited = 1
So, of course, their interest is to keep the maximum customers and therefore they made this data set to understand the reason somehow, why customers leave the bank.
And mostly, once they managed to build a predictive model that can predict if any new customer leaves the bank, you know, model that was trained, of course on this data set, well, they will deploy this model on new customers and for all the customers where the model predicts that the customer leaves the bank, well, in that case they will be prepared and they might do some special offer to the customer so that it will stay in the bank, you see?
Importing the libraries
import numpy as np
import pandas as pd
We are importing Tensorflow for neural networks
import tensorflow as tf #importing tensorflow
Data Preprocessing
dataset =
pd.read
_csv('Churn_Modelling.csv') #importing the data
Here RowNumber CustomerId , Surname has no impact on if a user is going to exit or not.
So, we will take all rows but columns from index 3 to (last index -1).
X = dataset.iloc[:,3:-1].values
The last index is going to be used for y matrix
y = dataset.iloc[:,-1].values
Note: There is no data missing here. So, keep this in mind.
Follow this repository for data preprocessing codes
Let's encode the categorical data (Gender)
from sklearn.preprocessing import LabelEncoder #import
#obect
le = LabelEncoder()
#Gender column is in index 2. So, we will use that column
X[:,2] =
le.fit
_transform(X[:,2])
here female turned to 0 and male to 1
For country names we will use onehotencoding
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
We will apply this to column 1 index so use [1]
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(
ct.fit
_transform(X))
once done, the dummy variables have moved to the first column
France turned to 1.0 0.0 0.0 and etc.
Then we split our dataset
Feature scaling (must)
We have to apply that to all of our features.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
We will feature scale to all of our features in Neural Network
X_train =
sc.fit
_transform(X_train)
X_test = sc.transform(X_test)
Building ANN
ANN is a sequence of layers. So, we will create an object of that.
#obect of sequential class
ann = tf.keras.models.Sequential() #initializing the ANN
Adding first input layer
ann.add(tf.keras.layers.Dense(number of neurons, activation function))
we have to experiment how many number of neurons is good.
let's pick 6 neurons
ann.add(tf.keras.layers.Dense(units=6))
in hidden layer, we use rectifier activation function
We use Dense class to keep our neurons fully connected
ann.add(tf.keras.layers.Dense(units=6,activation='relu'))
Second layer
ann.add(tf.keras.layers.Dense(units=6,activation='relu'))
Output layer
ann.add(tf.keras.layers.Dense(units=1,activation='sigmoid'))
In the output layer we use sigmoid activation function and here we just need 1 neuron which can mean (0/1). This number can change as well depending on cases.
Training the ANN
The method to compile is compile()
ann.compile(optimizer, loss function, metrics parameter)
optimizer = 'adam'
optimizer which will work with stochastic gradient descent
For binary classification,,
loss = 'binary_crossentropy'
for non-binary ones, we would use
loss = 'category_crossentropy'
Metrics, as I said, we can actually choose several metrics at the same time.
Therefore, in order to enter the values of this parameter well we have to enter them in a pair of square brackets which is supposed to be, you know the list of the different metrics with which you want to evaluate your ANN during the training, but we will only choose the main one. Here, that is accuracy
metrics = ['accuracy']
so, finally
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
Let's run it
we want to do several prediction together . So, in batch. Classical batch size is 32
ann.fit(X, y, batch_size=?, epochs = ?)
You know, a neural network has to be trained over a certain amount of epochs so as to improve the accuracy over time. And we will clearly see that once we execute this cell.
Making the predictions and evaluating the model
Let's assume we have a new customer and now we will use this Neural Network to predict if the customer will leave or remain.
ann.predict([[2D array]])
France means 1.0,0.0, 0.0
Gender should be 1 as male
So, 2d array [[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]]
As we scaled our features, we need to use those.
sc.transform use sc for scaling, transform need to be used . We don't need to fit it.
ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]]))
as we used sigmoid activation, we will get probability
we get the probability 0.029
let's take a threshold 50% (.5)
if greater than 0.5, we will say true or else leave
Predicting the Test set results
y_pred=ann.predict(X_test) #ann.predict
y_pred=(y_pred>0.5) #if greater than 50% set 1 print(np.concatenate((y_pred.reshape(len(y_pred),1),y_test.reshape(len(y_test),1)),1))
Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)
So, we got 86% prediction that folks are not leaving
Implement this code
This is another code to Build an ANN Regression model to predict the electrical energy output of a Combined Cycle Power Plant
Then complete the code