Machine Learning: Deep Learning - Self-Organizing Maps (SOM) (Part 29)
How do SOMs work?
Previously we learned about ANNs, CNNs and RNNs, which are part of supervised deep learning. Let's now start our unsupervised deep learning.
Here is a beautiful way to visualize how self-organizing maps actually work. They take a multi-dimensional data set, which might have lots and lots of columns (the dimensions of the data set) and of course lots and lots of rows, and they reduce the dimensionality of that data set. So basically, instead of having 20, 30, a hundred or even more columns, you end up with a map, and that's what they call a self-organizing map: a two-dimensional representation of your data set.
The purpose or point of the SOM is to reduce the number of columns: from many, many columns down to just the two dimensions of a two-dimensional output.
Let's see an example:
So this is an actual self-organizing map which was produced, and what it depicts is the different states of prosperity and poverty in different countries. Here you have countries of the world, and this self-organizing map has put them into clusters based on lots of different indicators.
In fact, in this specific example, 39 different indicators were used, with 200-plus countries as your rows and 39 different columns. It's impossible to visualize that on its own, but if you use a SOM, you can reduce the dimensionality and present it as a map like this.
This algorithm is different from everything else we've seen before. It has training data, but it doesn't have any labels in the training data, so it's not like a convolutional neural network, which can learn what it's seeing and then apply that to the test set. This algorithm is unsupervised: it's basically just given data, and it learns to group these countries on its own.
Using the SOM, if we plot the result on a world map, you get a beautiful world map which shows you exactly those things: the least alarming states of poverty in some regions, and the most alarming in others. And it lines up with what we know from general knowledge about first-world and third-world countries, about which countries are developed and which are still developing.
So, this is what happened: we took the data set, applied a SOM, and then plotted the result on a world map.
K-Means Clustering
As an aside, K-means clustering is another unsupervised technique: like a SOM, K-means groups unlabeled data points into clusters.
How do SOMs learn?
Here we've got a very simple example of a self-organizing map. We've got three features in our input vectors, and we've got nine nodes in the output.
And here you might be wondering: how are self-organizing maps used to reduce the dimensionality of your data set?
Here we have three features, or three columns, in our data set, and we might have thousands and thousands of rows, each of which supplies those three columns as input.
And that means our input data set is three-dimensional, whereas the output of a self-organizing map is always a two-dimensional map; therefore we are reducing the dimensionality from 3D to 2D.
So let's turn it around. This is what it would look like.
It's exactly the same network; the only difference is how we've positioned the nodes. We still have the same number of connections, the same inputs and the same outputs. Only the visual representation has changed, simply because we're used to this layout and it's easier to understand what's going on.
The first thing that we're going to look at is the top node in our outputs, and specifically the three synapses leading to this node. In fact, let's gray out the rest of the synapses, so that we know we're focusing on these specific three.
And each one of them, just as previously, will have a weight assigned to it. So here we've got W1,1, W1,2 and W1,3.
The first index means that it's the first node in our output nodes, and the second index tells us which input that synapse is connecting from.
So W1,3 means the synapse goes to the first output node and comes from the third input.
Note: the important thing to mention here is that weights in self-organizing maps have a whole different connotation compared to what we saw in artificial neural networks.
In artificial neural networks, weights were used to multiply: we multiplied the inputs to a node by the weights, added them up, and applied an activation function.
Well, in self-organizing maps, there is no activation function. Weights are a characteristic of the node itself. And that's what we're representing over here.
This node (Node 1) is actually trying to be a kind of ghost in our input space: it's trying to see where it fits among the data points, and that's exactly what's going on. So these weights are the coordinates of this node in our input space.
So there we go, that's node number one. We can do the same thing for node number two, node number three, node number four, and so on, for each of our nine nodes.
At the start of the algorithm, as usual, weights are assigned at random to values close to zero but not zero. Therefore each of these nodes (Node 1 to Node 9) has its own imaginary place in the input space.
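As a toy illustration (not any library's actual initializer; the array name and the uniform range here are my own choices), the random weights for our nine nodes with three features each might be set up like this:

import numpy as np

# 9 output nodes, each with a 3-element weight vector:
# its coordinates in the 3D input space (small values close to 0, but not 0)
weights = np.random.uniform(0.01, 0.1, size=(9, 3))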
Well, this is the core of the self-organizing map algorithm: now we're going to have a competition among these nodes (Node 1 to Node 9). We're going to go through each row of our data set, and we're going to find out which of these nodes is closest to each row.
So let's imagine that we've input row number one of our data set into our input nodes: we've put in the values of row number one for column one, column two and column three. (Just as an example, assume row 1 has some values for its three columns X1, X2, X3.)
And now we're going to go through every single one of these nodes and find out which of them is closest to our row number one in the original input space. The way we find that is with the Euclidean distance:
distance = √( (X1 − W1,1)² + (X2 − W1,2)² + (X3 − W1,3)² )
And let's say we get a value of 1.2. By the way, you should get values of about this magnitude, because you should make sure your inputs are between zero and one for this algorithm to work properly. So, as we discussed previously, normalization or standardization has to be applied before you actually input the data into the self-organizing map.
So that's the distance between node number one and row number one of our data set.
So we calculated the distance from the point that row number one (Row 1) represents in our input space to each of these nodes (Node 1 to Node 9) in our self-organizing map.
And we found that the closest one out of all of them is node number three (Node 3). We're going to call node number three the BMU, or Best Matching Unit. So that is the core of the algorithm. And now we want to find out what happens next!
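A minimal NumPy sketch of this competition step, reusing the toy weights array from above (row, distances and bmu are illustrative names, not part of any SOM library):

import numpy as np

weights = np.random.uniform(0.01, 0.1, size=(9, 3))  # 9 nodes, 3 features each
row = np.array([0.3, 0.7, 0.1])  # one input row (X1, X2, X3), already scaled to [0, 1]

# Euclidean distance from the row to each of the 9 nodes
distances = np.sqrt(((row - weights) ** 2).sum(axis=1))
bmu = np.argmin(distances)  # index of the Best Matching Unit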
So for that let's look at a larger self-organizing map. We have more output nodes here in this example.
And let's say in this larger map we found the best matching unit (BMU) for row number one (Row 1).
So what's going to happen next is that the self-organizing map is going to update the weights for this best matching unit, so that this output node moves even closer to our first row of the data set.
And the reason we update the weights is that we simply don't have control over our inputs: we cannot change our data set. The only thing we can control in that formula is the weights of this node, in order for it to become closer.
And in simple terms what that means, or in visual terms what that means, is the self-organizing map is coming closer to that data point.
So that's exactly what's going on, and that's why it's called a self-organizing map: it self-organizes onto your input data.
And, by the way, as you can see here, it's not just this one point that's being dragged closer; some of the nearby points are also being dragged closer to this point. And that's exactly what we're going to look at next.
The next step is that we take a whole radius around this best matching unit, and every single node of our self-organizing map that falls inside that radius has its weights updated to come closer to the row that we matched up with. So there you go: they all got their weights updated.
And the way it works is that the closer a node is to the BMU, the more heavily its weights are updated: the weights nearest the BMU are updated the most, weights a bit further out are updated less, and weights near the edge of the radius are updated even less.
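Here is a sketch of one such update, assuming a Gaussian neighborhood function (a common choice; actual SOM implementations may use other schemes). The weights array here is a grid of weight vectors, and bmu holds the grid coordinates of the best matching unit:

import numpy as np

def update_step(weights, row, bmu, radius, lr):
    # weights: (grid_w, grid_h, n_features); bmu: (i, j) grid coordinates
    grid_w, grid_h, _ = weights.shape
    for i in range(grid_w):
        for j in range(grid_h):
            d2 = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2  # squared grid distance to the BMU
            if d2 <= radius ** 2:  # only nodes inside the radius move
                influence = np.exp(-d2 / (2 * radius ** 2))  # 1 at the BMU, fading outwards
                weights[i, j] += lr * influence * (row - weights[i, j])  # pull toward the row
    return weights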
Let's say row number two had its best matching unit somewhere else, for instance over there. That's the best matching unit for row number two.
Well, what happens here is, again, that BMU is updated to be closer to the row, and it has its own radius, so everything within the radius is also updated to be closer to the row that we matched up with.
And so the question here is: how do these two BMUs, the blue point and the green point, fight with each other?
Well it's pretty simple. So let's have a look at one point.
And as you can see, it's quite far away from the green BMU and quite close to the blue BMU; in fact, it might be so far away from the green BMU that it doesn't even fall within its radius.
So what happens here is that it is pulled much harder by the blue BMU, and therefore it becomes like the blue BMU: it comes closer, and we're going to color it blue.
Then let's have a look at this one, same thing here,
This one is still far away from the green one, but it's also quite far away from the blue one. In fact, it's just a bit closer to the blue one than to the green one, so when both pull on it, it will be updated toward the blue; in this case, we'll color it a kind of greenish-blue.
And then this one
This one is actually closer to the green BMU than to the blue BMU, and therefore when the green pulls and then the blue pulls, there will be a bit of a struggle, but overall it will move closer to the green one than to the blue one.
And then finally, here is another one
Here the green is going to have a much stronger impact, and therefore we're going to color it green.
So there we go: that's us just looking at four random nodes in our self-organizing map, and hopefully that demonstrates how this map self-organizes itself onto your data points in the input.
Now we're going to have a look at a slightly more sophisticated example where we have several best matching units; five, in this case.
So as we discussed, these BMUs are going to be updated to be even closer to the rows that they matched up with, and then each of these BMUs is assigned an area around it, calculated through a radius. This is what it looks like.
So you go through all of the rows in your data set and all of these updates happen. Then, as you start a new epoch and go through your rows again, a unique feature of the Kohonen learning algorithm is applied: your radii actually shrink.
Here in the image, the radius has shrunk a bit. As you go through your data set now, your BMUs pull on fewer nodes in your SOM than before. (Since we have reduced the radius, fewer nodes fall within each area than before.)
And then the radii shrink again, so the process becomes more and more accurate as you go through your data set again and again.
What happens is that you are adjusting your SOM in a more precise, more laser-specific manner: a little edge over here, a little bump over there.
So you become more and more specific and targeted in your approach, and that's exactly how your SOM slowly but surely drapes itself over your data and becomes kind of like a mask for your data.
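A sketch of how the shrinking might be scheduled; exponential decay is one common choice, but the exact schedule (and the time constant used here) is an implementation detail, not something dictated by the algorithm:

import numpy as np

def decayed(initial_value, epoch, time_constant=20.0):
    # both the radius and the learning rate can be shrunk this way between epochs
    return initial_value * np.exp(-epoch / time_constant)

# e.g. a radius that starts at 5.0 and shrinks every epoch:
# radius = decayed(5.0, epoch)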
And so, in our visual representation what that would look like is... Here is our data, and then through lots and lots of iterations, our SOM might look something like this
Let's code this up
Problem statement: We have a CSV file of credit card applications, and we will try to detect fraud.
Here each row represents one customer's information, and the columns hold the various attributes of the application.
#Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('Credit_Card_Applications.csv')
Here all of these fields will act as inputs, because based on them we will try to predict whether a person should be given a credit card or not.
So we write X = dataset.iloc[:, :-1] to take all rows and all columns except the last one, and then add .values to make it a NumPy array:

X = dataset.iloc[:,:-1].values
The last column is the outcome: whether each application got approved (1) or rejected (0). So y will be this column:
y=dataset.iloc[:,-1].values
Feature Scaling
It's a must: the data set is high-dimensional with lots of non-linear relationships, and the model trains much more easily when all features sit on the same scale.
#Importing the MinMaxScaler class
from sklearn.preprocessing import MinMaxScaler
Now we create the MinMaxScaler object. Within it, we set feature_range, because we want our values to end up between 0 and 1:

sc = MinMaxScaler(feature_range=(0,1))
Now we will execute that using the fit_transform method:

X = sc.fit_transform(X)
Now you can see that all of the data is between 0 and 1.
We will be using the minisom.py file, which contains a ready-made implementation of self-organizing maps (MiniSom).
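Assuming minisom.py sits next to the script (MiniSom can also be installed with pip install minisom), the import is:

from minisom import MiniSom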
We have now imported MiniSom. X contains, remember, 14 attributes plus the customer ID. We don't really need the customer ID, because it has of course no significance for the patterns, but we're going to keep it, because later we want to identify the potential cheaters, and for that we will need the customer ID. So input_len will be equal to 14 plus 1, which equals 15.
Then sigma, so remember from the intuition lecture that sigma is the radius of the different neighborhoods in the grid.
We will keep its default value of 1.0. Then we have the learning rate, which, remember, is the hyperparameter that decides by how much the weights are updated during each iteration: the higher the learning rate, the faster the convergence; the lower the learning rate, the longer the self-organizing map takes to be built. Here too we keep the default value, 0.5. Finally, there is a decay-function parameter that can be used to improve convergence, but we're going to leave it at None and not use a decay. Things work perfectly fine with these values of the hyperparameters sigma and learning rate.
And therefore, let's input the parameters. We said x equals 10 and y equals 10, so we have a 10 by 10 grid; adding the input length, sigma and learning rate gives us:

som = MiniSom(x=10, y=10, input_len=15, sigma=1.0, learning_rate=0.5)
Next comes the method that initializes the weights. Inside this method we just need to input the data, X, on which the model will be trained:
som.random_weights_init(X)
This is the procedure we are following:
1. We start with a data set composed of n_features independent variables.
2. We create a grid of nodes, each with a weight vector of n_features elements.
3. We randomly initialize the weights to small numbers close to 0 (but not 0).
4. We select one random observation (row) from the data set.
5. We compute the Euclidean distances from this row to every node in the network.
6. We select the node with the minimum distance: the winning node (BMU).
7. We update the weights of the winning node to move it closer to the row.
8. Using a neighborhood function centered on the winning node, we also update the weights of its neighbors to move them closer to the row; the neighborhood radius shrinks over time.
9. We repeat steps 4 to 8 for many iterations.
The last line uses another method, which is of course the method that trains the self-organizing map on X. It's called train_random, and it's the method that applies steps 4 to 9 above, repeating them over many iterations:
som.train_random(data=X,num_iteration=100)
And now let's select this line and execute it to train the self-organizing map on X.
Now there's a two-dimensional grid that contains all the final winning nodes, and for each of these winning nodes we will get what is most important for us: the MID, the Mean Interneuron Distance.
As a reminder, the MID of a specific winning node is the mean of the distances of all the neurons around the winning node, inside the neighborhood that we defined. So basically, the higher the MID, the farther the winning node is from its neighbors.
And therefore, the higher the MID, the more the winning node is an outlier. Since the majority of winning nodes represent the rules that are respected, a winning node far from this majority of neurons is far from the general rules. And that is exactly how we will detect the outliers, that is, the frauds.
For each winning node we will get the MID, so we simply need to take the winning nodes with the highest MIDs. And we're not going to look at or compare figures; we're going to use colors: the winning nodes will be colored in such a way that the larger the MID, the closer to white the color will be.
And so now lets start building this map.
from pylab import bone, pcolor, colorbar, plot, show
First we need to initialize the figure, that is, the window that will contain the map. To do this we use the bone function:
bone()
First, what we are gonna do is put the different winning nodes on the map.
Well in fact we are going to add on the map the information of the Mean Interneuron Distance for all the winning nodes that the self-organizing map identified.
We're not going to print the figures of all these Mean Interneuron Distances; instead we will use colors, with different colors corresponding to different ranges of MID values. To do this we use the pcolor function, and inside it we put all the MID values for all the winning nodes of our self-organizing map. And to get these mean distances, there is a specific method for that: the distance_map method.
pcolor(som.distance_map())
The distance_map method will return all the Mean Interneuron Distances in one matrix.
And now, just to get things in the right order for the pcolor function, we just need to take the transpose of this matrix returned by the Distance Map Method.
pcolor(som.distance_map().T)
Now if we select the last two lines and run them, we can see this in the plot:
Well, we get the self-organizing map with all the different colors corresponding to the Mean Interneuron Distances. But now we'd like to add one more piece of information: does the white color correspond to a high MID or a low MID? And the dark colors, do they correspond to the lowest MIDs or the highest MIDs?
To know this for sure, it's good to add a legend, and that's what we're going to do right now:
colorbar()
Once we execute this, we get the legend.
So this is the range of values of the MID, the Mean Interneuron Distances. These are normalized values, meaning they were scaled from zero to one.
And therefore now we can clearly see that the highest MIDs, the highest Mean Interneuron Distances, correspond to the white color.
And on the other hand, the smallest Mean Interneuron Distances correspond to the dark colors.
You see, the majority of points here with dark colors are close to each other, because their MID is pretty low. That means all the winning nodes in the neighborhood of such a winning node are close to it, and that creates clusters of winning nodes all close to each other.
But the winning nodes here with white colors have large MIDs, and therefore they're outliers and, accordingly, potential frauds. So great, we actually identified the outliers, the frauds.
But we can do better with this map by adding some markers, to make the distinction between the customers who got approval and the customers who didn't. The customers who cheated and got approval are more relevant targets for fraud detection than the customers who cheated and didn't get approval.
We're going to create two markers: red circles and green squares. The red circles will correspond to the customers who didn't get approval, and the green squares to the customers who got approval:
markers=['o','s']
Then we're going to add a new variable here that will contain our colors:
colors=['r','g']
The red color is coded by 'r' and the green color by 'g'. So far no association is made between the markers and the colors, but we will make it afterwards.
Here's what we're going to do now: we loop over all the customers, and for each customer we get the winning node; depending on whether the customer got approval or not, we mark that winning node with a red circle if the customer didn't get approval, and a green square if the customer did.
Here i is just going to take the different index values of our customer database: 0, 1, 2, 3, all the way down to 689. And x will be the corresponding customer vector: at the first iteration x is the first customer's vector, at the next iteration the second customer's, and so on down to the last customer.
for i,x in enumerate(X):
First, to get the winning node of the customer x:
w=som.winner(x)
Now, what we want to do for this winning node is place the colored marker on it. So that's what we're going to do right now: on this winning node, we plot the marker.
What we're going to need is the information of whether this customer got approval or not. In this plot function, the first thing we have to do is specify the coordinates of the marker, and we would like to put the marker at the center of the winning node.
You know, each winning node is represented by a square in the self-organizing map that we just saw earlier and basically we want to put the marker at the center of the square.
plot(w[0], w[1])
But this is actually the coordinate of the lower-left corner of the square, and we want the marker at the center of the square. Therefore we add 0.5 to the first coordinate, to put it in the middle of the horizontal base of the square, and 0.5 to the second coordinate, to put it at the center of the square.
And now we have to do the most important thing: we need to know whether to put a red circle or a green square. How do we do that?
We still have the y vector, which contains the information of whether each customer got approval or not. That's how we'll know whether the marker should be a red circle or a green square.
Because i is the index of the customer, so y[i] is the value of the dependent variable for that customer that is 0 if the customer didn't get approval and 1 if the customer got approval.
We're going to give this color to the marker. However, for markers we can color the inside and the edge separately, and here we're only going to color the edge. So we add markeredgecolor, which will be equal to colors[y[i]]. As for the inside of the marker, we won't color it, because two markers can land on the same winning node, and if that happens we'll see the two markers much better with no fill (you'll be convinced when we plot the map). So markerfacecolor, the argument for the inside color of the marker, gets the value 'None'.
Then we add the marker size and the marker edge width.
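Putting the whole loop together, it might look like this; markersize=10 and markeredgewidth=2 are example values (purely cosmetic), not something dictated by the method:

for i, x in enumerate(X):
    w = som.winner(x)  # winning node of customer x
    plot(w[0] + 0.5,  # center of the square, horizontally
         w[1] + 0.5,  # center of the square, vertically
         markers[y[i]],  # 'o' if not approved, 's' if approved
         markeredgecolor=colors[y[i]],  # 'r' or 'g' on the edge only
         markerfacecolor='None',  # hollow, so overlapping markers stay visible
         markersize=10,
         markeredgewidth=2)
show()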
Now we select all of this and run it; in the plot we get this image:
Now not only do we have the Mean Interneuron Distances, but we also get, for each winning node, the information of whether the customers got approval or not.
Let's analyze
Here, white means a high chance of fraud, a red circle means that some customers mapped there did not get approved, and a green square means that some got approved.
So here are the cells with a high chance of cheaters.
Conversely, there are cells with colors near black whose customers got approved (green square), meaning these customers are good.
So now the map is done. We have a good map, and we're going to use it to catch these potential cheaters.
Finding the frauds
The first thing we need to do is to get these mappings, so we introduce a new variable called mappings. (Note that win_map takes the data, X, as its argument.)

mappings = som.win_map(X)
For example, the key (0,0) corresponds to the lower-left winning node in the self-organizing map, and for that particular winning node we get the list of all the customers associated with it.
Okay, so now that we understand how mappings works, that is, as a dictionary, we will use it to get the frauds.
Here is the map, and remember, the outlying winning nodes are the white ones. So let's start with one winning node and get its customers, and then we will use the concatenate function to add the customers of the other outlying winning node, to get the whole list of customers most likely to have cheated to get the credit card.
Let's manually check those
This one is at position (6,1).
Then we can get the results:
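A sketch of that step: (6,1) is the coordinate read off this particular map; the second coordinate, (8,3) below, is purely hypothetical, standing in for whatever other white cell appears on your map. inverse_transform brings the scaled values back to their original ranges, including the customer IDs:

frauds = np.concatenate((mappings[(6,1)], mappings[(8,3)]), axis=0)  # (8,3) is a placeholder coordinate
frauds = sc.inverse_transform(frauds)  # back to the original scale, incl. CustomerID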
Note: one thing to keep in mind: because the weights are initialized randomly, every run produces a slightly different SOM (you can even see different legend ranges, such as 1.0 versus 0.3), so the outlying winning nodes and their coordinates change from run to run; always read them off your own map.
Done!
The complete code
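Here, assembled from the snippets above, is one way the full script might look. The two fraud coordinates near the end are placeholders; read the white cells off your own map:

# Self-Organizing Map for fraud detection
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from minisom import MiniSom
from pylab import bone, pcolor, colorbar, plot, show

# Importing the dataset
dataset = pd.read_csv('Credit_Card_Applications.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Feature scaling to [0, 1]
sc = MinMaxScaler(feature_range=(0, 1))
X = sc.fit_transform(X)

# Training the SOM
som = MiniSom(x=10, y=10, input_len=15, sigma=1.0, learning_rate=0.5)
som.random_weights_init(X)
som.train_random(data=X, num_iteration=100)

# Visualizing the results
bone()
pcolor(som.distance_map().T)
colorbar()
markers = ['o', 's']
colors = ['r', 'g']
for i, x in enumerate(X):
    w = som.winner(x)
    plot(w[0] + 0.5, w[1] + 0.5,
         markers[y[i]],
         markeredgecolor=colors[y[i]],
         markerfacecolor='None',
         markersize=10,
         markeredgewidth=2)
show()

# Finding the frauds -- coordinates below are placeholders, read off your own map
mappings = som.win_map(X)
frauds = np.concatenate((mappings[(6, 1)], mappings[(8, 3)]), axis=0)
frauds = sc.inverse_transform(frauds)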