Machine Learning: SVM (Part 12)

Here we can see two categories.

Category 1 has red crosses and Category 2 has green crosses.

Now we can create a line, and using two support vectors, we can create two more dotted lines alongside that blue line. The solid blue line is the decision boundary, and the dotted lines mark the margin.

Here, one red cross and one green cross act as support vectors: the dotted lines pass through them, they "support" the margin, and each of them can be expressed as a vector.

Here we can see the maximum margin hyperplane (also called the maximum margin classifier).

Anything to the right of it is on the positive hyperplane side and anything to the left is on the negative hyperplane side.
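For the math-inclined: in the standard SVM formulation (not shown in the original figures), the hyperplane is the set of points x where w · x + b = 0, the two dotted margin lines satisfy w · x + b = +1 and w · x + b = -1, and the gap between them is 2/||w||. Maximizing that gap is what makes this the maximum margin classifier.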

Let's give an example.

Assume that we have an apple and an orange.

Let's apply this to our two categories.

In general, this is what a cluster of data might look like.

But SVM approaches this differently.

Here, we take two vectors: an apple which almost looks like a yellow orange, and an orange which almost looks like a green apple.

Here, the red cross (the orange-looking apple) is very close to being an orange.

Likewise, the green cross (the green-looking orange) is very close to being an apple.

Problem statement: We are launching a new SUV in the market and we want to know which people are likely to buy it. We have a list of people with their age and salary, along with data on whether or not each of them bought an SUV before.

Let's import the libraries, import the dataset, and split the dataset into training and test sets.
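Here is a minimal sketch of those steps. I'm assuming the data sits in a CSV file named Social_Network_Ads.csv whose last column is the "purchased" label (the file name, column layout, and split ratio are assumptions, not confirmed by the post):

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset (file name is an assumption)
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, :-1].values  # features: age and salary
y = dataset.iloc[:, -1].values   # label: bought an SUV before or not

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)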

Feature scaling
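SVMs are sensitive to the scale of the features, so we standardize age and salary. A sketch using scikit-learn's StandardScaler (the name sc matters, since it is reused in the prediction step below):

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply it to both sets
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)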

Training the SVM model on the Training set

from sklearn.svm import SVC

This imports the SVC class from the sklearn.svm module.

classifier = SVC(kernel = 'linear', random_state = 0)

The default value for kernel is 'rbf', but we want a linear kernel here to keep things simple. Alternatively, we could use the LinearSVC class, and then we wouldn't need to set the kernel at all.
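A minimal sketch of that alternative (LinearSVC comes from the same sklearn.svm module and is a drop-in replacement for the classifier line above):

from sklearn.svm import LinearSVC

# Linear SVM without needing to specify a kernel
classifier = LinearSVC(random_state = 0)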

classifier.fit(X_train, y_train)

Fitting the model on the training set.

Predicting a new result

Predicting for a person who is 30 years old with a salary of 87,000:

print(classifier.predict(sc.transform([[30,87000]])))

The output is 0, which means this person won't buy the SUV.

Predicting the Test set results

Let's see our predictions and the y_test data side by side.

y_pred = classifier.predict(X_test)

print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

So, in each row, the left value is the prediction and the right value is the actual test label.

Making the Confusion Matrix

It shows how many values we predicted correctly and how many we got wrong. In scikit-learn, the rows of the matrix are the true classes and the columns are the predicted classes, so the entries on the diagonal are the correct predictions.

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

print(cm)

Let's check the accuracy:

print(accuracy_score(y_test, y_pred))

We have 90% accuracy!

Visualizing the Training set results
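The plotting code isn't shown inline in the post, so here is a minimal sketch of the usual decision-region plot, assuming the sc scaler and the names from the snippets above (the colors and grid steps are my own choices):

from matplotlib.colors import ListedColormap

# Undo the scaling so the axes show real ages and salaries
X_set, y_set = sc.inverse_transform(X_train), y_train

# Build a grid over the age/salary plane
X1, X2 = np.meshgrid(np.arange(X_set[:, 0].min() - 10, X_set[:, 0].max() + 10, 0.25),
                     np.arange(X_set[:, 1].min() - 1000, X_set[:, 1].max() + 1000, 250))

# Color every grid point with the class the classifier predicts for it
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))

# Overlay the actual training points
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ('red', 'green')[i], label = j)

plt.title('SVM (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()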

Visualizing the Test set results
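The same sketch works here: just swap X_train and y_train for X_test and y_test, and change the plot title to 'SVM (Test set)'.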

Done!

The whole code
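For convenience, here are the snippets above stitched into one script (the Social_Network_Ads.csv file name is still an assumption):

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Load and split the dataset (file name is an assumption)
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Train the linear SVM
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)

# Predict a single new observation (age 30, salary 87,000)
print(classifier.predict(sc.transform([[30, 87000]])))

# Evaluate on the test set
y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))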

Dataset