Machine Learning: K Nearest Neighbors (KNN) Part 11

Let's assume we have 2 categories (Red & Green)

Now, if we get a new data point (white), where should we place it?

Category 1 or Category 2?

Using KNN, we assign that data point to one of the categories.

How to do that?

Let's practice now

Step 1: We have a point lying between the 2 categories.

Step 2: Let's take the 5 nearest neighbors of that point.

In Step 3 & Step 4:

Since 3 of those neighbors belong to Category 1 and only 2 belong to Category 2, we assign the point to Category 1 by majority vote (see the sketch below).
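To make that vote concrete, here is a tiny from-scratch sketch using Euclidean distance. The coordinates and labels are made up purely for illustration; later we let scikit-learn do this for us.

import numpy as np

# Toy data: 4 points from Category 1 (label 1) and 4 from Category 2 (label 2)
points = np.array([[1.0, 2.0], [2.0, 2.2], [2.5, 2.5], [0.5, 1.0],   # Category 1
                   [4.0, 4.0], [4.5, 3.5], [6.0, 6.0], [6.5, 5.5]])  # Category 2
labels = np.array([1, 1, 1, 1, 2, 2, 2, 2])

def knn_predict(new_point, points, labels, k=5):
    # Euclidean distance from the new point to every known point
    distances = np.sqrt(((points - new_point) ** 2).sum(axis=1))
    # Indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k neighbors
    votes = labels[nearest]
    return np.bincount(votes).argmax()

new_point = np.array([3.0, 3.0])  # the "white" point between the two categories
print(knn_predict(new_point, points, labels, k=5))  # -> 1: 3 votes for Category 1, 2 for Category 2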

Let's code now

Problem statement: We are launching a new SUV in the market and want to predict which people are likely to buy it. We have a list of people with their age and salary, along with data on whether or not they bought an SUV before.

Let's import the libraries, load the dataset, split it into a 75% training set and a 25% test set, and then apply feature scaling.
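Here is a minimal sketch of that preprocessing step, assuming the data sits in a CSV file. The file name Social_Network_Ads.csv and the column names Age, EstimatedSalary and Purchased are assumptions, so adjust them to your own dataset.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset (file and column names are assumptions)
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset[['Age', 'EstimatedSalary']].values  # features: age and salary
y = dataset['Purchased'].values                 # target: bought an SUV (1) or not (0)

# Split into 75% training data and 25% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature scaling: KNN is distance-based, so both features must be on the same scale
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)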

Training the KNN model

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)

We just used the default values for n_neighbors, metric and p.

classifier.fit(X_train, y_train)

Predicting a new result

Predicting the result for a 30-year-old person who earns 87,000:

print(classifier.predict(sc.transform([[30,87000]])))

The output is 0, which means this person is predicted not to buy the SUV.

Predicting the Test set results

y_pred = classifier.predict(X_test)

print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

Here, the left column is the prediction (0 means the model predicted the person did not buy the SUV) and the right column is the actual value (0 means the person really did not buy it).

Making the Confusion Matrix

The confusion matrix shows, in a single matrix, how many predictions were correct and how many were wrong.

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

print(cm)
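To make the matrix easier to read: in scikit-learn, rows are the actual classes and columns are the predicted classes, so the 2x2 matrix can be unpacked like this (a small sketch reusing the cm computed above):

# cm[0, 0] -> correctly predicted "did not buy"   (true negatives)
# cm[0, 1] -> predicted "buy" but did not buy     (false positives)
# cm[1, 0] -> predicted "did not buy" but bought  (false negatives)
# cm[1, 1] -> correctly predicted "buy"           (true positives)
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)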

Checking its accuracy

print(accuracy_score(y_test, y_pred))

We have 93% accuracy, which is great!

Visualizing the Training set results

Visualizing the Test set results
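The original plots are not reproduced here, but a decision-boundary plot like the ones used in this series can be sketched with matplotlib as below. It reuses X_train, X_test, y_train, y_test and classifier from above; the colours and the 0.02 grid step are just plotting choices, not part of the model.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_knn_results(X_set, y_set, title):
    # Build a fine grid over the (scaled) feature space
    x1, x2 = np.meshgrid(
        np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.02),
        np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.02))
    # Colour each grid cell by the class KNN predicts there (the decision regions)
    plt.contourf(x1, x2,
                 classifier.predict(np.c_[x1.ravel(), x2.ravel()]).reshape(x1.shape),
                 alpha=0.5, cmap=ListedColormap(('salmon', 'lightgreen')))
    # Overlay the actual data points
    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                    color=('red', 'green')[i], label=j)
    plt.title(title)
    plt.xlabel('Age (scaled)')
    plt.ylabel('Estimated Salary (scaled)')
    plt.legend()
    plt.show()

plot_knn_results(X_train, y_train, 'KNN (Training set)')
plot_knn_results(X_test, y_test, 'KNN (Test set)')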

Here is the total code

Here is the dataset