Machine Learning - Support Vector Regression (SVR) - Part 5

So far in linear regression we have worked with straight lines. We found the best line by minimizing the sum of squared errors, Σ(y - ŷ)².
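
As a quick refresher, here is a minimal sketch of that calculation with made-up numbers (y and y_hat are hypothetical values, not from our dataset):

import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0]) #hypothetical actual values
y_hat = np.array([2.8, 5.3, 6.9, 9.4]) #hypothetical predictions from a fitted line

sse = np.sum((y - y_hat) ** 2) #sum of squared errors, the quantity we minimize
print(sse) #≈ 0.3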

You may recall it from this image.

So, our target is always to minimize the error.

In SVR, we introduce the concept of an epsilon-insensitive tube around the regression line, which reduces the error. How?

Here you can see we have created an insensitive tube which hides many of the data points.

As they fall inside the tube, their error counts as 0. That leaves some data points outside the tube, and only for those do we calculate the error.

So, this is the calculation process:
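
A minimal sketch of that epsilon-insensitive error with made-up numbers (epsilon and the residuals here are hypothetical, just to show the idea):

import numpy as np

epsilon = 0.5 #half-width of the insensitive tube (assumed value)
residuals = np.array([0.1, -0.3, 0.9, -1.2]) #hypothetical y - y_hat values

#points inside the tube (|residual| <= epsilon) contribute 0 error;
#points outside contribute only the distance beyond the tube's edge
loss = np.maximum(0, np.abs(residuals) - epsilon)
print(loss) #[0. 0. 0.4 0.7]
print(loss.sum()) #total error ≈ 1.1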

Note: if a column contains 0/1 dummy variables, we don't apply feature scaling to that column.
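
For example, a minimal sketch with a hypothetical feature matrix where column 0 is a 0/1 dummy:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0, 44000.0], [1, 52000.0], [0, 61000.0]]) #column 0 is a dummy, column 1 is numeric

sc = StandardScaler()
X[:, 1:] = sc.fit_transform(X[:, 1:]) #scale only the numeric column
#the 0/1 dummy column is left untouched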

Why is it called SVR?

It's because each data point is seen as a vector. Every point outside the tube is a support vector, dictating the formation of the tube.

Let's code:

For this blog we will again use the same dataset we used in our last blog (Part 4).

Problem statement: We want to appoint a regional manager, and in the interview the candidate said that he was a regional manager for 2 years at XYZ company.

We have collected the dataset and found that the salary of a regional manager (level 6) is 150k a year. As the person has already worked in the role for 2 years, his salary should fall somewhere between level 6 and level 7. Surely, it will be less than a Partner's 200k.

So, we can take level 6.5 to get an approximate idea and offer him a salary according to his experience.

Importing the libraries and dataset
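
A sketch of this step, assuming the same Position_Salaries.csv file and column layout (Position, Level, Salary) from the previous blog:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv') #assumed file name from the previous blog
X = dataset.iloc[:, 1:-1].values #the Level column as the feature matrix
y = dataset.iloc[:, -1].values #the Salary column as the target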

Feature Scaling

As X & y are both numerical (not categorical), we can standardize them so that most of the values fall within -3 to +3.

To do that, we need StandardScaler, which expects a 2D array. y is currently 1D, so we reshape it into a column vector:

y = y.reshape(len(y), 1) #reshape(number of rows, number of columns)

Now, let's do it:

from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler() #StandardScaler object for X
sc_y = StandardScaler() #StandardScaler object for y

X = sc_X.fit_transform(X) #fit the scaler on X and standardize it
y = sc_y.fit_transform(y) #fit the scaler on y and standardize it

It scaled our X & y to values roughly within -3 to +3.

Training the SVR model

from sklearn.svm import SVR #sklearn.svm module

regressor = SVR(kernel = 'rbf') #the kernel can be linear or non-linear; 'rbf' (radial basis function) is a non-linear kernel

Kernel documentation

Know more about the RBF kernel

regressor.fit(X, y.ravel()) #fit the model; ravel() flattens y back to 1D, the shape fit expects

Predicting a new result

sc_y.inverse_transform(regressor.predict(sc_X.transform([[6.5]])).reshape(-1,1))

Explanation:

regressor.predict(sc_X.transform([[6.5]])): the model was trained on scaled features, so the input 6.5 must first be scaled with sc_X.transform, which takes a 2D input.

Also, y was scaled before training, so the prediction comes out in the scaled range. To get back the original salary scale, we apply sc_y.inverse_transform.

reshape(-1, 1): inverse_transform expects a 2D array, so we reshape the 1D prediction into a column vector first.

How did we get 6.5? Check the problem statement
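
Putting it together (the exact number depends on the fitted model, but it should land between the level-6 salary of 150k and the level-7 salary of 200k):

predicted_salary = sc_y.inverse_transform(regressor.predict(sc_X.transform([[6.5]])).reshape(-1, 1))
print(predicted_salary) #should fall between the level-6 and level-7 salaries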

Visualising the SVR results

Code line: plt.scatter(sc_X.inverse_transform(X), sc_y.inverse_transform(y), color = 'red') #plt.scatter(input X, input y)

#X is scaled, so to reverse this and get the original values we use sc_X.inverse_transform(X). The same goes for y (sc_y.inverse_transform(y))

Code line: plt.plot(sc_X.inverse_transform(X), sc_y.inverse_transform(regressor.predict(X).reshape(-1,1)), color = 'blue') #plt.plot(input X, predicted y)

For 6.5 the prediction was sc_y.inverse_transform(regressor.predict(sc_X.transform([[6.5]])).reshape(-1,1)); here X is already scaled, so to predict for all values of X we simply use sc_y.inverse_transform(regressor.predict(X).reshape(-1,1)).

plt.title('Truth or Bluff (SVR)')

plt.xlabel('Position level')

plt.ylabel('Salary')

plt.show()

To visualize the curve more smoothly, you can use the grid method:
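
A sketch of that grid method, assuming X, y, regressor and the scalers from above; we predict on a dense grid of levels in 0.1 steps so the curve comes out smooth:

X_orig = sc_X.inverse_transform(X) #back to the original (unscaled) levels
X_grid = np.arange(X_orig.min(), X_orig.max() + 0.1, 0.1).reshape(-1, 1) #dense grid of levels
plt.scatter(X_orig, sc_y.inverse_transform(y), color = 'red')
plt.plot(X_grid, sc_y.inverse_transform(regressor.predict(sc_X.transform(X_grid)).reshape(-1, 1)), color = 'blue')
plt.title('Truth or Bluff (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()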

Practice with the whole code:
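
For reference, here is the whole flow in one place (a sketch under the same assumptions about the dataset as above):

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

dataset = pd.read_csv('Position_Salaries.csv') #assumed file name from the previous blog
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values.reshape(-1, 1) #StandardScaler needs a 2D array

sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)

regressor = SVR(kernel = 'rbf')
regressor.fit(X, y.ravel()) #fit expects a 1D target

print(sc_y.inverse_transform(regressor.predict(sc_X.transform([[6.5]])).reshape(-1, 1)))

plt.scatter(sc_X.inverse_transform(X), sc_y.inverse_transform(y), color = 'red')
plt.plot(sc_X.inverse_transform(X), sc_y.inverse_transform(regressor.predict(X).reshape(-1, 1)), color = 'blue')
plt.title('Truth or Bluff (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()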

Previous Blogs:

Machine Learning - Polynomial Regression (Part 4)

Machine learning - Multiple Linear Regression Model (Part 3)

Machine learning - Linear Regression (Part 2)

Machine Learning : Data Pre Processing Part 1