Machine Learning - Support Vector Regression (SVR) - Part 5
So far in linear regression we have seen straight lines. We could calculate the minimum error by taking the summation of (y - ŷ)² over all data points.
You can recall it from this image:
So, our target is always to minimize the error.
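Written out, the objective from linear regression is simply the sum of squared errors:

\min_{b_0, b_1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad \hat{y}_i = b_0 + b_1 x_i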
In SVR, we introduce the concept of an epsilon (ε) insensitive tube, which reduces the error. How?
Here you can see we have created an ε-insensitive tube that hides lots of data points.
As they are hidden, their error is treated as 0. That leaves some data points outside the tube, and only for those do we calculate the error.
So, this is the calculation process
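For reference, this is the standard soft-margin SVR formulation behind that picture, with slack variables ξ and ξ* measuring how far a point sits above or below the tube:

\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)

\text{subject to} \quad y_i - (w \cdot x_i + b) \le \varepsilon + \xi_i, \quad (w \cdot x_i + b) - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0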
Note: If we have 0, 1 or dummy variables, we don't apply feature scaling to the column
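For example, here is a minimal sketch with a hypothetical matrix X_demo (the name and values are mine, just for illustration) where column 0 is a 0/1 dummy and only the numeric column gets scaled:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 is a 0/1 dummy, column 1 is a numeric feature
X_demo = np.array([[0.0, 45.0],
                   [1.0, 52.0],
                   [0.0, 61.0]])

sc = StandardScaler()
X_demo[:, 1:] = sc.fit_transform(X_demo[:, 1:])  # scale only the numeric column
print(X_demo)  # the dummy column is left untouched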
Why is it called SVR?
It's because each point is seen as a vector. Every point outside the tube is a support vector, dictating the formation of the tube.
Let's code:
For this blog we will again use the same data sheet we used in our last blog (Part 4).
Problem statement: We want to appoint a regional manager, and in the interview the candidate said that he was a regional manager for 2 years at XYZ company.
We have collected the data sheet and found that the salary of a regional manager is 150k a year. As the person worked for 2 years, the salary can fall in the range between levels 6 & 7. Surely, it will be less than the Partner salary of 200k.
So, we can use level 6.5 to get an approximate idea and offer him a salary according to his experience.
Imported the libraries and the dataset
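A minimal sketch of this step, assuming the same Position_Salaries.csv file from the previous blog (position level in the second column, salary in the last):

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')  # assumed file name, as in Part 4
X = dataset.iloc[:, 1:-1].values  # position level (kept 2D)
y = dataset.iloc[:, -1].values    # salary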
Feature Scaling
As X & y are both non-categorical, we can scale them so the standardized values mostly fall within -3 to +3.
To do that, we need StandardScaler, which requires a 2D array, so we first reshape y:
y = y.reshape(len(y),1)
#reshape(no. of rows, no. of columns)
Now, let's do it:
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler() #StandardScaler object
sc_y = StandardScaler() #StandardScaler object
X = sc_X.fit_transform(X) #applying to X matrix for mean values
y = sc_y.fit_transform(y) #applying to y matrix for mean values
It scaled our X & y to values roughly within -3 to +3.
Training the SVR model
from sklearn.svm import SVR
#sklearn.svm module
regressor = SVR(kernel = 'rbf')
#kernel can take a linear or non-linear value; here 'rbf' (radial basis function) is non-linear
Kernel documentation
regressor.fit(X, y)
#fit the model
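One note: because we reshaped y into a column vector for the scaler, scikit-learn may emit a DataConversionWarning here, since fit expects a 1D target. A workaround (my own tweak, not part of the original code) is to flatten y at fit time:

regressor.fit(X, y.ravel())  # y.ravel() flattens the (n,1) column back to 1D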
Predicting a new result
sc_y.inverse_transform(regressor.predict(sc_X.transform([[6.5]])).reshape(-1,1))
Explanation:
regressor.predict(sc_X.transform([[6.5]]))
here, regressor.predict() takes the 2D input [[6.5]] scaled with sc_X.transform, because the model was trained on scaled X.
Also, since we scaled y during training, the prediction comes out on that scaled scale. We need to invert the scaling to get the salary back in its original units (we used sc_y.inverse_transform).
reshape(-1,1)
inverse_transform expects a 2D array, so to avoid a format error, we use .reshape(-1,1) on the prediction.
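To see the chain more clearly, here is the same prediction broken into steps (the variable names are mine, just for illustration):

level = [[6.5]]                                  # 2D input, as transform/predict expect
level_scaled = sc_X.transform(level)             # scale the input with the X scaler
salary_scaled = regressor.predict(level_scaled)  # output is still on the scaled y scale
salary = sc_y.inverse_transform(salary_scaled.reshape(-1, 1))  # back to original units
print(salary)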
How did we get 6.5? Check the problem statement
Visualising the SVR results
Code line: plt.scatter(sc_X.inverse_transform(X), sc_y.inverse_transform(y), color = 'red')
#plt.scatter(input X, input y)
#as X is scaled, to reverse this and get the original values we use sc_X.inverse_transform(X). Same goes for y (sc_y.inverse_transform(y))
Code line: plt.plot(sc_X.inverse_transform(X),sc_y.inverse_transform(regressor.predict(X).reshape(-1,1)), color = 'blue')
#plt.plot( input X, y predicted)
The prediction for 6.5 was sc_y.inverse_transform(regressor.predict(sc_X.transform([[6.5]])).reshape(-1,1)), and as we want it for all values of X (which is already scaled), it becomes sc_y.inverse_transform(regressor.predict(X).reshape(-1,1))
plt.title('Truth or Bluff (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
To visualize it more smoothly, you can use this grid method:
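A sketch of that grid method: it predicts on a fine 0.1-step grid of levels instead of only the 10 original points, so the curve looks smooth.

# Build a dense grid over the original (unscaled) levels
X_grid = np.arange(sc_X.inverse_transform(X).min(),
                   sc_X.inverse_transform(X).max(), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))  # transform/predict need a 2D array

plt.scatter(sc_X.inverse_transform(X), sc_y.inverse_transform(y), color = 'red')
plt.plot(X_grid,
         sc_y.inverse_transform(regressor.predict(sc_X.transform(X_grid)).reshape(-1, 1)),
         color = 'blue')
plt.title('Truth or Bluff (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()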
Practice with the whole code
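In case that link is unavailable, here is a consolidated sketch of all the steps above (again assuming the Position_Salaries.csv file name from Part 4):

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Importing the dataset (assumed file name)
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
y = y.reshape(len(y), 1)  # StandardScaler needs a 2D array

# Feature scaling
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)

# Training the SVR model on the whole dataset
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y.ravel())

# Predicting a new result for level 6.5
print(sc_y.inverse_transform(
    regressor.predict(sc_X.transform([[6.5]])).reshape(-1, 1)))

# Visualising the SVR results
plt.scatter(sc_X.inverse_transform(X), sc_y.inverse_transform(y), color = 'red')
plt.plot(sc_X.inverse_transform(X),
         sc_y.inverse_transform(regressor.predict(X).reshape(-1, 1)),
         color = 'blue')
plt.title('Truth or Bluff (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()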
Previous Blogs:
Machine Learning - Polynomial Regression (Part 4)
Machine learning - Multiple Linear Regression Model (Part 3)