Machine Learning - Polynomial Regression (Part 4)

This is the mathematical equation for polynomial regression: y = b0 + b1x1 + b2x1^2 + ... + bnx1^n. Note that the model is still linear in the coefficients b0, b1, ..., bn; only the features are powers of x1, which is why we can still fit it with LinearRegression below.

Now, let's see where we use it.

If we use linear regression on this kind of data, it basically does not even fit the points.

But if we use polynomial regression, it basically fits all of the data points.

Let's learn it through code.

Problem statement: We want to hire a regional manager, and in the interview the candidate said he had already been a regional manager for 2 years at XYZ company.

We have collected a salary sheet and found that a regional manager (level 6) earns 150k a year. Since the candidate has 2 years of experience at that level, his expected salary sits somewhere between level 6 and level 7, and it will surely be less than the Partner salary of 200k.

So we can treat him as level 6.5 to get an approximate figure and offer a salary that matches his experience.

Here is the dataset: Position_Salaries.csv, which contains a Position, a Level, and a Salary column.

So, let's do the data processing

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

Here, we only want the Level column as X and the Salary column as y.

dataset = pd.read_csv('Position_Salaries.csv')

X = dataset.iloc[:, 1:-1].values

y = dataset.iloc[:, -1].values
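If you want to sanity-check the result (an optional quick look, nothing assumed beyond the columns named above), preview the frame and the shapes:

print(dataset.head())    # preview the first rows: Position, Level, Salary
print(X.shape, y.shape)  # X is a 2D array with the single Level column, y is a 1D array of salaries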

Training the Linear Regression model on the whole dataset

from sklearn.linear_model import LinearRegression #import the LinearRegression class

lin_regressor = LinearRegression() #create the regressor object

lin_regressor.fit(X, y) #we are using the whole dataset and not splitting it

We have used all of our data to fit the linear model; the dataset is too small to be worth splitting into training and test sets.

Training the Polynomial Regression model on the whole dataset

from sklearn.preprocessing import PolynomialFeatures #we are importing PolynomialFeatures

poly_regressor = PolynomialFeatures(degree=2) #degree 2 means we generate the features x1 and x1^2

X_poly = poly_regressor.fit_transform(X) #transform the matrix of features into polynomial features

#now building y = b0 + b1x1 + b2x1^2

lin_regressor2 = LinearRegression()

lin_regressor2.fit(X_poly, y) #fit a new linear regression model on X_poly
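To see what the transform actually produced, take a quick look at X_poly. With PolynomialFeatures' default include_bias=True, each row is [1, x1, x1^2], so there should be three columns:

print(X_poly.shape)  # (number_of_rows, 3): the constant 1, x1, and x1^2
print(X_poly[:3])    # first three rows of the transformed feature matrix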

Visualizing the Linear Regression results

plt.scatter(X, y, color='red') #scatter plot of the real data points

plt.plot(X, lin_regressor.predict(X), color='blue') #plot(X coordinates, predicted salaries from lin_regressor)

plt.title('Linear regression model')

plt.xlabel('Position level')

plt.ylabel('Salary')

plt.show()

Clearly, the linear model is not fitting the data, so let's try the polynomial one.
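If you prefer a number over a visual impression, a minimal sketch with sklearn's r2_score quantifies the fit; since we never split the data, this is training R^2 only:

from sklearn.metrics import r2_score
print(r2_score(y, lin_regressor.predict(X)))  # R^2 of the linear fit on the training data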

Visualizing the Polynomial Regression results

plt.scatter(X, y, color='red') #scatter plot of the real data points

plt.plot(X, lin_regressor2.predict(X_poly), color='blue') #plot(X coordinates, predicted salaries from the polynomial model)

plt.title('Polynomial regression model')

plt.xlabel('Position level')

plt.ylabel('Salary')

plt.show()

As we can see, the degree-2 model still does not fit the data that well, so we can increase the degree and check again; a quick way to compare several degrees is sketched right below.
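Before settling on one degree, an optional sketch like this (using throwaway names poly, features, and model so the objects above stay untouched) loops over a few degrees and prints the training R^2 for each:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

for degree in range(2, 7):
    poly = PolynomialFeatures(degree=degree)              # build x1, x1^2, ..., x1^degree
    features = poly.fit_transform(X)                      # transformed feature matrix
    model = LinearRegression().fit(features, y)           # fit on the whole dataset
    print(degree, r2_score(y, model.predict(features)))   # training R^2 for this degree

Keep in mind that training R^2 can only go up as the degree grows, so the visual check below still matters.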

Practice with a much higher degree

from sklearn.preprocessing import PolynomialFeatures

#now we are targeting x1, x1^2, x1^3, x1^4, x1^5, x1^6

poly_regressor = PolynomialFeatures(degree=6)

X_poly = poly_regressor.fit_transform(X) #transform the matrix of features
#now building y = b0 + b1x1 + b2x1^2 + b3x1^3 + b4x1^4 + b5x1^5 + b6x1^6

lin_regressor2 = LinearRegression()

lin_regressor2.fit(X_poly, y) #refit the linear regression model on the new X_poly

#with degree 6, the curve fits the data points closely

plt.scatter(X, y, color='red') #scatter plot of the real data points

plt.plot(X, lin_regressor2.predict(X_poly), color='blue') #plot(X coordinates, predicted salaries from the degree-6 model)

plt.title('Polynomial regression model')

plt.xlabel('Position level')

plt.ylabel('Salary')

plt.show()

Optional: Visualizing the Polynomial Regression results (for higher resolution and smoother curve)

X_grid = np.arange(X.min(), X.max(), 0.1) #instead of only the integer levels in X, take steps of 0.1 (1.0, 1.1, 1.2, ...) so the curve looks smoother

X_grid = X_grid.reshape((len(X_grid), 1)) #reshape into a column vector so predict gets a 2D array

plt.scatter(X,y,color='red')

plt.plot(X_grid,lin_regressor2.predict(poly_regressor.fit_transform(X_grid)),color='blue')

plt.title('Polynomial regression')

plt.xlabel('Position level')

plt.ylabel('Salary')

plt.show()

Let's solve our problem now:

Predicting a new result with Linear Regression

lin_regressor.predict([[6.5]]) #predict the salary at level 6.5: the regional manager with 2 years of experience should sit between level 6 and level 7

#predict expects a 2D array, hence the double brackets

#the result is misleading, as it exceeds the level 7 salary

Around 330k is impossible, as the level 7 (Partner) role only pays 200k.

Let's solve our problem now: Predicting a new result with Polynomial Regression


#for the polynomial model, we need to provide the transformed features x1, x1^2, x1^3, x1^4, x1^5, x1^6

lin_regressor2.predict(poly_regressor.fit_transform([[6.5]])) #transform 6.5 into its polynomial features, then predict

Now the prediction looks reasonable: roughly 170k, which sits between the 150k of level 6 and the 200k of level 7.

So the result falls between level 6 and level 7.

We can offer this salary to our regional manager!
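To wrap up, here is a small optional sketch (reusing the objects defined above) that prints both predictions side by side so the difference is obvious:

linear_pred = lin_regressor.predict([[6.5]])[0]
poly_pred = lin_regressor2.predict(poly_regressor.fit_transform([[6.5]]))[0]
print(f"Linear regression prediction for level 6.5:     {linear_pred:.0f}")
print(f"Polynomial regression prediction for level 6.5: {poly_pred:.0f}")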

Try the whole code

Previous Blogs

Machine learning - Multiple Linear Regression Model (Part 3)

Machine learning - Linear Regression (Part 2)

Machine Learning : Data Pre Processing Part 1