Machine Learning - Evaluating Regression models (Part 8)

Suppose we have used a certain amount of nitrogen fertilizer on a field and got some tons of potatoes in return.

If we present it in a graph:

Here, yi is the real value we observed and ŷi (yi hat) is the value predicted by this linear regression model.

Again, if instead of a regression model we just draw the average line, we get this:

Here, yavg is the average line and yi is the real data.

So, in this way, we get SSres = Σ(yi − ŷi)², the sum of squared residuals, and SStot = Σ(yi − yavg)², the total sum of squares.

From these, we get the R squared value: R² = 1 − SSres / SStot.

If the value is 1.0, the model is a perfect fit.

If it's near 0.9, the model is very good, and so on!
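As a minimal sketch (with made-up numbers, not the fertilizer data), we can compute R squared by hand from SSres and SStot and check that it matches sklearn's r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

# made-up real values and model predictions
y_real = np.array([3.0, 4.5, 6.0, 7.5, 9.0])
y_pred = np.array([3.2, 4.4, 5.8, 7.6, 8.9])

ss_res = np.sum((y_real - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_real - y_real.mean()) ** 2)   # total sum of squares (vs. the average line)

r2_manual = 1 - ss_res / ss_tot
print(r2_manual)                    # ≈ 0.995, same as r2_score
print(r2_score(y_real, y_pred))
```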

Adjusted R square

But R squared has a weakness: if we add more parameters (features) to the linear regression, SStot stays the same while SSres can only stay equal or decrease, so R squared never goes down, even when the new feature adds nothing.

To solve this issue, we use the Adjusted R square: Adj R² = 1 − (1 − R²) × (n − 1) / (n − p − 1), where n is the number of samples and p is the number of features. Adding a useless feature now lowers the score.
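A quick sketch of that formula (the R² value and sample/feature counts here are made-up, just to show the penalty):

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R squared: penalizes models for adding extra features."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# same R² of 0.95, but more features gives a lower adjusted score
print(adjusted_r2(0.95, n_samples=50, n_features=2))   # ≈ 0.9479
print(adjusted_r2(0.95, n_samples=50, n_features=10))  # ≈ 0.9372
```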

Now, let's apply all of our regression algorithms to the same real-world dataset and compare their R squared (or Adjusted R squared) values to see which performs best.

From sklearn, we will mainly use the model evaluation page of the documentation. There we can see r2_score, which is what we will use for the R squared value.

Starting with SVR

Here the R squared value is computed like this:

from sklearn.metrics import r2_score

r2_score(y_test, y_pred)  # y_test: real values from the train/test split; y_pred: what our model predicted

The value of R squared is 0.94.
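As a hedged sketch of the SVR step (on synthetic stand-in data, not the course's dataset, so the score will differ from 0.94; note that SVR is sensitive to feature scale, so scaling is included):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# synthetic stand-in for the real dataset
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.5 * X.ravel() + np.sin(X.ravel()) + rng.normal(0, 0.3, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# scale inside a pipeline, then fit the support vector regressor
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(r2_score(y_test, y_pred))
```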

Then with Random Forest Regression

The R squared value is 0.96.

Then with Polynomial regression

R squared value is 0.9458
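sklearn has no single "polynomial regression" class; the usual approach (sketched here on synthetic data, so the score won't be the 0.9458 above) is PolynomialFeatures followed by LinearRegression:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# synthetic data with a cubic relationship
rng = np.random.RandomState(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X.ravel() ** 3 - X.ravel() + rng.normal(0, 0.5, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# expand features to degree 3, then run ordinary linear regression on top
poly_reg = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_reg.fit(X_train, y_train)

print(r2_score(y_test, poly_reg.predict(X_test)))
```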

Then with Multiple Linear Regression model

The R squared value is 0.9325

Finally, we apply Decision Tree Regression.

If we evaluate, we get an R squared value of 0.922.

So, now you decide: which model is best for this dataset? (Here, Random Forest gives the highest R squared value, 0.96.)
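The whole comparison above can be sketched in one loop. This uses synthetic data, so the scores won't match the course's numbers, but the pattern of fitting each model and comparing r2_score is the same:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# synthetic stand-in dataset: two features, mostly linear with a nonlinear term
rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(300, 2))
y = 3 * X[:, 0] + 2 * np.sin(X[:, 1]) + rng.normal(0, 0.5, 300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Multiple Linear": LinearRegression(),
    "Polynomial (deg 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.4f}")
```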

Pros and Cons of Regression model

Bye!!!