Machine Learning - Evaluating Regression models (Part 8)
Suppose we have applied a certain amount of nitrogen fertilizer to a field and harvested some tons of potatoes in return.
If we plot this data on a graph:
Here y_i is the actual value we observed, and ŷ_i (y-hat) is the value predicted by the linear regression model.
If instead of a regression model we simply draw the average line, we get this. Here y_avg is the average of all observed values, and y_i is again the actual data.
From these two pictures we get SS_res (the residual sum of squares, measured against the regression line) and SS_tot (the total sum of squares, measured against the average line). From these two quantities, we compute the R squared value.
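In symbols (using ŷ_i for the model's prediction and ȳ for the average of the observed values), these quantities are:

```latex
SS_{res} = \sum_i (y_i - \hat{y}_i)^2, \qquad
SS_{tot} = \sum_i (y_i - \bar{y})^2, \qquad
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
```

Intuitively, R² asks: how much smaller is the model's error than the error of just guessing the average? The closer SS_res is to zero, the closer R² gets to 1.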
If the value is 1.0, the model is a perfect fit; a value near 0.9 means the model is very good, and so on.
Adjusted R square
However, R squared has a flaw: if we add more predictors to the regression, SS_tot stays the same while SS_res can only stay equal or decrease, so R squared never goes down, even when the new predictors are useless. To solve this issue, we use Adjusted R squared, which penalizes the score for each extra predictor.
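A common definition is R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples and p the number of predictors. sklearn does not ship a ready-made function for this, so here is a minimal sketch (`adjusted_r2` is a hypothetical helper name, not a library function):

```python
# Adjusted R squared: penalizes R^2 for the number of predictors used.
# Formula: R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """r2: plain R squared, n: number of samples, p: number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same plain R^2 of 0.95, but using more predictors gives a lower
# adjusted score, so a bloated model no longer looks "free".
print(adjusted_r2(0.95, n=100, p=2))
print(adjusted_r2(0.95, n=100, p=20))
```

The second print is lower than the first: the penalty term (n − 1)/(n − p − 1) grows as p grows, which is exactly what discourages adding useless predictors.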
Now let's apply all of our regression algorithms to the same real-world dataset and compare their R squared (or Adjusted R squared) values to see which performs best.
From sklearn, we will mainly use the model-evaluation documentation page. This is what we will use for the R squared value:
Starting with SVR
Here is how we compute the R squared value:
from sklearn.metrics import r2_score

# y_test: the held-out test targets we created when splitting the data
# y_pred: what our model predicted for the test set
r2_score(y_test, y_pred)
The R squared value is 0.94.
Then with Random Forest Regression
The R squared value is 0.96.
Then with Polynomial Regression, the R squared value is 0.9458.
Then with Multiple Linear Regression, the R squared value is 0.9325.
Finally, with Decision Tree Regression, the R squared value is 0.922.
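The whole comparison above can be sketched in one loop. This is a minimal sketch on a synthetic dataset (not the dataset used in this post, so the scores will differ from the numbers above):

```python
# Compare several regressors by R^2 on a toy synthetic dataset.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    # SVR is sensitive to scaling, so we standardize its inputs;
    # with an unscaled target it may still score poorly on this toy data.
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Polynomial (deg 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "Multiple Linear": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {score:.4f}")
```

Because every model is scored on the same held-out test set with the same metric, the printed R² values are directly comparable, which is the whole point of this part.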
So, now you decide: which model is best for this dataset?
Pros and Cons of Regression Models
Bye!!!