Machine Learning - Regression Trees (Part 6)
Here, we can put our independent variables on the horizontal and vertical axes and keep the dependent variable on a third axis pointing towards us.
Then we can split the data into regions and use the average of the dependent variable in each region as its prediction.
How to do that?
Let's first split on the basis of 20
So, the decision tree would be
Then
The tree after this
Again based on 200,
the tree now
Now, based on 40
Now the tree
Now we can take the average of the dependent variable in each split region.
So the tree gets updated with these average values at its leaves.
Done!
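The idea above can be sketched in a few lines of plain Python. This is only a rough illustration, not the exact tree from the pictures: the variable names `x1` and `x2`, the choice of which variable each threshold (20, 200, 40) applies to, and all the data points are assumptions made up for the example.

```python
from statistics import mean

# Hypothetical toy data as (x1, x2, y). All values are invented for illustration.
points = [
    (10, 50, 300.0), (15, 250, 320.0),
    (25, 250, 60.0), (50, 300, 70.0),
    (30, 100, 1000.0), (35, 150, 1040.0),
    (45, 100, 0.0), (60, 50, 2.0),
]

def region(x1, x2):
    """Return the leaf a point falls into.

    Which variable each threshold applies to is an assumption.
    """
    if x1 < 20:
        return "x1 < 20"
    if x2 >= 200:
        return "x1 >= 20 and x2 >= 200"
    if x1 < 40:
        return "20 <= x1 < 40 and x2 < 200"
    return "x1 >= 40 and x2 < 200"

# Group the y values by leaf, then average each group.
groups = {}
for x1, x2, y in points:
    groups.setdefault(region(x1, x2), []).append(y)
leaf_average = {leaf: mean(ys) for leaf, ys in groups.items()}

def predict(x1, x2):
    """Predict y for a new point as the average of the leaf it lands in."""
    return leaf_average[region(x1, x2)]

print(predict(12, 80))
```

Every new point that lands in the same region gets the same predicted value, which is the average of the training points in that region.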
Let's code now
Problem statement: We want to appoint a regional manager, and in the interview the candidate said that he had been a regional manager for 2 years at XYZ company.
We have collected the salary data sheet and found that a regional manager (level 6) earns 150k a year. As the person has worked at that level for 2 years, his salary should fall somewhere between levels 6 and 7. Surely, it will be less than the 200k of a Partner (level 7).
So, we can treat him as level 6.5 to get an approximate idea and offer him a salary according to his experience.
Here is the dataset
Note: Decision tree algorithms are not a good fit for a dataset with only 1 independent variable, which is what we have here. They work better with several independent variables.
Let's code
Firstly, import the dataset
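A minimal sketch of this step, assuming the data sheet is a CSV with the columns `Position`, `Level` and `Salary` (the column names are assumptions). To keep the example self-contained, the CSV is read from an in-memory string. Only the level 6 and level 7 rows come from the problem statement; the role names and salaries in the other rows are placeholders.

```python
import io

import pandas as pd

# Placeholder data sheet; only levels 6 and 7 are taken from the problem statement.
csv_text = """Position,Level,Salary
Role A,1,45000
Role B,2,50000
Role C,3,60000
Role D,4,80000
Role E,5,110000
Regional Manager,6,150000
Partner,7,200000
Role H,8,300000
Role I,9,500000
Role J,10,1000000
"""

# With a real file this would be pd.read_csv("<path to your csv>").
dataset = pd.read_csv(io.StringIO(csv_text))

# X must be 2D (rows x features), so slice the Level column as 1:-1.
X = dataset.iloc[:, 1:-1].values
# y is the last column (Salary) as a 1D array.
y = dataset.iloc[:, -1].values

print(X.shape, y.shape)
```

The `Position` column is left out of `X` because `Level` already encodes the same information as a number.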
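Note: While encoding: if the order of the categories matters, we use an ordinal encoding such as LabelEncoder, and if not, we use one-hot encoding through a ColumnTransformer. Also, we need no feature scaling for the decision tree algorithm, because the splits only compare values against thresholds.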
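For reference, here is a small sketch of the two encoding styles. This dataset does not need either of them, and the columns `sizes` and `cities` are invented for the example. It uses `OrdinalEncoder` for the ordered feature, since scikit-learn documents `LabelEncoder` as intended for the target `y` rather than for feature columns.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical feature columns, invented for illustration.
sizes = np.array([["small"], ["large"], ["medium"]])   # order matters
cities = np.array([["Paris"], ["Tokyo"], ["Lima"]])    # order does not matter

# Ordered categories: map each category to its rank in the given order.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_encoded = ordinal.fit_transform(sizes)

# Unordered categories: one-hot encode column 0 through a ColumnTransformer.
# sparse_threshold=0 asks for a dense array as output.
ct = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
cities_encoded = ct.fit_transform(cities)

print(sizes_encoded.ravel())
print(cities_encoded)
```

The one-hot columns follow the alphabetical order of the categories (Lima, Paris, Tokyo).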
Training the Decision Tree Regression model on the whole dataset
from sklearn.tree import DecisionTreeRegressor
# import the class
regressor = DecisionTreeRegressor(random_state = 0)
# create the object; random_state fixes the seed so that results are reproducible
regressor.fit(X, y)
# fit the model on the whole dataset
Predicting a new result
regressor.predict([[6.5]])
# predict expects a 2D array, hence the double brackets
We use 6.5 because the candidate has 2 years of experience at level 6, so we expect a salary somewhere between levels 6 and 7.
But the output shows 150k, which is exactly the level 6 salary and not a value between levels 6 and 7, so it is a bad prediction.
We mentioned earlier why a decision tree is not the best model for this case: we have just 1 independent variable (a single column in X). With more independent variables, we could get a better result.
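To see why the prediction lands exactly on a training salary, here is a small self-contained sketch. A regression tree predicts one constant value per leaf, so with a single feature every query between two levels is mapped to one of the neighbouring salaries. Only the level 6 and level 7 salaries come from this post; the other salaries are placeholders and may differ from the real data sheet.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Levels 1..10 as a 2D feature matrix.
X = np.arange(1, 11).reshape(-1, 1)
# Placeholder salaries, except level 6 (150k) and level 7 (200k).
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# Query a few levels around the split between 6 and 7.
queries = np.array([[6.0], [6.5], [6.6], [7.0]])
predictions = regressor.predict(queries)

for level, salary in zip(queries.ravel(), predictions):
    print(level, salary)
```

Levels 6.0 and 6.5 both come back as 150000, while 6.6 and 7.0 come back as 200000: the model never returns anything in between.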
Visually checking the graph
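A possible way to draw this graph is sketched below, assuming matplotlib is available. The salaries other than levels 6 and 7 are placeholders, and the output file name is just an example. Predicting on a fine grid of levels (step 0.01) makes the staircase shape of the tree visible.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

# Placeholder data, except level 6 (150k) and level 7 (200k).
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# A dense grid of levels shows the flat steps between the training points.
X_grid = np.arange(X.min(), X.max() + 0.01, 0.01).reshape(-1, 1)
y_grid = regressor.predict(X_grid)

plt.scatter(X.ravel(), y, color="red", label="data")
plt.plot(X_grid.ravel(), y_grid, color="blue", label="tree prediction")
plt.title("Decision Tree Regression")
plt.xlabel("Position level")
plt.ylabel("Salary")
plt.legend()
# Save to a file (example name); use plt.show() in an interactive session.
plt.savefig("decision_tree_regression.png")
```

The blue line is a staircase: every predicted value on the grid is one of the training salaries.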
Check out the code
Previous blogs on this series
Machine Learning - Support Vector Regression (SVR) - Part 5
Machine Learning - Polynomial Regression (Part 4)
Machine learning - Multiple Linear Regression Model (Part 3)