Machine Learning - Regression Trees (Part 6)

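Here, we place the independent variables on the two axes of the plane and keep the dependent variable on the third axis, pointing towards us.

Then we split the data into regions at chosen threshold values, and each region is later given the average of the dependent variable for the points inside it.

How do we do that?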

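Let's first split the data at 20.

So, the decision tree would be:

Then we make the next split.

The tree after this split:

Next, we split again at 200.

The tree now:

Now, we split at 40.

Now the tree looks like this:

Next, we compute the average of the dependent variable for the data points in each region created by the splits.

So, the tree gets updated with these average values. Each leaf now holds the average of its region, and that average is the prediction for any new point that falls into that region.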

Done!
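The split-then-average idea can be made concrete with a small from-scratch sketch in plain Python. This is not scikit-learn's implementation. It assumes each split is chosen by minimising the total sum of squared errors (SSE) of the two sides, which is similar in spirit to scikit-learn's default squared-error criterion. It also assumes candidate thresholds are midpoints between neighbouring values. The toy points and the helper names (sse, best_split, build, predict) are invented for illustration.

```python
# Minimal regression tree sketch (assumption: SSE split criterion, midpoint
# thresholds, leaves store the average target of the points that reach them).

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (feature, threshold) with the lowest total SSE, or None."""
    best, best_cost = None, sse(y)  # a split must beat leaving the node unsplit
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (f, t), cost
    return best


def build(X, y, depth=0, max_depth=2):
    """Grow the tree recursively; every leaf holds the average of its y values."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return {"leaf": sum(y) / len(y)}
    f, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[f] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([r for r, _ in left], [v for _, v in left], depth + 1, max_depth),
        "right": build([r for r, _ in right], [v for _, v in right], depth + 1, max_depth),
    }


def predict(node, x):
    """Follow the splits down to a leaf and return that leaf's average."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


# Invented toy data: two independent variables per point, one dependent variable.
X = [[5, 100], [10, 300], [15, 120], [40, 310], [45, 90], [50, 280]]
y = [10, 30, 12, 70, 50, 72]
tree = build(X, y, max_depth=2)
print(tree["feature"], tree["threshold"])  # 0 27.5
print(predict(tree, [12, 110]))            # 11.0 (average of 10 and 12)
```

With this toy data, the first split lands on the first variable at 27.5, and the second level splits on the other variable. A new point simply inherits the average of the leaf it lands in.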

Let's code now

Problem statement: We want to hire a regional manager. In the interview, the candidate said he had been a regional manager at XYZ company for 2 years.

From the salary data we collected, a regional manager (level 6) earns 150k per year. Since the candidate has 2 years of experience in that role, his salary should fall between level 6 and level 7. Surely, it will be less than the 200k a Partner (level 7) earns.

So, we can treat him as level 6.5 to get an approximate figure and offer him a salary that matches his experience.
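The 6.5 guess can be turned into a rough number with simple arithmetic. For illustration only, assume that salary rises linearly between level 6 and level 7; the data does not guarantee this. Under that assumption, halfway between 150k and 200k is 175k. The function interpolate_salary is a hypothetical helper.

```python
# Hypothetical back-of-the-envelope estimate: ASSUMES salary grows linearly
# between two adjacent levels, which the dataset itself does not promise.
def interpolate_salary(level, level_lo, salary_lo, level_hi, salary_hi):
    """Linearly interpolate a salary between two known (level, salary) points."""
    fraction = (level - level_lo) / (level_hi - level_lo)
    return salary_lo + fraction * (salary_hi - salary_lo)


# Level 6 (regional manager) = 150k and level 7 (Partner) = 200k, from the text.
estimate = interpolate_salary(6.5, 6, 150_000, 7, 200_000)
print(estimate)  # 175000.0
```

This gives a reference point for judging whatever the regression model predicts for level 6.5.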

Here is the dataset

Note: Decision tree algorithms are not a good fit for a dataset with only 1 independent variable, like this one. They work better with more independent variables.

Let's code

First, import the dataset.
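Below is a hedged, dependency-free sketch of this step. It parses a CSV with Position, Level and Salary columns using Python's csv module. The rows are assumed to match the Position_Salaries table commonly used for this exercise; only the level 6 (150k) and level 7 (200k) rows are confirmed by the text above. The file name and the helper load_dataset are assumptions. With pandas, the usual equivalent is pd.read_csv(...), followed by dataset.iloc[:, 1:-1].values for X and dataset.iloc[:, -1].values for y.

```python
import csv
import io

# ASSUMED table contents (the Position_Salaries data commonly used for this
# exercise). Only the level 6 and level 7 rows are confirmed by the text.
CSV_TEXT = """Position,Level,Salary
Business Analyst,1,45000
Junior Consultant,2,50000
Senior Consultant,3,60000
Manager,4,80000
Country Manager,5,110000
Region Manager,6,150000
Partner,7,200000
Senior Partner,8,300000
C-level,9,500000
CEO,10,1000000
"""


def load_dataset(source):
    """Return X (one row per employee, one column: Level) and y (Salary)."""
    X, y = [], []
    for row in csv.DictReader(source):
        # Position is skipped here: Level already encodes the job title as a number.
        X.append([float(row["Level"])])
        y.append(float(row["Salary"]))
    return X, y


# For a real file (name assumed):
# with open("Position_Salaries.csv", newline="") as f: X, y = load_dataset(f)
X, y = load_dataset(io.StringIO(CSV_TEXT))
print(len(X), X[5], y[5])  # 10 [6.0] 150000.0
```

X is kept two-dimensional (a list of one-element rows) because scikit-learn estimators expect a feature matrix, even when there is only one feature.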

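Note: When encoding categorical data, use an ordinal (label) encoding, such as LabelEncoder (or OrdinalEncoder for feature columns), if the order of the categories matters. If the order does not matter, use one-hot encoding, for example OneHotEncoder applied to the chosen columns through a ColumnTransformer. Also, decision tree algorithms do not need feature scaling.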
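Feature scaling is unnecessary because a tree split only compares a feature against a threshold. What matters is the order of the values, not their scale. The plain-Python sketch below uses a hypothetical helper best_partition and invented toy numbers. It checks that a standardisation-style rescaling leaves the best single split unchanged.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_partition(xs, ys):
    """Indices sent to the left side by the best single SSE split on xs (distinct xs assumed)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_left, best_cost = None, float("inf")
    # Each gap between two neighbouring sorted values is one candidate split.
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
        if cost < best_cost:
            best_left, best_cost = set(left), cost
    return best_left


xs = [1, 2, 3, 4, 5, 6]                 # toy feature values (invented)
ys = [45, 50, 60, 80, 110, 150]         # toy targets (invented)
scaled = [(x - 3.5) / 1.7 for x in xs]  # standardisation-style rescaling

print(best_partition(xs, ys) == best_partition(scaled, ys))  # True
```

Any rescaling that preserves order gives the same sorted order, so the same points end up on each side of the split. Only the numeric threshold changes.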

Training the Decision Tree Regression model on the whole dataset

from sklearn.tree import DecisionTreeRegressor #importing the class

regressor = DecisionTreeRegressor(random_state = 0) #create the model; random_state = 0 fixes the random seed (features are shuffled at each split), so results are reproducible

regressor.fit(X, y) #fit the model

Predicting a new result

regressor.predict([[6.5]]) #predict expects a 2D array: one row per sample, one column per feature

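We use 6.5 because we want the salary of a level 6 employee with 2 years of experience. We expect the prediction to fall somewhere between the level 6 and level 7 salaries.

But the output is 150k, which is exactly the level 6 salary rather than a value between levels 6 and 7, so this is a poor prediction.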
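Here is a plausible explanation for the exact 150k. With one sample per level and default settings, the tree keeps splitting until every leaf holds a single level. scikit-learn places each threshold halfway between neighbouring values and sends x <= threshold to the left. So 6.5 is itself a split point, and it falls into level 6's leaf. The plain-Python sketch below emulates those boundaries rather than calling scikit-learn. The salaries other than levels 6 and 7 are assumed from the commonly used dataset.

```python
levels = [float(level) for level in range(1, 11)]
salaries = [45_000, 50_000, 60_000, 80_000, 110_000,
            150_000, 200_000, 300_000, 500_000, 1_000_000]  # assumed; only levels 6 and 7 are from the text

# Fully grown tree with one sample per leaf: one threshold between each pair of
# neighbouring levels, placed at their midpoint.
thresholds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]


def leaf_prediction(x):
    """Salary stored in the leaf that x falls into (x <= threshold goes left)."""
    for i, t in enumerate(thresholds):
        if x <= t:
            return salaries[i]
    return salaries[-1]


print(6.5 in thresholds)      # True: 6.5 is itself a split point
print(leaf_prediction(6.5))   # 150000: 6.5 <= 6.5, so it lands in level 6's leaf
print(leaf_prediction(6.51))  # 200000: just above the boundary it lands in level 7's leaf
```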

We mentioned earlier why a decision tree model is not the best fit here: our feature matrix X has only one independent variable (the level). With more independent variables, we could get a better result.

Visually checking the graph
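If predictions are plotted only at the 10 training levels, the plot just connects the data points. The tree's step shape appears only when predictions are evaluated on a fine grid of levels. The plain-Python sketch below builds that grid (step 0.01) and the matching predictions. It reuses the midpoint-threshold emulation of the fully grown tree, with assumed salary values. The actual drawing would be done with a plotting library such as matplotlib.

```python
levels = list(range(1, 11))
salaries = [45_000, 50_000, 60_000, 80_000, 110_000,
            150_000, 200_000, 300_000, 500_000, 1_000_000]  # assumed dataset values
thresholds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]


def tree_predict(x):
    """Emulated fully grown tree: constant salary between neighbouring midpoints."""
    for i, t in enumerate(thresholds):
        if x <= t:
            return salaries[i]
    return salaries[-1]


# Grid from level 1 to level 10 in steps of 0.01 (integer steps avoid float drift).
grid = [1 + k / 100 for k in range(901)]
curve = [(x, tree_predict(x)) for x in grid]

# A plotting library would draw `curve` as a line and the training points as dots,
# e.g. matplotlib's plt.plot(...) and plt.scatter(...).
distinct = sorted(set(p for _, p in curve))
print(len(curve), len(distinct))  # 901 10
```

The 901 grid points collapse onto only 10 distinct values, one flat step per leaf. That is why the fitted curve looks like a staircase rather than a smooth line.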

Check out the code

Previous blogs in this series

Machine Learning - Support Vector Regression (SVR) - Part 5

Machine Learning - Polynomial Regression (Part 4)

Machine learning - Multiple Linear Regression Model (Part 3)

Machine learning - Linear Regression (Part 2)

Machine Learning : Data Pre Processing Part 1