## ID3 Decision Tree Overview

Developed by Ross Quinlan, ID3 is a straightforward decision tree learning algorithm. Its main idea is to construct the decision tree through a top-down, greedy search over the given training set, testing an attribute at every decision node. To select the attribute that is most useful for classifying a given set of examples, the algorithm uses a metric called information gain [1]. To classify the learning set as efficiently as possible, we want to minimize the number of questions asked (i.e. to minimize the depth of the tree), so we need a function that can determine which question yields the most informative split. The information gain metric is one such function.

### Entropy

In order to define information gain precisely, we first need to discuss entropy. Assume, without loss of generality, that the resulting decision tree classifies instances into two categories, which we will call *positive* and *negative*.

Given a set S containing these positive and negative examples, the entropy of S relative to this Boolean classification is:

\[ Entropy(S) = -P \log_2 P - N \log_2 N \]

where:

- \( P \): proportion of positive examples in S
- \( N \): proportion of negative examples in S
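As a quick sanity check, the entropy above can be computed directly from the positive and negative counts. This is a minimal sketch (the function name and signature are our own, not from the original algorithm description):

```python
import math

def entropy(positive, negative):
    """Entropy of a set with the given counts of positive and negative examples."""
    total = positive + negative
    e = 0.0
    for count in (positive, negative):
        if count == 0:
            continue  # by convention, 0 * log2(0) is treated as 0
        p = count / total
        e -= p * math.log2(p)
    return e

print(entropy(5, 5))   # a perfectly mixed set has maximal entropy: 1.0
print(entropy(10, 0))  # a pure set has entropy 0.0
```

Note that entropy peaks at 1 bit when the classes are evenly mixed and falls to 0 when the set is pure, which is exactly why a split that lowers entropy is informative.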

### Information Gain

As already discussed, to cut down the depth of the decision tree we must select the best possible attribute at each split. The best pick is the attribute whose split produces the greatest drop in entropy.

Here, the information gain of an attribute is defined as the expected reduction in entropy achieved by splitting the examples on that attribute.

The information gain, Gain(E, A), of an attribute A over a set of examples E is:

\[ Gain(E, A) = Entropy(E) - \sum_{v \in Values(A)} \frac{|E_v|}{|E|} \, Entropy(E_v) \]

where \( E_v \) is the subset of E for which attribute A has value v.

This notion of gain can be used both to rank attributes and to construct the decision tree: each node is assigned the attribute with the maximum gain among those attributes not yet used on the path from the root.

The intention of this ordering is:

- To generate small decision trees, so that records can be classified after only a few splitting steps.
- To keep the resulting decision procedure as simple as possible.

The main steps of the ID3 algorithm are as follows:

- Take all unused attributes and compute their entropy with respect to the test samples.
- Select the attribute with the lowest entropy (or, equivalently, the highest information gain).
- Create a node for that attribute.

#### The algorithm is as follows:

Input: Examples, Target_Attribute, Attributes

Output: Decision Tree

Process:

- Create a root node for the tree.
- If all the examples are positive, return the single-node tree Root with label = +.
- If all the examples are negative, return the single-node tree Root with label = -.
- If there are no attributes left to test, return the single-node tree Root labelled with the most common value of Target_Attribute in Examples.
- Otherwise:
  - Let M be the attribute that best classifies Examples.
  - Set M as the decision attribute for Root.
  - For each possible value *v* of M:
    - Add a new branch below Root, corresponding to the test M = *v*.
    - Let Examples(*v*) be the subset of Examples that have value *v* for M.
    - If Examples(*v*) is empty:
      - Add a leaf node under this branch, labelled with the most common value of Target_Attribute in Examples.
    - Otherwise, add the subtree ID3(Examples(*v*), Target_Attribute, Attributes – {M}) under this branch.
- Return Root.
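The recursion above can be sketched in a few lines of Python. This is an illustrative implementation under our own assumptions (trees are nested dicts mapping an attribute to its value branches, leaves are class labels); unlike the pseudocode, it iterates only over attribute values actually present in Examples, so the empty-branch case never arises:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def id3(examples, labels, attributes):
    """Recursive ID3 following the steps above.

    Returns a class label (leaf) or a dict {attribute: {value: subtree}}.
    """
    # All examples share one label: single-node tree with that label.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to test: most common label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Pick the attribute M whose split leaves the least weighted entropy
    # (equivalently, the one with the highest information gain).
    def remainder(attr):
        parts = {}
        for ex, lab in zip(examples, labels):
            parts.setdefault(ex[attr], []).append(lab)
        return sum(len(p) / len(labels) * entropy(p) for p in parts.values())

    best = min(attributes, key=remainder)
    tree = {best: {}}
    rest = [a for a in attributes if a != best]  # Attributes - {M}
    for value in {ex[best] for ex in examples}:
        sub = [(ex, lab) for ex, lab in zip(examples, labels) if ex[best] == value]
        tree[best][value] = id3([ex for ex, _ in sub], [lab for _, lab in sub], rest)
    return tree

# Reusing the toy data from earlier: the tree splits once on 'windy' and is pure.
examples = [{"windy": "yes"}, {"windy": "yes"}, {"windy": "no"}, {"windy": "no"}]
labels = ["-", "-", "+", "+"]
print(id3(examples, labels, ["windy"]))
```

Each recursive call removes the chosen attribute from the candidate set, which is exactly the "not considered in the path from the root yet" condition discussed earlier.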

#### References

[1] Rahul A. Patil, Prashant G. Ahire, Pramod D. Patil, Avinash L. Golande, "A Modified Approach to Construct Decision Tree in Data Mining Classification", International Journal of Engineering and Innovative Technology (IJEIT), Vol. 2, Issue 1, July 2012.
