# ID3 Decision Tree in Data Mining

August 11, 2017

## ID3 Decision Tree Overview

Engineered by Ross Quinlan the ID3 is a straightforward decision tree learning algorithm. The main concept of this algorithm is construction of the decision tree through implementing a top-down, greedy search by the provided sets for testing every attribute at each node of decision. With the aim of selecting the attribute which is most useful to classify a provided set of data, a metric is introduced named as Information Gain [1]. To acquire the finest way for classification of learning set, one requires to act for minimizing the fired question (i.e. to minimize depth of the tree). Hence, some functions are needed that is capable of determine which questions will offer the generally unbiased splitting. One such function is information gain metric.

### Entropy

In order to define information gain exactly, we require discussing entropy first. Let’s assume, without loss of simplification, that the resultant decision tree classifies instances into two categories, we’ll call them ​$$[ P_{positive} ] and [ N_{negative} ]$$

Given a set S, containing these positive and negative targets, the entropy of S related to this Boolean classification is:

$$[ P_{positive} ]$$​: proportion of positive examples in S

$$[ N_{negative} ]$$​ : proportion of negative examples in S

### Information Gain

As already discussed, for cutting down the depth of a decision tree, while traversing the same, selection of the best possible characteristic is mandatory in order to split the tree, this clearly shows that attribute with minimum drop of entropy will be the superlative pick.

Here, the information gain can be termed as required drop in entropy in relationwith individual attribute during the decision tree splitting.

The information gain, Gain(E, A) of an attribute A,

$Gain(E,A)=Entropy(s)-∑^v_{n=1} (E_v/E)* {Entropy(E_v )}$

This concept of gain can be utilized to decide positions of attributes as well as to construct decision trees in which every node is positioned the attribute with maximum gain among those attributes that are not considered in the path from the root yet.

The intention of this ordering is:

1. To generate small sized decision trees in order to identify records after only a handful decision tree splitting steps.
2. To attain the desired level of un-fussiness of the decisional approaches.

The main points of ID3 algorithm are as follows:

1. Obtain all idle attributes and calculate their entropy relating to test samples
2. Prefer that attribute Which has least entropy (or, consistently, the highest Information gain)
3. Create a node having such attributes.

#### The algorithm is as follows:

Input: Examples, Target_Attribute, Attributes

Output: Decision Tree

Process:

• Produce a node being root node of the tree
• Check if all the examples are positive, If yes then generate a single node tree, ROOT, having label = +.
• In case all the examples are found negative, then make a single node tree, ROOT, with label = -.
• If there are no attributes for prediction, then create a single node tree ROOT labelled as the most ordinary used value for that attribute.
• Else Start following procedure
• M = an attribute that is classifying the Examples in Best way.
• Make M the decision tree attribute
• Repeat for every probable value, vi, of M,
• Expand the Root with one branch, equivalent to the test M = vi.
• Let Examples(vi) be a subset of examples that have the value vi for M
• If Examples(vi) is empty
• Add a leaf node under the new branch, labelled with most ordinary value of the attribute in the examples
• Otherwise add the sub-tree ID3 (Examples(vi), Target_Attribute, Attributes – {A}), under this new branch
• End
• Return Root

#### References

[1] Rahul A. Patil, Prashant G. Ahire, Pramod. D. Patil, Avinash L. Golande, “A Modified Approach to Construct Decision Tree in Data Mining Classification”,International Journal of Engineering and Innovative Technology (IJEIT), Volume 2, Issue 1, July 2012

$${}$$