**Introduction**

ID3 (Iterative Dichotomiser 3) is the first in a series of decision-tree algorithms created by Ross Quinlan. Decision trees are among the most popular and powerful learning methods. Decision-tree algorithms fall under the category of supervised learning, and they work for both continuous and categorical output variables.

ID3 is a classification algorithm: given a set of attributes and class labels, it generates a model (a decision tree) that assigns a given input to one of the class labels \( C_k \in \{C_1, C_2, C_3, \ldots, C_k\} \). The algorithm follows a greedy approach, at each step selecting the attribute that yields the maximum information gain \( (IG) \), equivalently the minimum entropy \( (H) \). It then splits the data-set \( (S) \) recursively on the remaining unused attributes until it reaches the stopping criterion (no further attributes to split on). The non-terminal nodes of the decision tree represent the attributes on which splits occur, and the terminal nodes represent the class labels.

### ID3 Characteristics

- ID3 does not guarantee an optimal solution; it can get stuck in local optima.
- It uses a greedy approach, selecting the best attribute to split the data-set on at each iteration (one possible improvement is to use backtracking during the search for the optimal decision tree).
- ID3 can overfit the training data (to reduce overfitting, smaller decision trees should be preferred over larger ones).
- The algorithm usually produces small trees, but it does not always produce the smallest possible tree.
- ID3 is harder to use on continuous data (if the values of an attribute are continuous, there are many more places to split the data on that attribute, and searching for the best split value can be time-consuming).

### Implementation

ID3 is implemented here in Python, and each function of the implementation is discussed step by step.

#### Technology used

- Python 2.7
- Spyder IDE

The major steps involved in the implementation are:

- Phase 1: Entropy Calculation
- Phase 2: Attribute Selection
- Phase 3: Split the data-set
- Phase 4: Build Decision Tree

### Phase 1: Entropy Calculation

The method takes the data-set as an argument and calculates its entropy.

\( H(S) = \sum_{x \in X} -p(x) \log_2 p(x) \)

Where,

\( H \) is the entropy,

\( S \) is the data-set,

\( X \) is the set of class labels, and

\( p(x) \) is the ratio of the number of elements in class \( x \) to the number of elements in the entire data-set \( S \).

Information gain is the difference in entropy before and after the data-set is split on an attribute.

\( IG(A, S) = H(S) - \sum_{t \in T} p(t) H(t) \)

Where,

\( H(S) \) is the entropy of the data-set before the split,

\( T \) is the set of subsets created by splitting \( S \) on attribute \( A \),

\( p(t) \) is the ratio of the number of elements in subset \( t \) to the number of elements in the entire data-set \( S \), and

\( H(t) \) is the entropy of subset \( t \).
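As a quick worked example (the figures below use the classic play-tennis data-set often used to illustrate ID3), a data-set of 14 records with 9 positive and 5 negative examples has entropy

\( H(S) = -\frac{9}{14} \log_2 \frac{9}{14} - \frac{5}{14} \log_2 \frac{5}{14} \approx 0.940 \)

An informative attribute produces subsets whose weighted entropy falls below 0.940, and that drop is exactly its information gain \( IG(A, S) \).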

```python
# Entropy calculation method
from math import log

def calc_entropy(data):
    # Calculate the length of the data-set
    entries = len(data)
    labels = {}
    # Read the class labels from the data-set into the dict object "labels"
    for rec in data:
        label = rec[-1]
        if label not in labels:
            labels[label] = 0
        labels[label] += 1
    # entropy variable is initialized to zero
    entropy = 0.0
    # For every class label (x) calculate the probability p(x)
    for key in labels:
        prob = float(labels[key]) / entries
        # Entropy formula calculation
        entropy -= prob * log(prob, 2)
    # Return the entropy of the data-set
    return entropy
```
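As a quick sanity check of the method above (the `calc_entropy` definition is repeated, slightly condensed, so this snippet runs standalone): a data-set split evenly between two classes has entropy exactly 1, and a pure data-set has entropy 0.

```python
from math import log

# calc_entropy repeated from above so this snippet runs standalone
def calc_entropy(data):
    entries = len(data)
    labels = {}
    for rec in data:
        labels[rec[-1]] = labels.get(rec[-1], 0) + 1
    entropy = 0.0
    for key in labels:
        prob = float(labels[key]) / entries
        entropy -= prob * log(prob, 2)
    return entropy

# an evenly split data-set: two 'yes' records, two 'no' records
data = [[1, 'yes'], [0, 'yes'], [1, 'no'], [0, 'no']]
print(calc_entropy(data))                      # -> 1.0

# a pure data-set has zero entropy
print(calc_entropy([[1, 'yes'], [0, 'yes']]))  # -> 0.0
```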

### Phase 2: Attribute Selection

We have to determine the attribute on which the data-set *(S)* is to be split. The attribute-selection method returns the attribute with the maximum information gain \( IG(A, S) \), which is used in building the decision tree.

```python
# Function to determine the best attribute for the split criterion
def attribute_selection(data):
    # get the number of features available in the given data-set
    features = len(data[0]) - 1
    # function call to calculate the base entropy (entropy of the entire data-set)
    baseEntropy = calc_entropy(data)
    # initialize the info-gain variable to zero
    max_InfoGain = 0.0
    bestAttr = -1
    # iterate through the features identified
    for i in range(features):
        # store the values of the feature in a variable
        AttrList = [rec[i] for rec in data]
        # get the unique values from the feature values
        uniqueVals = set(AttrList)
        # initialize the attribute entropy to zero
        attrEntropy = 0.0
        # iterate through the list of unique values and perform the split
        for value in uniqueVals:
            # function call to split the data-set
            newData = dataset_split(data, i, value)
            # probability calculation
            prob = len(newData) / float(len(data))
            # weighted entropy of the subset for this attribute value
            newEntropy = prob * calc_entropy(newData)
            attrEntropy += newEntropy
        # calculation of Information Gain
        infoGain = baseEntropy - attrEntropy
        # identify the attribute with max info-gain
        if infoGain > max_InfoGain:
            max_InfoGain = infoGain
            bestAttr = i
    # return the attribute identified
    return bestAttr
```
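To see the selection in action, here is a small standalone check (the helper functions are repeated in condensed form, and the tiny data-set is made up for illustration): the first column perfectly predicts the class while the second column is uninformative, so index 0 should be selected.

```python
from math import log

# helpers condensed from the article so this snippet runs standalone
def calc_entropy(data):
    entries = len(data)
    counts = {}
    for rec in data:
        counts[rec[-1]] = counts.get(rec[-1], 0) + 1
    return -sum(float(c) / entries * log(float(c) / entries, 2)
                for c in counts.values())

def dataset_split(data, arc, val):
    return [list(rec[:arc]) + list(rec[arc + 1:])
            for rec in data if rec[arc] == val]

def attribute_selection(data):
    features = len(data[0]) - 1
    baseEntropy = calc_entropy(data)
    max_InfoGain, bestAttr = 0.0, -1
    for i in range(features):
        attrEntropy = 0.0
        for value in set(rec[i] for rec in data):
            newData = dataset_split(data, i, value)
            attrEntropy += len(newData) / float(len(data)) * calc_entropy(newData)
        infoGain = baseEntropy - attrEntropy
        if infoGain > max_InfoGain:
            max_InfoGain, bestAttr = infoGain, i
    return bestAttr

# column 0 perfectly predicts the class; column 1 is pure noise
data = [['a', 'x', 'yes'], ['a', 'y', 'yes'], ['b', 'x', 'no'], ['b', 'y', 'no']]
print(attribute_selection(data))  # -> 0
```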

### Phase 3: Split Data-Set

In this step, the data-set is split using the attribute selected in the previous step: the method takes the data-set, the axis or arc (attribute index), and the attribute value as input. It is called recursively from the decision-tree building step (Phase 4) for every attribute present in the given data-set, in order of decreasing information gain, until the algorithm reaches the stopping criterion.

```python
# Function to split the data-set on the attribute that has maximum information gain
# input values: data-set, attribute index (arc) and attribute value
def dataset_split(data, arc, val):
    # declare a list variable to store the newly split data-set
    newData = []
    # iterate through every record; keep those matching the attribute value,
    # dropping the attribute column that was split on
    for rec in data:
        if rec[arc] == val:
            reducedSet = list(rec[:arc])
            reducedSet.extend(rec[arc + 1:])
            newData.append(reducedSet)
    # return the new list containing the data-set split on the selected attribute
    return newData
```
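A short standalone check of the split (the `dataset_split` definition is repeated so the snippet runs on its own, and the data is made up): splitting on attribute index 0 with value `'a'` keeps only the matching records and removes that column.

```python
# dataset_split repeated from above so this snippet runs standalone
def dataset_split(data, arc, val):
    newData = []
    for rec in data:
        if rec[arc] == val:
            reducedSet = list(rec[:arc])
            reducedSet.extend(rec[arc + 1:])
            newData.append(reducedSet)
    return newData

data = [['a', 'x', 'yes'], ['a', 'y', 'yes'], ['b', 'x', 'no']]
print(dataset_split(data, 0, 'a'))  # -> [['x', 'yes'], ['y', 'yes']]
```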

### Phase 4: Build Decision Tree

In this step we will integrate the above process flow into a single function and generate the rules in the decision tree.

```python
# Function to build the decision tree
def decision_tree(data, labels):
    # list variable to store the class labels (terminal nodes of the decision tree)
    classList = [rec[-1] for rec in data]
    # stop if every record carries the same class label
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # stop if no attributes are left to split on; return the majority class
    if len(data[0]) == 1:
        return max(set(classList), key=classList.count)
    # function call to identify the attribute for the split
    maxGainNode = attribute_selection(data)
    # variable to store the selected attribute's label
    treeLabel = labels[maxGainNode]
    # dict object to represent the nodes in the decision tree
    theTree = {treeLabel: {}}
    del labels[maxGainNode]
    # get the unique values of the attribute identified
    nodeValues = [rec[maxGainNode] for rec in data]
    uniqueVals = set(nodeValues)
    for value in uniqueVals:
        subLabels = labels[:]
        # update the non-terminal node values of the decision tree
        theTree[treeLabel][value] = decision_tree(
            dataset_split(data, maxGainNode, value), subLabels)
    # return the decision tree (dict object)
    return theTree
```
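Putting the four phases together, here is an end-to-end run on a tiny made-up weather data-set (all definitions are repeated in condensed form so the snippet runs standalone; the attribute names `outlook` and `windy` are illustrative): every `sunny` day is a `yes`, while `rainy` days depend on `windy`, so the tree should split on `outlook` first and on `windy` below it.

```python
from math import log

# all four phases condensed from the article so this snippet runs standalone
def calc_entropy(data):
    entries = len(data)
    counts = {}
    for rec in data:
        counts[rec[-1]] = counts.get(rec[-1], 0) + 1
    return -sum(float(c) / entries * log(float(c) / entries, 2)
                for c in counts.values())

def dataset_split(data, arc, val):
    return [list(rec[:arc]) + list(rec[arc + 1:])
            for rec in data if rec[arc] == val]

def attribute_selection(data):
    baseEntropy = calc_entropy(data)
    max_InfoGain, bestAttr = 0.0, -1
    for i in range(len(data[0]) - 1):
        attrEntropy = 0.0
        for value in set(rec[i] for rec in data):
            sub = dataset_split(data, i, value)
            attrEntropy += len(sub) / float(len(data)) * calc_entropy(sub)
        if baseEntropy - attrEntropy > max_InfoGain:
            max_InfoGain, bestAttr = baseEntropy - attrEntropy, i
    return bestAttr

def decision_tree(data, labels):
    classList = [rec[-1] for rec in data]
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    if len(data[0]) == 1:
        return max(set(classList), key=classList.count)
    maxGainNode = attribute_selection(data)
    treeLabel = labels[maxGainNode]
    theTree = {treeLabel: {}}
    del labels[maxGainNode]
    for value in set(rec[maxGainNode] for rec in data):
        theTree[treeLabel][value] = decision_tree(
            dataset_split(data, maxGainNode, value), labels[:])
    return theTree

# tiny illustrative data-set: attributes 'outlook' and 'windy', class label last
data = [['sunny', 'false', 'yes'],
        ['sunny', 'true',  'yes'],
        ['rainy', 'false', 'yes'],
        ['rainy', 'true',  'no']]
tree = decision_tree(data, ['outlook', 'windy'])
print(tree)  # nested dict encoding the rules (key order may vary)
```

The resulting dict is `{'outlook': {'sunny': 'yes', 'rainy': {'windy': {'false': 'yes', 'true': 'no'}}}}`: each outer key is a split attribute, each inner key an attribute value, and each string leaf a class label.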

