Decision Tree C 4.5(J48)

September 15, 2017 Author: virendra
Print Friendly, PDF & Email

Data mining is the useful tool to discovering the knowledge from large data. Classification methods aim to identify the classes that belong to objects from some descriptive traits. They find utility in a wide range of human activities and particularly in automated decision making. Decision trees are a very effective method of supervised learning. It aims is the partition of a dataset into groups as homogeneous as possible in terms of the variable to be predicted. It takes as input a set of classified data, and outputs a tree that resembles to an orientation diagram where each end node (leaf) is a decision (a class) and each non- final node (internal) represents a test. Each leaf represents the decision of belonging to a class of data verifying all tests path from the root to the leaf. The tree is simpler, and technically it seems easy to use. In fact, it is more interesting to get a tree that is adapted to the probabilities of variables to be tested. Mostly balanced tree will be a good result. If a sub-tree can only lead to a unique solution, then all sub-tree can be reduced to the simple conclusion, this simplifies the process and does not change the final result. Ross Quinlan worked on this kind of decision trees.


C 4.5 is a computer program for inducing classification rules in the form of decision trees from a set of given instances. C 4.5 is an algorithm developed by Ross Quinlan that generates Decision Trees (DT), which can be used for classification problems. It improves (extends) the ID3 algorithm by dealing with both continuous and discrete attributes, missing values and pruning trees after construction. Its commercial successor is C 5.0, a lot faster that C 4.5, more memory efficient and used for building smaller decision trees. Being a supervised learning algorithm, it requires a set of training examples and each example can be seen as a pair: input object and a desired output value (class). The algorithm analyzes the training set and builds a classifier that must be able to correctly classify both training and test examples. A test example is an input object and the algorithm must predict an output value (the example must be assigned to a class).

Algorithm Description

  • Select one attribute from a set of training instances
  • Select an initial subset of the training instances
  • Use the attribute and the subset of instances to build a decision tree
  • Use the rest of the training instances (those not in the subset used for construction) to test the accuracy of the constructed tree
  • If all instances are correctly classified – stop
  • If an instances is incorrectly classified, add it to the initial subset and construct a new tree
  • Iterate until
    • A tree is built that classifies all instance correctly
    • OR
    • A tree is built from the entire training set


Here is an example of a decision tree built using C 4.5:


Figure 1: Decision Tree of C 4.5

The input and output requirements and a pseudo-code for the C 4.5 algorithm are presented below:

The input for the algorithm consists of a set S of examples described by continuous or discrete attributes, each example belonging to one class.

The output is a decision tree or/and a set of rules that assigns a class to a new case (example)

The base cases are the following:

  • All the examples from the training set belong to the same class ( a tree leaf labeled with that class is returned).
  • The training set is empty (returns a tree leaf called failure).
  • The attribute list is empty (returns a leaf labeled with the most frequent class or the disjuction of all the classes).

The attribute with the highest informational gain is computed using the following formulas:

Entropy: ​\( E (S)= ∑_{(i=1)}^n- P_r (C_i )*log_2⁡ P_r (C_i) \)

Gain: ​\( G (S,A)=E(S)- ∑_{(i=1)}^m P_r (A_i )E(S_Ai) \)


\( E (S) \)​– information entropy of S

\( G (S,A) \)​ – Gain of S after a split on attribute A

\( n – nr \)​Of classes in S

\( P_r (C_i) \)​ – Frequency of class Ci in S

\( m – nr \)​ Of values of attribute A in S

\( P_r (A_i ) \)​ – Frequency of cases that have Ai value in S

\( E(S_{Ai}) \)​– Subset of S with items that have Ai value


[1] Gaurav L. Agrawal and Prof. Hitesh Gupta, “Optimization of C4.5 Decision Tree Algorithm for Data Mining Application”, International Journal of Emerging Technology and Advanced Engineering, Volume 3, Issue 3, March 2013)

[2] Badr Hssina and Abdelkarim Merbouha, “A comparative study of decision tree ID3 and C4.5”, (IJACSA) International Journal of Advanced Computer Science and Applications

[3] “Decision Trees – C4.5”, online available at:

[4] Susan Miertschin, “Decision Trees”, online available at:



Leave a Reply

Your email address will not be published. Required fields are marked *

Insert math as
Additional settings
Formula color
Text color
Type math using LaTeX
Nothing to preview