FP Growth(FP-tree) Algorithm with Example

November 1, 2017 Author: rajesh
Print Friendly, PDF & Email

FP-growth algorithm : Introduction

The FP-growth algorithm is currently one of the fastest approaches to frequent item set mining. The FP-Growth methods adopts a divide and conquer strategy as follows: compress the database representing frequent items into a frequent-pattern tree, but retain the itemset association information, and then divide such a compressed database into a set of condition databases, each associated with one frequent item, and mine each such database. First, a scan of database derives a list of frequent items in descending order. Then the FP – tree is constructed as follows. Create the root of the tree and scan the database second time. The items in each transaction are processed in the order of frequent items list and a branch is created for each transaction. When considering the branch to be added to a transaction, the count of each node along a common prefix is incremented by 1. After constructing the tree the mining proceeds as follows. Start from each frequent length-1 pattern, construct its conditional pattern base, then construct its conditional FP-tree and perform mining recursively on such a tree. The support of a candidate (conditional) itemset is counted traversing the tree. The sum of count values at least frequent item’s nodes gives the support value.




This approach is based on divide-and-conquer strategy. The first step is to compress the whole database into a frequent pattern tree that preserves the association information of itemsets. The next step is to divide this compressed database into a set of conditional databases, where each conditional database is associated with one frequent item and also these databases are mined separately. Because for each frequent item its associated data sets are needed to be examined only. This approach is beneficial as it reduces the size of the data sets to be searched.

Generating FP-Trees Pseudocode




The FP-Growth algorithmic program works as follow:

  1. Scan the transaction database once, as among the Apriori algorithmic program, to seek out all the frequent items and their Support.
  2. Sort the frequent items in descending order of their Support.
  3. Initially, begin making the FP-tree with a root “null”.
  4. Get the primary transaction from the transaction database. Takeaway all non-frequent items and list the remaining items in line with the order among the sorted frequent items.
  5. Use the transaction to construct the primary branch of the tree with each node corresponding to a frequent item and showing that item’s frequency that’s one for the primary transaction.
  6. Get the next transaction from the transaction database. Takeaway all non-frequent items and list the remaining items in line with the order among the sorted frequent items.
  7. Insert the transaction within the tree using any common prefix that may appear. Increase the item counts.
  8. Continue with Step 6 until all transactions among the database are processed.

How to create FP-Tree using Example




let there is a set of items such that:
L={A, B, C, D, E}
First, the algorithm creates the root node of the tree, with the tag “null.” Then it scans the database for the second time. Each item in a transaction is ordered by the sequence of L. Later it creates a branch for each transaction. For example, the first transaction ​\( “001:A,B,C,D,E” \)​ contains five items  ​\( {C,D,E,A,B} \)​according to the sequence of L, generating the first branch ​\( {(C:1),(D:1),(E:1),(A:1),(B:1)} \)​, for building FP-tree. The branch has five nodes. In it, C is the children link of root, D links to C, E links to D, A links to E, and B links to A. The second transaction ​\( “002:B,C,E” \)​ contains three items {C, E, B} according to the sequence of , generating a branch. In it, C links to the root, E links to C, and B links to E. This branch shares the prefix ⟨C⟩ with the existing path of transaction “001.” In this way, the algorithm makes the count of node C increase by 1 and creates two new nodes ​\( {(E:1),(B:1)} \)​ as a link ​\( (C:2) \)​ of. Generally, the algorithm considers increasing a branch for a transaction and when each node follows common prefix, its count increases by 1; algorithm creates node for the item following the prefix and linking.

 TID Items
001 A, B, C, D, E
002 B, C, E
003 C, E, D
004 A, C, D

Table 1: Transaction Database

Generating FP-Tree

Figure 1: Generating FP-Tree

For convenience of tree traversal, the algorithm creates an item header table. Each item through a node link points to itself in FP-tree. After scanning all transactions, we get the FP-tree displayed in Figure 1.

References

[1] Barış Yıldız, Belgin Ergenç. “Comparison Of Two Association Rule Mining Algorithms Without Candidate Generation”, In IASTED International Conference on Artifical Intelligence and Applications (AIA 2010), Austria, Feb 15-17, 2010

[2] J. Han, J. Pei, and Y. Yin (2000) “Mining frequent patterns without candidate generation”, In Proceeding 2000 ACM-SIGMOD International Conference Management of Data, pp. 1–12

[3] Zeng, Yi, et al. “Research of improved FP-Growth algorithm in association rules mining.” Scientific Programming 2015 (2015): 6.

[4] M.S. Mythili and A.R. Mohamed Shanavas, “Performance Evaluation of Apriori and FP-Growth Algorithms”, International Journal of Computer Applications (IJCA) Volume 79 – No10, October 2013

No Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

Insert math as
Block
Inline
Additional settings
Formula color
Text color
#333333
Type math using LaTeX
Preview
\({}\)
Nothing to preview
Insert