free web hit counter

FP Growth(FP-tree) Algorithm with Example

November 1, 2017
Print Friendly, PDF & Email

FP-growth algorithm : Introduction

The FP-growth algorithm is one of the fastest approaches to frequent itemset mining. The FP-Growth Algorithm adopts “Divide and Conquer” strategy to mine the patterns. The algorithm compresses the database, frequent items are transformed into a frequent-pattern tree. the algorithm preserves the association information available in Dataset. The database is compressed into a set of rules that represent entire databases. Each rule shows conditional decisions based on itemset.
First, The database is scanned to find a list of frequent items, in descending order. Then the FP-tree is constructed as.
Create the root node of the tree and scan the database, second time. The items in each transaction are evaluated, with respect to the frequent items list. Using, the mutual frequency of items, these calculated items help to create a branch node. When considering the branch to be added in FP-Tree, the count of each node with a common prefix is preferred.

.




the mining of frequent patterns and trees is also known as association rule mining. therefore how two items come together in a transaction, is the key concept of an association rule mining. this algorithm helps to measure the frequency of association of two items in a set of transactions.  the transactions T_i are a set of items I_j. such that  The first step is to compress the whole database into a frequent pattern tree that preserves the association information of itemsets. The next step is to divide this compressed database into a set of conditional databases, where each conditional database is associated with one frequent item and also these databases are mined separately. Because for each frequent item its associated data sets are needed to be examined only. This approach is beneficial as it reduces the size of the data sets to be searched.

Generating FP-Trees Pseudocode




The FP-Growth algorithmic program works as follow:

  1. Scan the transaction database once, as among the Apriori algorithmic program, to seek out all the frequent items and their Support.
  2. Sort the frequent items in descending order of their Support.
  3. Initially, begin making the FP-tree with a root “null”.
  4. Get the primary transaction from the transaction database. Takeaway all non-frequent items and list the remaining items in line with the order among the sorted frequent items.
  5. Use the transaction to construct the primary branch of the tree with each node corresponding to a frequent item and showing that item’s frequency that’s one for the primary transaction.
  6. Get the next transaction from the transaction database. Takeaway all non-frequent items and list the remaining items in line with the order among the sorted frequent items.
  7. Insert the transaction within the tree using any common prefix that may appear. Increase the item counts.
  8. Continue with Step 6 until all transactions among the database are processed.

How to create FP-Tree using Example




let there is a set of items such that:
L={A, B, C, D, E}
First, the algorithm creates the root node of the tree, with the tag “null.” Then it scans the database for the second time. Each item in a transaction is ordered by the sequence of L. Later it creates a branch for each transaction. For example, the first transaction ​\( “001:A,B,C,D,E” \)​ contains five items  ​\( {C,D,E,A,B} \)​according to the sequence of L, generating the first branch ​\( {(C:1),(D:1),(E:1),(A:1),(B:1)} \)​, for building FP-tree. The branch has five nodes. In it, C is the children link of root, D links to C, E links to D, A links to E, and B links to A. The second transaction ​\( “002:B,C,E” \)​ contains three items {C, E, B} according to the sequence of , generating a branch. In it, C links to the root, E links to C, and B links to E. This branch shares the prefix ⟨C⟩ with the existing path of transaction “001.” In this way, the algorithm makes the count of node C increase by 1 and creates two new nodes ​\( {(E:1),(B:1)} \)​ as a link ​\( (C:2) \)​ of. Generally, the algorithm considers increasing a branch for a transaction and when each node follows common prefix, its count increases by 1; algorithm creates node for the item following the prefix and linking.

 TID Items
001 A, B, C, D, E
002 B, C, E
003 C, E, D
004 A, C, D

Table 1: Transaction Database

Generating FP-Tree

Figure 1: Generating FP-Tree

For convenience of tree traversal, the algorithm creates an item header table. Each item through a node link points to itself in FP-tree. After scanning all transactions, we get the FP-tree displayed in Figure 1.

References

[1] Barış Yıldız, Belgin Ergenç. “Comparison Of Two Association Rule Mining Algorithms Without Candidate Generation”, In IASTED International Conference on Artifical Intelligence and Applications (AIA 2010), Austria, Feb 15-17, 2010

[2] J. Han, J. Pei, and Y. Yin (2000) “Mining frequent patterns without candidate generation”, In Proceeding 2000 ACM-SIGMOD International Conference Management of Data, pp. 1–12

[3] Zeng, Yi, et al. “Research of improved FP-Growth algorithm in association rules mining.” Scientific Programming 2015 (2015): 6.

[4] M.S. Mythili and A.R. Mohamed Shanavas, “Performance Evaluation of Apriori and FP-Growth Algorithms”, International Journal of Computer Applications (IJCA) Volume 79 – No10, October 2013

One Comment

  • USA Proxy August 11, 2018 at 5:07 am

    very interesting information! .

Leave a Reply

Your email address will not be published. Required fields are marked *