FP Growth(FP-tree) Algorithm with Example

November 1, 2017

​​FP-growth algorithm : Introduction

The FP-growth algorithm is one of the fastest approaches to frequent itemset mining. The FP-Growth Algorithm adopts “Divide and Conquer” strategy to mine the patterns. The algorithm compresses the database, frequent items are transformed into a frequent-pattern tree. the algorithm preserves the association information available in Dataset. The database is compressed into a set of rules that represent entire databases. Each rule shows conditional decisions based on itemset.
First, The database is scanned to find a list of frequent items, in descending order. Then the FP-tree is constructed as.
Create the root node of the tree and scan the database, second time. The items in each transaction are evaluated, with respect to the frequent items list. Using, the mutual frequency of items, these calculated items help to create a branch node. When considering the branch to be added in FP-Tree, the count of each node with a common prefix is preferred.

.

the mining of frequent patterns and trees is also known as association rule mining. therefore how two items come together in a transaction, is the key concept of an association rule mining. this algorithm helps to measure the frequency of association of two items in a set of transactions.  the transactions $$T_i$$ are a set of items $$Item_i$$. such that

$$T_i=(item_1, item_2,…,Item_j)$$

Therefore, a transaction set is built with a number of items. Additionally, a dataset contains a group of transactions. To understand the transactions and itemset we can take an example: in a supermarket, customers buying products and at the billing counter pay bills. The items which are purchased by a customer is item set and the billing of items are the transaction.

Generating FP-Trees Pseudocode

The FP-Growth algorithmic program works as follow:

1. Scan the transactional database once, similar as Apriori algorithm, to seek out all the frequent items and Support count.
2. Sort the frequent items in descending order, according to Support count.
3. Initially, start making the FP-tree with a root node “null”.
4. Get the primary transaction from the transactional database. Take away all non-frequent items and list the remaining items. In order to sort frequent items.
5. Use the transaction to construct the primary branch of the tree with each node corresponding to a frequent item and showing that item’s frequency that’s one for the primary transaction.
6. Get the next transaction from the database. Take away all non-frequent items and list the remaining items in line with the order among the sorted frequent items.
7. Insert the transaction within the tree using any common prefix that may appear. Increase the item counts.
8. Continue with Step 6 until all transactions among the database are processed.

How to create FP-Tree using Example

let there is a set of items such that:
L={A, B, C, D, E}
First, the algorithm creates the root node of the tree, with the tag “null”. Then it scans the database for the second time. Each item in a transaction is ordered by the sequence of L. Later it creates a branch for each transaction. For example, the first transaction ​$$“001:A,B,C,D,E”$$​ contains five items  ​$${C,D,E,A,B}$$​according to the sequence of L, generating the first branch ​$${(C:1),(D:1),(E:1),(A:1),(B:1)}$$​, for building FP-tree. The branch has five nodes. In it, C is the children link of the root, D links to C, E links to D, A links to E, and B links to A.

The second transaction ​$$“002:B,C,E”$$​ contains three items {C, E, B} according to the sequence of, generating a branch. In it, C links to the root, E links to C, and B links to E. This branch shares the prefix ⟨C⟩ with the existing path of transaction “001.” In this way, the algorithm makes the count of node C increase by 1 and creates two new nodes ​$${(E:1),(B:1)}$$​ as a link ​$$(C:2)$$​ . Generally, the algorithm considers increasing a branch for a transaction and when each node follows common prefix, its count increases by 1; the algorithm creates a node for the item following the prefix and linking.

 TID Items 001 A, B, C, D, E 002 B, C, E 003 C, E, D 004 A, C, D

Table 1: Transaction Database

Figure 1: Generating FP-Tree

For the convenience of tree traversal, the algorithm creates an item header table. Each item through node-link points to itself in FP-tree. After scanning all transactions, we get the FP-tree displayed in Figure 1.

References

[1] Barış Yıldız, Belgin Ergenç. “Comparison Of Two Association Rule Mining Algorithms Without Candidate Generation”, In IASTED International Conference on Artifical Intelligence and Applications (AIA 2010), Austria, Feb 15-17, 2010

[2] J. Han, J. Pei, and Y. Yin (2000) “Mining frequent patterns without candidate generation”, In Proceeding 2000 ACM-SIGMOD International Conference Management of Data, pp. 1–12

[3] Zeng, Yi, et al. “Research of improved FP-Growth algorithm in association rules mining.” Scientific Programming 2015 (2015): 6.

[4] M.S. Mythili and A.R. Mohamed Shanavas, “Performance Evaluation of Apriori and FP-Growth Algorithms”, International Journal of Computer Applications (IJCA) Volume 79 – No10, October 2013

One Comment

• USA Proxy August 11, 2018 at 5:07 am

very interesting information! .