With the rapid growth of e-commerce applications, vast quantities of data now accumulate within months rather than years. Data mining, also known as Knowledge Discovery in Databases (KDD), is used to find anomalies, correlations, patterns, and trends in such data in order to predict outcomes. The Apriori algorithm is a classical data mining algorithm for mining frequent itemsets and the association rules derived from them. It is designed to operate on databases containing a large number of transactions, for instance, the items bought by customers in a store. It is central to effective Market Basket Analysis: knowing which items are bought together helps customers purchase their items more easily, which in turn increases sales. Apriori has also been used in healthcare to detect adverse drug reactions (ADRs), where it produces association rules indicating which combinations of medications and patient characteristics lead to ADRs.

Figure 1: Apriori algorithm example application

### Apriori Algorithm: Overview

Apriori was one of the first algorithms developed for frequent itemset mining and association rule mining. Its two main steps are the join step and the prune step. The join step constructs new candidate itemsets. A candidate itemset is an itemset that may turn out to be either frequent or infrequent with respect to the support threshold. The candidate itemsets of level \( i \), \( C_i \), are generated by joining the frequent itemsets of the previous level, \( L_{i-1} \), with themselves. The prune step then filters out every candidate that has an infrequent subset at the previous level. This relies on the anti-monotone property of support: every subset of a frequent itemset is also frequent. Consequently, a candidate itemset that contains one or more infrequent itemsets of the prior level cannot be frequent, and it is pruned before its support is counted.
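The level-wise loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: itemsets are represented as sorted tuples, `min_support` is a fraction of the transactions, and the function name `apriori` and the six sample transactions are hypothetical (the transactions were made up so that their counts match the tables in the worked example, since the article's own transaction table is not reproduced in the text).

```python
from itertools import combinations

# Hypothetical sample data (not the article's original table): six transactions
# made up so that their item and pair counts match the example's tables.
# "Be" stands for Beer, to keep it distinct from Burger ("B").
TRANSACTIONS = [
    {"O", "P", "B"},
    {"P", "B", "M"},
    {"M", "Be"},
    {"O", "P", "M"},
    {"O", "P", "B", "Be"},
    {"O", "P", "B", "M"},
]


def apriori(transactions, min_support):
    """Return {itemset: count} for all frequent itemsets (as sorted tuples).

    min_support is a fraction of the transactions, e.g. 0.5 for 50%.
    """
    transactions = [frozenset(t) for t in transactions]
    n_tx = len(transactions)

    def frequent_only(candidates):
        # Count each candidate, then keep those whose support meets the threshold.
        counts = {c: sum(1 for t in transactions if t.issuperset(c)) for c in candidates}
        return {c: n for c, n in counts.items() if n / n_tx >= min_support}

    items = sorted({item for t in transactions for item in t})
    level = frequent_only([(item,) for item in items])  # L_1
    result = dict(level)
    k = 2
    while level:
        prev = sorted(level)  # L_{k-1}
        prev_set = set(prev)
        candidates = []  # C_k
        for a, b in combinations(prev, 2):
            # Join step: merge two itemsets that share their first k-2 items.
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune step: drop the candidate if any (k-1)-subset is infrequent.
                if all(s in prev_set for s in combinations(cand, k - 1)):
                    candidates.append(cand)
        level = frequent_only(candidates)  # L_k
        result.update(level)
        k += 1
    return result


frequent = apriori(TRANSACTIONS, 0.5)
print(max(frequent, key=len))  # ('B', 'O', 'P'), i.e. OPB
```

Because this sketch orders items alphabetically rather than in the article's O, P, B, M order, OPB appears as `('B', 'O', 'P')`; the set of frequent itemsets found is the same.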

### Example and Description of the Apriori Algorithm

So far, we have learned what the Apriori algorithm is and why it is important to learn it.

A key concept in the Apriori algorithm is the anti-monotonicity of the support measure. It states that:

- All subsets of a frequent itemset must be frequent
- Similarly, for any infrequent itemset, all its supersets must be infrequent too
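The property can be checked empirically. The sketch below enumerates every itemset over a small set of hypothetical transactions (made up to match the counts in the example that follows, not taken from the article) and confirms that no itemset occurs in more transactions than any of its subsets. The helper names `support_count` and `is_anti_monotone` are illustrative only.

```python
from itertools import combinations

# Hypothetical transactions (not the article's data); "Be" = Beer, "B" = Burger.
TRANSACTIONS = [
    {"O", "P", "B"},
    {"P", "B", "M"},
    {"M", "Be"},
    {"O", "P", "M"},
    {"O", "P", "B", "Be"},
    {"O", "P", "B", "M"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def is_anti_monotone(transactions):
    """True if no itemset occurs more often than any of its immediate subsets."""
    items = sorted(set().union(*transactions))
    for size in range(2, len(items) + 1):
        for superset in combinations(items, size):
            sup = support_count(superset, transactions)
            for subset in combinations(superset, size - 1):
                if sup > support_count(subset, transactions):
                    return False
    return True


print(is_anti_monotone(TRANSACTIONS))               # True
print(support_count({"B", "M"}, TRANSACTIONS))       # 2
print(support_count({"P", "B", "M"}, TRANSACTIONS))  # 2, cannot exceed BM
```

Since BM is infrequent here, any superset such as PBM is guaranteed to be infrequent as well, which is exactly what lets Apriori skip counting it.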

Let us now walk through an intuitive explanation of the algorithm using the example introduced above. Before beginning, let us set the support threshold to 50%, i.e., only those itemsets whose support is at least 50% are considered significant.

**Step 1**: Create a frequency table of all the items that occur across the transactions. For our case:

| Item | Frequency (number of transactions) |
|---|---|
| Onion (O) | 4 |
| Potato (P) | 5 |
| Burger (B) | 4 |
| Milk (M) | 4 |
| Beer (Be) | 2 |
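Step 1 amounts to counting, for each item, how many transactions contain it. A minimal sketch with `collections.Counter`, assuming hypothetical transactions that were made up to reproduce the counts in the table (the article's own transaction list is not given in the text). Beer is abbreviated `Be` here so that it does not clash with Burger (`B`).

```python
from collections import Counter

# Hypothetical transactions chosen to reproduce the Step 1 counts.
TRANSACTIONS = [
    {"O", "P", "B"},
    {"P", "B", "M"},
    {"M", "Be"},
    {"O", "P", "M"},
    {"O", "P", "B", "Be"},
    {"O", "P", "B", "M"},
]

# Each transaction is a set, so an item is counted at most once per transaction.
item_counts = Counter(item for t in TRANSACTIONS for item in t)

for item in ["O", "P", "B", "M", "Be"]:
    print(item, item_counts[item])
# O 4
# P 5
# B 4
# M 4
# Be 2
```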

**Step 2**: We know that only those items are significant whose support is greater than or equal to the threshold support. Here the support threshold is 50%, hence only those items that occur in at least three transactions are significant, namely Onion (O), Potato (P), Burger (B), and Milk (M). Therefore, we are left with:

| Item | Frequency (number of transactions) |
|---|---|
| Onion (O) | 4 |
| Potato (P) | 5 |
| Burger (B) | 4 |
| Milk (M) | 4 |

The table above represents the single items that are purchased by the customers frequently.
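Step 2 is a filter on the counts from Step 1. In this sketch, support is computed as a fraction of the transactions and compared with `>=`, matching "greater than or equal to" above. The six transactions are hypothetical (made up to match the article's counts), so with them a 50% threshold means at least three transactions.

```python
from collections import Counter

# Hypothetical transactions (not the article's original data).
TRANSACTIONS = [
    {"O", "P", "B"},
    {"P", "B", "M"},
    {"M", "Be"},
    {"O", "P", "M"},
    {"O", "P", "B", "Be"},
    {"O", "P", "B", "M"},
]
MIN_SUPPORT = 0.5  # the 50% threshold used in the example

item_counts = Counter(item for t in TRANSACTIONS for item in t)
n_transactions = len(TRANSACTIONS)

# Keep an item if the fraction of transactions containing it meets the threshold.
frequent_items = {
    item: count
    for item, count in item_counts.items()
    if count / n_transactions >= MIN_SUPPORT
}
print(sorted(frequent_items))  # ['B', 'M', 'O', 'P']
```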

**Step 3**: The next step is to form all possible pairs of the significant items, keeping in mind that order does not matter, i.e., AB is the same as BA. To do this, take the first item and pair it with each of the others: OP, OB, and OM. Similarly, take the second item and pair it with the items that follow it: PB and PM. We only consider the following items because PO (the same as OP) already exists. Finally, the third item pairs with the fourth: BM. So, all the pairs in our example are OP, OB, OM, PB, PM, and BM.
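Pairing each item only with the items that come after it is exactly what `itertools.combinations` does. A short sketch, assuming the frequent items are listed in the same order as the article's tables:

```python
from itertools import combinations

# Frequent single items from Step 2, in the order used in the article's tables.
frequent_items = ["O", "P", "B", "M"]

# combinations() pairs each item only with the items after it,
# so OP is produced but PO is not.
pairs = ["".join(p) for p in combinations(frequent_items, 2)]
print(pairs)  # ['OP', 'OB', 'OM', 'PB', 'PM', 'BM']
```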

**Step 4**: We will now count the occurrences of each pair in all the transactions.

| Itemset | Frequency (number of transactions) |
|---|---|
| OP | 4 |
| OB | 3 |
| OM | 2 |
| PB | 4 |
| PM | 3 |
| BM | 2 |

**Step 5**: Again, only those itemsets that meet the support threshold (at least three transactions) are significant, and those are OP, OB, PB, and PM.
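Steps 4 and 5 can be sketched together: count how many transactions contain both items of each pair, then keep the pairs that meet the 50% threshold. The transactions are hypothetical, made up so that the pair counts reproduce the Step 4 table.

```python
from itertools import combinations

# Hypothetical transactions (not the article's original data).
TRANSACTIONS = [
    {"O", "P", "B"},
    {"P", "B", "M"},
    {"M", "Be"},
    {"O", "P", "M"},
    {"O", "P", "B", "Be"},
    {"O", "P", "B", "M"},
]
frequent_items = ["O", "P", "B", "M"]

# Step 4: a pair occurs in a transaction if both of its items are present.
pair_counts = {
    a + b: sum(1 for t in TRANSACTIONS if {a, b} <= t)
    for a, b in combinations(frequent_items, 2)
}
print(pair_counts)  # {'OP': 4, 'OB': 3, 'OM': 2, 'PB': 4, 'PM': 3, 'BM': 2}

# Step 5: keep the pairs whose support is at least 50%.
frequent_pairs = [p for p, n in pair_counts.items() if n / len(TRANSACTIONS) >= 0.5]
print(frequent_pairs)  # ['OP', 'OB', 'PB', 'PM']
```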

**Step 6**: Now let’s say we would like to look for a set of three items that are purchased together. We will use the itemsets found in step 5 and create a set of 3 items.

To create sets of 3 items, another rule, called the self-join, is required. It says that from the item pairs OP, OB, PB, and PM, we look for two pairs with an identical first letter, and so we get:

- **O**P and **O**B, which gives OPB
- **P**B and **P**M, which gives PBM
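The self-join can be written as a loop over pairs of itemsets that agree on everything except their last item. In this sketch the itemsets are tuples that follow the article's item order (O, P, B, M), and the list is assumed to be sorted by that order; `self_join` is a hypothetical helper name.

```python
# Frequent pairs from Step 5, as tuples in the article's item order (O, P, B, M).
frequent_pairs = [("O", "P"), ("O", "B"), ("P", "B"), ("P", "M")]


def self_join(itemsets):
    """Join itemsets that share all but their last item (hypothetical helper).

    Assumes every tuple follows one fixed item order and the list is sorted
    by that order, so each joined candidate keeps that order too.
    """
    joined = []
    for i, a in enumerate(itemsets):
        for b in itemsets[i + 1:]:
            if a[:-1] == b[:-1]:  # identical prefix (here: identical first letter)
                joined.append(a + (b[-1],))
    return joined


print(["".join(c) for c in self_join(frequent_pairs)])  # ['OPB', 'PBM']
```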

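Next, we find the frequency of these two itemsets. By anti-monotonicity, neither can occur in more transactions than any of its subsets: OPB is bounded by OB (3 transactions) and PBM by BM (2 transactions). In fact, the prune step discards PBM without counting it, because its subset BM is not frequent.

| Itemset | Frequency (number of transactions) |
|---|---|
| OPB | 3 |
| PBM | at most 2 (pruned, since BM is infrequent) |

Applying the threshold rule again, we find that OPB is the only significant itemset.

Therefore, OPB is the only set of 3 items that is purchased frequently.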
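The prune-then-count order can be sketched as follows. The helper `prune_and_count` is hypothetical, and the transactions are made up to match the article's single-item and pair counts. With these transactions OPB occurs in 3 transactions; note that OPB can never occur in more transactions than OB does.

```python
from itertools import combinations

# Hypothetical transactions (not the article's original data).
TRANSACTIONS = [
    {"O", "P", "B"},
    {"P", "B", "M"},
    {"M", "Be"},
    {"O", "P", "M"},
    {"O", "P", "B", "Be"},
    {"O", "P", "B", "M"},
]


def prune_and_count(candidates, frequent_prev, transactions, min_count):
    """Prune candidates that have an infrequent subset, then count the rest.

    Returns ({candidate: count} for frequent candidates, [pruned candidates]).
    """
    prev = {frozenset(s) for s in frequent_prev}
    frequent, pruned = {}, []
    for cand in candidates:
        # Prune step: every (k-1)-subset must be a frequent itemset.
        if any(frozenset(s) not in prev for s in combinations(cand, len(cand) - 1)):
            pruned.append(cand)  # skipped without scanning the transactions
            continue
        count = sum(1 for t in transactions if set(cand) <= t)
        if count >= min_count:
            frequent[cand] = count
    return frequent, pruned


frequent_pairs = [("O", "P"), ("O", "B"), ("P", "B"), ("P", "M")]
candidates = [("O", "P", "B"), ("P", "B", "M")]
# min_count=3 corresponds to 50% of the six hypothetical transactions.
frequent, pruned = prune_and_count(candidates, frequent_pairs, TRANSACTIONS, min_count=3)
print(frequent)  # {('O', 'P', 'B'): 3}
print(pruned)    # [('P', 'B', 'M')]
```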

The example we considered was a fairly simple one, and mining the frequent itemsets stopped at 3 items, but in practice there are many more items and this process can continue to much larger itemsets. Suppose we found the significant 3-item sets OPQ, OPR, OQR, OQS, and PQR, and now we want to generate the sets of 4 items. For this, we look for sets whose first two letters are the same, i.e.:

- **OP**Q and **OP**R give OPQR
- **OQ**R and **OQ**S give OQRS

The prune step then removes OQRS, because its subsets ORS and QRS are not among the significant 3-item sets. All 3-item subsets of OPQR (OPQ, OPR, OQR, and PQR) are significant, so OPQR remains a candidate.
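The same join-and-prune rule generalizes to any level. A minimal sketch, assuming single-letter items kept in alphabetical order (which matches O, P, Q, R, S here); `apriori_gen` is a hypothetical helper name used only for illustration.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Build candidate k-itemsets from frequent (k-1)-itemsets given as strings.

    Returns (all joined itemsets, candidates that survive pruning).
    """
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    joined, kept = [], []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:  # join: same first k-2 letters
                cand = a + b[-1]
                joined.append(cand)
                # Prune: every (k-1)-letter subset must itself be frequent.
                if all("".join(s) in prev_set for s in combinations(cand, len(a))):
                    kept.append(cand)
    return joined, kept


joined, kept = apriori_gen(["OPQ", "OPR", "OQR", "OQS", "PQR"])
print(joined)  # ['OPQR', 'OQRS']
print(kept)    # ['OPQR']
```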

