# Association Rule Mining

October 30, 2017

Data Mining is the discovery of hidden information found in databases and can be viewed as a step in the knowledge discovery process. Data mining functions include clustering, classification, prediction, and link analysis (associations). One of the most important data mining applications is that of mining association rules. An association rule has two parts, an antecedent (if) and a consequent (then). An antecedent is an item found in the data. A consequent is an item that is found in combination with the antecedent.

# Association Rule Mining: Overview

Association rules are created by analyzing data for frequent if/then patterns and using two criteria, support and confidence, to identify the most important relationships. Support indicates how frequently the items appear in the database. Confidence indicates how often the if/then statement is found to be true, that is, the fraction of transactions containing the antecedent that also contain the consequent.

Association rule mining has been an active research area in data mining, for which many algorithms have been developed. In data mining, association rule learning is a popular and well-accepted method for discovering interesting relations between variables in large databases. Association rules are employed today in many areas including web usage mining, intrusion detection and bioinformatics.

In general, an association rule is an expression of the form $$X \to Y$$, where X is the antecedent and Y is the consequent. The main aim is to extract important correlations among the data items in the database. Rules are extracted from the data on the basis of two thresholds, minimum support and minimum confidence. Support is the probability that an item or item set occurs in a transaction of the given transactional database:

$\mathrm{Support}(X) = n(X)/n$

where n is the total number of transactions in the database and n(X) is the number of transactions that contain the item set X. The support of a rule is the support of the union of its two sides:

$\mathrm{Support}(X \to Y) = \mathrm{Support}(X \cup Y)$

Confidence is the conditional probability of the consequent given the antecedent. For an association rule $$X \to Y$$ it is defined as

$\mathrm{Confidence}(X \to Y) = \mathrm{Support}(X \cup Y)/\mathrm{Support}(X)$
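The two measures translate almost directly into code. The following is a minimal Python sketch, not a reference implementation: the function names `support` and `confidence`, the `Transaction` alias, and the toy database `TOY_DB` are all assumptions made for illustration and do not come from the cited sources.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("the database must contain at least one transaction")
    items = frozenset(itemset)
    # n(X): number of transactions that include the whole item set.
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: List[Transaction],
) -> float:
    """Support(X u Y) / Support(X) for the rule X -> Y."""
    x = frozenset(antecedent)
    y = frozenset(consequent)
    support_x = support(x, transactions)
    if support_x == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    return support(x | y, transactions) / support_x


# Hypothetical toy database, invented for this example only.
TOY_DB: List[Transaction] = [
    frozenset({"tea", "sugar"}),
    frozenset({"tea", "sugar", "lemon"}),
    frozenset({"tea"}),
    frozenset({"coffee", "sugar"}),
]

if __name__ == "__main__":
    print(support({"tea", "sugar"}, TOY_DB))
    print(confidence({"tea"}, {"sugar"}, TOY_DB))
```

In this toy database, tea appears in three of the four transactions and tea with sugar in two, so the rule tea → sugar has support 2/4 and confidence 2/3. Representing each transaction as a `frozenset` keeps the subset test `items <= t` a direct reading of "the transaction contains the item set".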

Various association rule mining algorithms have been applied in different applications to find interesting frequent patterns. One such algorithm is the Apriori algorithm.
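To make the role of the two thresholds concrete, the sketch below enumerates every candidate rule by brute force and keeps those that meet a minimum support and a minimum confidence. It is deliberately not Apriori: it does no candidate pruning and is only practical for a handful of items. The names `mine_rules` and `TOY_DB`, and the threshold values used, are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

# (antecedent, consequent, support, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain the whole item set."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_rules(
    transactions: List[FrozenSet[str]],
    min_support: float,
    min_confidence: float,
) -> List[Rule]:
    """Exhaustively list rules X -> Y that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules: List[Rule] = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s_xy = support(itemset, transactions)
            # Skip item sets that are too rare (or never occur at all).
            if s_xy < min_support or s_xy == 0:
                continue
            # Split the item set into every non-empty antecedent/consequent pair.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    x = frozenset(lhs)
                    y = itemset - x
                    conf = s_xy / support(x, transactions)
                    if conf >= min_confidence:
                        rules.append((x, y, s_xy, conf))
    return rules


# Hypothetical toy database, invented for this example only.
TOY_DB = [
    frozenset({"tea", "sugar"}),
    frozenset({"tea", "sugar", "lemon"}),
    frozenset({"tea"}),
    frozenset({"coffee", "sugar"}),
]

if __name__ == "__main__":
    for x, y, s, c in mine_rules(TOY_DB, min_support=0.5, min_confidence=0.6):
        print(sorted(x), "->", sorted(y), f"support={s:.2f}", f"confidence={c:.2f}")
```

Because the number of item sets grows exponentially with the number of items, this exhaustive approach is useful only as an illustration of what the thresholds filter out.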

### Association rule mining: Definition

Association rule mining is a popular and well-researched method for discovering interesting relations between variables in large databases. Reference [1] describes analyzing and presenting strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, [2] introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule $$\{\text{Onions}, \text{Potatoes}\} \Rightarrow \{\text{Burger}\}$$ found in the sales data of a supermarket would indicate that a customer who buys onions and potatoes together is also likely to buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placement. In addition to this example from market basket analysis, association rules are employed today in many application areas, including Web usage mining, intrusion detection and bioinformatics. As opposed to sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.

Table 1 presents an example database with 4 items and 5 transactions.

| Transaction ID | Milk | Bread | Butter | Beer |
|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 |
| 2 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 1 |
| 4 | 1 | 1 | 1 | 0 |
| 5 | 0 | 1 | 0 | 0 |

Table 1 Database Transactions

The problem of association rule mining [12] is defined as follows. Let $$I=\{i_1, i_2, \ldots, i_n\}$$ be a set of n binary attributes called items. Let $$D=\{t_1, t_2, \ldots, t_m\}$$ be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form $$X \to Y$$, where $$X, Y \subseteq I$$ and $$X \cap Y = \emptyset$$. The item sets X and Y are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain. The set of items is $$I=\{\text{milk}, \text{bread}, \text{butter}, \text{beer}\}$$, and a small database containing the items (1 codes the presence and 0 the absence of an item in a transaction) is shown in Table 1. An example rule for the supermarket could be $$\{\text{butter}, \text{bread}\} \to \{\text{milk}\}$$, meaning that if butter and bread are bought, customers also buy milk.
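As a worked check of this example, and assuming the support and confidence definitions given earlier, the rule can be evaluated directly on Table 1. Butter and bread occur together only in transaction 4, which also contains milk, so with $$X=\{\text{butter}, \text{bread}\}$$ and $$Y=\{\text{milk}\}$$:

$\mathrm{Support}(X \to Y) = \mathrm{Support}(X \cup Y) = 1/5 = 0.2$

$\mathrm{Confidence}(X \to Y) = \mathrm{Support}(X \cup Y)/\mathrm{Support}(X) = 0.2/0.2 = 1$

On this tiny database the rule holds in every transaction where its antecedent appears, but it is backed by a single transaction, which is why a minimum support threshold is usually applied alongside confidence.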

### References

[1] Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM-SIGMOD international conference on management of data (SIGMOD’93), Washington, DC, pp 207–216

[2] “Association Rules Mining”, available online at: https://www.vskills.in/certification/tutorial/data-mining-and-warehousing/association-rules-mining/

[3] J. Usharani, Dr. K. Iyakutti, “Mining Association Rules for Web Crawling using Genetic Algorithm”, International Journal Of Engineering And Computer Science, Volume 2 Issue 8 August, 2013 Page No. 2635-2640

[4] Agrawal, Rakesh, and Ramakrishnan Srikant, “Fast algorithms for mining association rules”, Proc. 20th international conference very large data bases, VLDB, Volume 1215, 1994.

[5] https://www.quora.com/What-are-association-rules-in-data-mining
