What is Sequential Pattern Mining

January 5, 2018 Author: virendra

The rapid growth in the amount of stored digital data and recent developments in data mining techniques have led to increased interest in methods for exploring data, creating a set of new data mining problems and solutions. Frequent Structure Mining is one of these problems. Its goal is to discover hidden structured patterns in large databases. Sequences are the simplest form of structured patterns. This article discusses Sequential Pattern Mining.

Introduction to Sequential Pattern Mining

A sequential pattern is a sequence of itemsets that occurs frequently, in a specific order, in a sequence database. A sequence database is a set of ordered elements or events, stored with or without a concrete notion of time. Each itemset contains a set of items that share the same transaction-time value. While association rules capture intra-transaction relationships, sequential patterns capture correlations between transactions.

In a retail setting, sequential pattern mining discovers which items a single customer buys in a particular order, where those items come from different transactions. The result of mining is a set of sequences of itemsets that frequently occur in a specific order. Sequential pattern mining is used in various areas for different purposes. For example, it can be used to analyze customer shopping sequences, that is, to determine which items a customer tends to buy one after another.
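To make these definitions concrete, the following Python sketch represents a sequence database as one time-ordered list of itemsets per customer and checks whether a candidate pattern is contained in a customer's sequence. The customer IDs, items, and the helper names `seq` and `contains` are hypothetical; the check follows the usual containment definition, in which each itemset of the pattern must be a subset of a distinct, later itemset of the sequence.

```python
from typing import Dict, FrozenSet, List, Sequence

# An itemset holds the items of one transaction (same transaction time);
# a sequence is one customer's itemsets, ordered by time.
Itemset = FrozenSet[str]


def seq(*itemsets: str) -> List[Itemset]:
    """Build a sequence from strings such as "a", "bc" (each character is one item)."""
    return [frozenset(s) for s in itemsets]


def contains(sequence: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each itemset of the pattern must be a subset of a distinct itemset of the
    sequence, and the matched itemsets must keep the pattern's order.
    Greedily matching the earliest possible itemset is sufficient.
    """
    pos = 0
    for element in pattern:
        # Skip itemsets that do not include every item of this pattern element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later itemset
    return True


# Hypothetical sequence database: one time-ordered sequence per customer.
database: Dict[str, List[Itemset]] = {
    "c1": seq("a", "bc", "d"),
    "c2": seq("ab", "d"),
    "c3": seq("b", "a", "c"),
}

pattern = seq("a", "d")  # "buys a, and later buys d"
matching = [cid for cid, s in database.items() if contains(s, pattern)]
print(matching)  # ['c1', 'c2']
```

Under this definition, `contains(seq("a", "b"), seq("ab"))` is False, because the pattern requires a and b to be bought in the same transaction.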

Definition of Sequential Pattern Mining

Sequential Pattern Mining is defined as discovering the complete set of frequent subsequences in a sequence database. It discovers correlations between different transactions. More precisely, sequential pattern mining (SPM) is the process of extracting the sequential patterns whose support is no less than a predefined minimum support threshold. These are the sequences that reflect the most frequent behaviors in the sequence database, which in turn can be interpreted as domain knowledge for several purposes. To reduce the very large number of possible sequences to the most interesting sequential patterns and to meet different user requirements, a minimum support threshold is used to prune uninteresting patterns. Patterns with higher support are generally preferred as more interesting, so raising the threshold yields fewer, more frequent patterns.

Sequential pattern mining is used in several domains. Business organizations use SPM to study customer behavior, computational biology uses it to analyze amino acid mutation patterns, and web usage mining uses it to mine web logs distributed across multiple servers.
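The support of a pattern is commonly defined as the fraction (or number) of sequences in the database that contain it, and a pattern is frequent when its support is no less than the minimum support threshold. The sketch below illustrates this with a hypothetical database in which every itemset holds a single item, a simplification that keeps the containment check short. The function names and data are illustrative, not taken from any particular library.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(sequence: Sequence[str], pattern: Pattern) -> bool:
    """True if the items of `pattern` appear in `sequence` in the same order."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the match, so order is enforced.
    return all(item in it for item in pattern)


def support(database: List[List[str]], pattern: Pattern) -> float:
    """Relative support: fraction of sequences in the database containing `pattern`."""
    return sum(is_subsequence(s, pattern) for s in database) / len(database)


def frequent_patterns(database, candidates, min_sup):
    """Keep the candidates whose support is no less than `min_sup`."""
    return [p for p in candidates if support(database, p) >= min_sup]


# Hypothetical database: each string is one customer's purchases in time order.
database = [list("abcd"), list("acd"), list("bad"), list("abd")]
candidates = [("a", "d"), ("a", "c"), ("b", "a"), ("c", "d")]

for p in candidates:
    print(p, support(database, p))
# ('a', 'd') 1.0
# ('a', 'c') 0.5
# ('b', 'a') 0.25
# ('c', 'd') 0.5
print(frequent_patterns(database, candidates, min_sup=0.5))
# [('a', 'd'), ('a', 'c'), ('c', 'd')]
```

Raising `min_sup` to 0.75 in this example would leave only `("a", "d")`, which shows how the threshold prunes the pattern set.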

Sequential pattern mining is a data mining task specialized for analyzing sequential data to discover sequential patterns. More precisely, it consists of discovering interesting subsequences in a set of sequences, where the interestingness of a subsequence can be measured by various criteria such as its occurrence frequency, length, and profit. Sequential pattern mining has numerous real-life applications because data is naturally encoded as sequences of symbols in many fields, such as bioinformatics, e-learning, market basket analysis, texts, and webpage click-stream analysis.

Many studies and improved methods have contributed to making sequential pattern mining more efficient. Sequential pattern mining algorithms can be broadly divided into two approaches: Apriori-based (GSP, SPADE, SPAM) and pattern-growth (FreeSpan, PrefixSpan). Figure 1 shows a classification of some sequential pattern mining algorithms.


Figure 1: Classification of some Sequential Pattern-Mining Algorithms
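As an illustration of the pattern-growth approach, here is a minimal sketch in the spirit of PrefixSpan, simplified to sequences in which every itemset contains one item. Instead of generating and testing candidates level by level as Apriori-based algorithms such as GSP do, it grows a prefix one item at a time and recurses on a projected database holding the suffixes that follow the prefix. This is not the full published algorithm, which also handles multi-item itemsets and adds further optimizations; the names and example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def prefixspan(database: List[List[str]], min_count: int) -> Dict[Pattern, int]:
    """Simplified PrefixSpan for sequences whose itemsets each hold one item.

    Returns every frequent sequential pattern with its absolute support,
    i.e. the number of sequences that contain it.
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[List[str]]) -> None:
        # Count each item at most once per suffix, so counts are supports.
        counts: Counter = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, count in counts.items():
            if count < min_count:
                continue  # prefix + item is infrequent, and so is every extension
            new_prefix = prefix + (item,)
            results[new_prefix] = count
            # Projected database: what follows the first occurrence of `item`.
            new_projected = [
                suffix[suffix.index(item) + 1:]
                for suffix in projected
                if item in suffix
            ]
            mine(new_prefix, new_projected)

    mine((), database)
    return results


# Hypothetical database: each string is one customer's items in time order.
db = [list("abc"), list("ac"), list("bac"), list("ab")]
result = prefixspan(db, min_count=3)
for pattern, count in sorted(result.items()):
    print("<" + " ".join(pattern) + ">", count)
# <a> 4
# <a c> 3
# <b> 3
# <c> 3
```

Skipping items whose count is below `min_count` is safe because extending a pattern can never increase its support, the same anti-monotone property that Apriori-based algorithms use to prune candidates.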

Applications of Sequential Pattern Mining

Sequential Pattern Mining finds interesting sequential patterns in large databases: it finds frequent subsequences of a sequence database and reports them as patterns. With massive amounts of data continuously being collected and stored, many industries are becoming interested in mining sequential patterns from their databases. Some examples of applications of sequential pattern and sequential rule mining are:

  • E-learning
  • Manufacturing simulation
  • Quality control
  • Web page pre-fetching
  • Anti-pattern detection in service-based systems
  • Embedded systems
  • Alarm sequence analysis
  • Restaurant recommendation
  • DNA sequences and gene structures
  • Telephone calling patterns
  • Weblog click streams

Research Challenges of Sequential Pattern Mining

Several methods are available today for efficiently discovering sequential patterns according to the initial definition, and such patterns are widely used in a large number of applications. However, various research challenges remain in this field of data mining. Some of these research challenges [1] are:

  • Finding the complete set of patterns that satisfy the minimum support (frequency) threshold is computationally expensive. When the database is large, distributed sequential pattern mining can be used to improve scalability.
  • Incorporating various kinds of user-specified constraints is difficult. One open problem is adding further useful constraints to RFM (recency, frequency, monetary) patterns, for example, requiring that the number of repetitions in a sequence be no less than a given threshold.
  • Constraints such as frequency and monetary constraints are difficult to study, and measuring their effect on execution time, memory usage, and scalability is also difficult.
  • Algorithms must handle a large search space, and repeated scans of the database during mining should be reduced as much as possible. Another direction is to introduce object-orientedness into sequential pattern mining, which would make it possible to mine only the focused parts of the database.
  • Target-oriented sequential pattern mining and its application to real datasets remain difficult. Methods that prune candidate sequences early and partition the search space are used to mine patterns efficiently.
  • Developing specialized sequential pattern mining methods for particular applications is an interesting issue for future research. Examples include DNA sequence mining that tolerates faults, such as insertions, deletions, and mutations in DNA sequences, and the analysis of industrial or engineering sequential processes.
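The first challenge above mentions distributed mining as a way to scale to large databases. One property that makes support counting straightforward to distribute is that support counts are additive: the count of a candidate in the whole database is the sum of its counts in disjoint partitions. The sketch below simulates this map/reduce idea in a single process (a real deployment would send each partition to a separate worker) using simplified single-item sequences. It is not any specific published distributed algorithm, and the data and function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(sequence: Sequence[str], pattern: Pattern) -> bool:
    """True if the items of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def count_partition(partition: List[List[str]], candidates: List[Pattern]) -> Counter:
    """Map step: support counts of the candidates within one partition."""
    local: Counter = Counter()
    for sequence in partition:
        for pattern in candidates:
            if is_subsequence(sequence, pattern):
                local[pattern] += 1
    return local


def distributed_support(partitions, candidates, min_count):
    """Reduce step: sum the local counts, then apply the global threshold."""
    total: Counter = Counter()
    for partition in partitions:  # in practice, one worker per partition
        total.update(count_partition(partition, candidates))
    return {p: n for p, n in total.items() if n >= min_count}


# Hypothetical database split into two partitions (e.g. two servers).
partitions = [
    [list("abc"), list("ac")],
    [list("bac"), list("ab")],
]
candidates = [("a", "c"), ("b", "c"), ("a", "b")]
print(distributed_support(partitions, candidates, min_count=3))  # {('a', 'c'): 3}
```

The threshold is applied only after the local counts are merged. Here `("a", "c")` occurs twice in the first partition and once in the second, so filtering each partition with the same absolute `min_count` would have discarded a pattern that is frequent overall.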

References

[1] Philippe Fournier-Viger, “An Introduction to Sequential Rule Mining”, available online at: http://data-mining.philippe-fournier-viger.com/introduction-to-sequential-rule-mining/

[2] Haytham Qalalwi, “Mining Sequential Pattern Algorithms Comparison”, conference paper, January 2014.

[3] Chetna Chand, Amit Thakkar, and Amit Ganatra, “Sequential Pattern Mining: Survey and Current Research Challenges”, International Journal of Soft Computing and Engineering (IJSCE), Volume 2, Issue 1, March 2012.

[4] Thabet Slimani and Amor Lazzez, “Sequential Mining: Patterns and Algorithms Analysis”, arXiv preprint arXiv:1311.0350, 2013.
