Questions about Periodic Pattern Mining
(from the Pattern Mining Course)
A simple event sequence is a sequence that does not contain simultaneous events (events that have the same timestamp). A complex event sequence is a sequence that contains at least two events with the same timestamp. For instance, <({a},1),({b},3),({c},5)> is a simple event sequence, while <({a,b,c},1),({b},3),({c},5)> is a complex event sequence.
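To make the distinction concrete, here is a minimal Python sketch that checks whether a sequence is simple or complex. It assumes a sequence is represented as a list of (event set, timestamp) pairs; the names `is_simple` and `is_complex` are illustrative, not from any library. It counts the events recorded at each timestamp, so it also works if the same timestamp appears in two separate pairs.

```python
from collections import Counter
from typing import List, Set, Tuple

# An event sequence as a list of (event set, timestamp) pairs,
# e.g. [({"a"}, 1), ({"b"}, 3), ({"c"}, 5)].
EventSequence = List[Tuple[Set[str], int]]


def is_simple(seq: EventSequence) -> bool:
    """Return True if no two events share a timestamp (a simple event sequence)."""
    events_per_timestamp: Counter = Counter()
    for events, timestamp in seq:
        events_per_timestamp[timestamp] += len(events)
    return all(count <= 1 for count in events_per_timestamp.values())


def is_complex(seq: EventSequence) -> bool:
    """A complex event sequence has at least two events with the same timestamp."""
    return not is_simple(seq)


simple_seq = [({"a"}, 1), ({"b"}, 3), ({"c"}, 5)]
complex_seq = [({"a", "b", "c"}, 1), ({"b"}, 3), ({"c"}, 5)]
print(is_simple(simple_seq), is_complex(complex_seq))  # True True
```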
A parallel episode is an episode that contains a single event set. For example, <{a,b}> is a parallel episode.
A serial episode is a list of event sets where each event set contains a single event. For example, <{a},{b},{d}> is a serial episode, but <{a},{b},{c,d}> is not, because its last event set {c,d} contains two events.
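A small Python sketch of these two definitions, assuming an episode is stored as an ordered list of frozensets. The helper `ep` and the function names are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, List

# An episode as an ordered list of event sets, e.g. <{a},{b},{d}>.
Episode = List[FrozenSet[str]]


def is_parallel(episode: Episode) -> bool:
    """A parallel episode contains a single event set."""
    return len(episode) == 1


def is_serial(episode: Episode) -> bool:
    """A serial episode is a list of event sets that each contain a single event."""
    return all(len(event_set) == 1 for event_set in episode)


def ep(*sets: str) -> Episode:
    """Helper: ep("a", "b", "cd") builds <{a},{b},{c,d}>."""
    return [frozenset(s) for s in sets]


print(is_parallel(ep("ab")))          # True
print(is_serial(ep("a", "b", "d")))   # True
print(is_serial(ep("a", "b", "cd")))  # False
```

Under these two definitions, a single-event episode such as <{a}> is both parallel and serial.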
The set of occurrences of <{a}> is occSet(<{a}>)={[1,1],[3,3],[6,6],[8,8]}. The head support of an episode is the number of distinct start timestamps among its occurrences, so the head support of that episode is sup(<{a}>) = |{1,3,6,8}| = 4.
The set of occurrences of <{b,c}> is occSet(<{b,c}>)={[3,3],[7,7]}. Thus, the head support of that episode is sup(<{b,c}>) = |{3,7}| = 2.
The set of occurrences of <{a},{c}> is occSet(<{a},{c}>)={[1,3],[1,5],[1,7],[3,5],[3,7],[6,7]}. Thus, the head support of that episode is sup(<{a},{c}>) = |{1,3,6}| = 3.
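The head-support computations above can be reproduced with the following Python sketch. The original event sequence is not shown in this section, so the sketch uses a hypothetical sequence chosen only to agree with the occurrence sets listed above; it may differ from the course's actual sequence. It assumes an occurrence [ts, te] of <E1,...,Ek> requires timestamps ts = t1 < ... < tk = te with each Ei contained in the event set at ti, and it applies no maximum window length, which many episode mining algorithms do impose. Function names are illustrative.

```python
from typing import FrozenSet, List, Set, Tuple

# (event set, timestamp) pairs with strictly increasing timestamps;
# simultaneous events are merged into one event set.
EventSequence = List[Tuple[FrozenSet[str], int]]
Episode = List[FrozenSet[str]]


def occurrences(seq: EventSequence, episode: Episode) -> Set[Tuple[int, int]]:
    """Return occSet(episode) as (ts, te) pairs: the event sets of `episode`
    are contained, in order, in event sets at ts = t1 < ... < tk = te.
    No maximum window is applied (an assumption of this sketch)."""
    occ = set()
    for start, (events, ts) in enumerate(seq):
        if not episode[0] <= events:
            continue
        # Indices of seq where the episode prefix matched so far can end.
        ends = {start}
        for event_set in episode[1:]:
            ends = {k for j in ends for k in range(j + 1, len(seq))
                    if event_set <= seq[k][0]}
        occ.update((ts, seq[j][1]) for j in ends)
    return occ


def head_support(seq: EventSequence, episode: Episode) -> int:
    """Number of distinct start timestamps among the occurrences."""
    return len({ts for ts, _ in occurrences(seq, episode)})


def ep(*sets: str) -> Episode:
    return [frozenset(s) for s in sets]


# Hypothetical sequence, consistent with the occurrence sets listed above.
seq = [(frozenset("a"), 1), (frozenset("abc"), 3), (frozenset("c"), 5),
       (frozenset("a"), 6), (frozenset("bc"), 7), (frozenset("a"), 8)]

print(sorted(occurrences(seq, ep("a", "c"))))
# [(1, 3), (1, 5), (1, 7), (3, 5), (3, 7), (6, 7)]
print(head_support(seq, ep("a")), head_support(seq, ep("bc")),
      head_support(seq, ep("a", "c")))  # 4 2 3
```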
A frequent maximal episode is a frequent episode that is not a subsequence of any larger frequent episode. Discovering frequent maximal episodes is useful in applications where the number of frequent episodes presented to the user must be reduced, since the number of frequent maximal episodes is generally much smaller than the number of all frequent episodes.
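A minimal Python sketch of the maximality filter, assuming episodes are lists of frozensets and that "subsequence" means each event set of the smaller episode is contained, in order, in a distinct event set of the larger one. The input list is a hypothetical set of frequent episodes, not the output of a real miner, and the function names are illustrative.

```python
from typing import FrozenSet, List

Episode = List[FrozenSet[str]]


def is_subepisode(x: Episode, y: Episode) -> bool:
    """True if each event set of x is contained in a distinct event set of y,
    in the same order (greedy earliest matching suffices)."""
    pos = 0
    for event_set in x:
        while pos < len(y) and not event_set <= y[pos]:
            pos += 1
        if pos == len(y):
            return False
        pos += 1  # the next event set of x must match strictly later in y
    return True


def maximal_episodes(frequent: List[Episode]) -> List[Episode]:
    """Keep the frequent episodes that are not a subepisode of another,
    different frequent episode (input order is preserved)."""
    return [x for x in frequent
            if not any(y != x and is_subepisode(x, y) for y in frequent)]


def ep(*sets: str) -> Episode:
    return [frozenset(s) for s in sets]


# Hypothetical set of frequent episodes (supports omitted).
frequent = [ep("a"), ep("b"), ep("c"), ep("bc"), ep("a", "c"),
            ep("a", "bc"), ep("b", "a")]
for episode in maximal_episodes(frequent):
    print([sorted(s) for s in episode])
# [['a'], ['b', 'c']]
# [['b'], ['a']]
```

Here seven frequent episodes reduce to two maximal ones, because every other episode in the list is contained in <{a},{b,c}>.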
An episode rule is a pattern of the form X --> Y, indicating that if an episode X appears, it is likely to be followed by another episode Y. Episode rule mining is useful for applications that require making predictions. There are various definitions of episode rules, and several algorithms exist to find them.
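Since definitions vary, the sketch below assumes one simple choice: the confidence of X --> Y is sup(XY) / sup(X), where XY is the episode obtained by appending Y after X and sup is the head support. It uses a hypothetical sequence consistent with the occurrence sets given earlier for <{a}> and <{a},{c}>; the function names are illustrative, and actual rule mining algorithms may add further constraints, such as a maximum time gap between X and Y.

```python
from typing import FrozenSet, List, Tuple

EventSequence = List[Tuple[FrozenSet[str], int]]
Episode = List[FrozenSet[str]]


def head_support(seq: EventSequence, episode: Episode) -> int:
    """Number of distinct start timestamps of occurrences of `episode` (no window)."""
    starts = set()
    for start, (events, ts) in enumerate(seq):
        if not episode[0] <= events:
            continue
        # Indices where the episode prefix matched so far can end.
        ends = {start}
        for event_set in episode[1:]:
            ends = {k for j in ends for k in range(j + 1, len(seq))
                    if event_set <= seq[k][0]}
        if ends:
            starts.add(ts)
    return len(starts)


def rule_confidence(seq: EventSequence, x: Episode, y: Episode) -> float:
    """Assumed definition: sup(X followed by Y) / sup(X), using head support."""
    sup_x = head_support(seq, x)
    return head_support(seq, x + y) / sup_x if sup_x else 0.0


def ep(*sets: str) -> Episode:
    return [frozenset(s) for s in sets]


# Hypothetical sequence, consistent with the occurrence sets given earlier.
seq = [(frozenset("a"), 1), (frozenset("abc"), 3), (frozenset("c"), 5),
       (frozenset("a"), 6), (frozenset("bc"), 7), (frozenset("a"), 8)]
print(rule_confidence(seq, ep("a"), ep("c")))  # 0.75
```

With sup(<{a}>) = 4 and sup(<{a},{c}>) = 3, the rule <{a}> --> <{c}> gets confidence 3/4 under this assumed definition.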
In episode mining, the goal is to find subsequences of events that appear frequently in a single long sequence of timestamped events. In sequential pattern mining, the goal is to find subsequences of events that appear in many sequences of a database, and these sequences generally have no timestamps.
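For contrast with head support, here is a minimal Python sketch of support in sequential pattern mining, assuming the usual definition: the number of sequences in a database that contain the pattern, with each sequence counted at most once no matter how many times the pattern occurs in it. The small database is hypothetical and, as is typical in sequential pattern mining, carries no timestamps. Function names are illustrative.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]
Sequence = List[Itemset]


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if the itemsets of `pattern` are contained, in order, in
    distinct itemsets of `sequence` (greedy earliest matching)."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(database: List[Sequence], pattern: Sequence) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(s, pattern) for s in database)


def seq(*sets: str) -> Sequence:
    return [frozenset(s) for s in sets]


# Hypothetical sequence database (no timestamps).
database = [seq("a", "bc", "d"), seq("b", "a", "c"), seq("a", "c"), seq("d")]
print(support(database, seq("a", "c")))  # 3
```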