Questions about Sequential Rule Mining
(from the Pattern Mining Course)
Click on a question to see the answer.
The reason is that sequential patterns are found only based on their support (occurrence frequency) but the support is not enough to conclude that some events are likely or unlikely to follow some other events. To make predictions on the next symbols or events of a sequence, we need to have a measure of the confidence (probability) that some item(s) will follow some other item(s).
Sequential rule mining provides a solution to this problem by finding patterns that have the form of a rule X => Y, and by measuring not only the support but also the confidence. The confidence is a measure of the conditional probability (P(Y|X) that Y will follow X. Having information about the confidence is appropriate for making predictions.
ID | Sequences |
S1 | {a}, {a b c}, {a c}, {d}, {c f} |
S2 | {a d}, {c}, {b c}, {a e} |
S3 | {e f}, {a b}, {d f}, {c}, {b} |
S4 | {e}, {g}, {a f}, {c}, {b}, {c} |
What is the support and confidence of the standard sequential rule: <{a}, {b,c}> → <{d}> ?
The support of <{a}, {b,c}> → <{d}> is 1 because it appears in only one sequence:
ID | Sequences |
S1 | {a}, {a b c}, {a c}, {d}, {c f} |
S2 | {a d}, {c}, {b c}, {a e} |
S3 | {e f}, {a b}, {d f}, {c}, {b} |
S4 | {e}, {g}, {a f}, {c}, {b}, {c} |
The support of <{a}, {b,c}> is 2 because it appears in 2 sequences:
ID | Sequences |
S1 | {a}, {a b c}, {a c}, {d}, {c f} |
S2 | {a d}, {c}, {b c}, {a e} |
S3 | {e f}, {a b}, {d f}, {c}, {b} |
S4 | {e}, {g}, {a f}, {c}, {b}, {c} |
Hence, the confidence of <{a}, {b,c}> is 1 / 2 = 0.50
ID | Sequences |
S1 | {a}, {a b c}, {a c}, {d}, {c f} |
S2 | {a d}, {c}, {b c}, {a e} |
S3 | {e f}, {a b}, {d f}, {c}, {b} |
S4 | {e}, {g}, {a f}, {c}, {b}, {c} |
The support of the partially-ordred sequential rule: {a,b} → {c} is 3 because items a and b appear before c in three sequences:
ID | Sequences |
S1 | {a}, {a b c}, {a c}, {d}, {c f} |
S2 | {a d}, {c}, {b c}, {a e} |
S3 | {e f}, {a b}, {d f}, {c}, {b} |
S4 | {e}, {g}, {a f}, {c}, {b}, {c} |
The support of the antecedent {a,b} is 4 because a and b appear in 4 sequences.
Hence, the confidence of {a,b} → {c} is 3 / 4 = 0.75
A limitation of traditional sequential rule mining is that the definition of sequential rule requires a very strict ordering between events. As a result, rules are too specialized and not applicable in many situations. For instance, a rule
<{a}, {b,c}> → <{d}> would not match with the sequence <{a}, {a b}, {a c}, {d}> because b appears before c rather than occuring at the same time, but in fact this is just a small ordering variation, and practice it might represent the same situation..
Partially-ordered sequential rules address this issue by being less strict about the ordering between events. This allows to find rules that are more general and as a result each rule can be applied in many situations.
For instance, the rule {a,b,c} → {d} is said to appear in the sequence <{a}, {a b}, {a c}, {d}> and also in the sequence <{a}, {b,c}, {d}>
Another reason why partially-ordered sequential rules are interesting is that they can summarize multiple traditional sequential rules.
For instance, the partially-ordered sequential rule {a,b} → {c} can replace multiple traditional sequential rules such as:
<{a}, {b}> → <{c}>
<{a,b}> → <{c}>
<{b,a}> → <{c}>