Questions about Sequential Rule Mining

(from the Pattern Mining Course)

Click on a question to see the answer.

Question 1: Sequential pattern mining is a popular pattern mining task. However, some people say that it is not suitable for making prediction about the future. Why?

The reason is that sequential patterns are found only based on their support (occurrence frequency) but the support is not enough to conclude that some events are likely or unlikely to follow some other events. To make predictions on the next symbols or events of a sequence, we need to have a measure of the confidence (probability) that some item(s) will follow some other item(s).

Sequential rule mining provides a solution to this problem by finding patterns that have the form of a rule X => Y, and by measuring not only the support but also the confidence. The confidence is a measure of the conditional probability (P(Y|X) that Y will follow X. Having information about the confidence is appropriate for making predictions.

Question 2: Consider the following sequence database:
ID Sequences
S1 {a}, {a b c}, {a c}, {d}, {c f}
S2 {a d}, {c}, {b c}, {a e}
S3 {e f}, {a b}, {d f}, {c}, {b}
S4 {e}, {g}, {a f}, {c}, {b}, {c}

What is the support and confidence of the standard sequential rule: <{a}, {b,c}> → <{d}> ?

The support of <{a}, {b,c}> → <{d}> is 1 because it appears in only one sequence:

ID Sequences
S1 {a}, {a b c}, {a c}, {d}, {c f}
S2 {a d}, {c}, {b c}, {a e}
S3 {e f}, {a b}, {d f}, {c}, {b}
S4 {e}, {g}, {a f}, {c}, {b}, {c}

The support of <{a}, {b,c}> is 2 because it appears in 2 sequences:

ID Sequences
S1 {a}, {a b c}, {a c}, {d}, {c f}
S2 {a d}, {c}, {b c}, {a e}
S3 {e f}, {a b}, {d f}, {c}, {b}
S4 {e}, {g}, {a f}, {c}, {b}, {c}

Hence, the confidence of <{a}, {b,c}> is 1 / 2 = 0.50

Question 3: Consider the following sequence database:
ID Sequences
S1 {a}, {a b c}, {a c}, {d}, {c f}
S2 {a d}, {c}, {b c}, {a e}
S3 {e f}, {a b}, {d f}, {c}, {b}
S4 {e}, {g}, {a f}, {c}, {b}, {c}
What is the support and confidence of the partially-ordred sequential rule: {a,b} → {c} ?

The support of the partially-ordred sequential rule: {a,b} → {c} is 3 because items a and b appear before c in three sequences:

ID Sequences
S1 {a}, {a b c}, {a c}, {d}, {c f}
S2 {a d}, {c}, {b c}, {a e}
S3 {e f}, {a b}, {d f}, {c}, {b}
S4 {e}, {g}, {a f}, {c}, {b}, {c}

The support of the antecedent {a,b} is 4 because a and b appear in 4 sequences.

Hence, the confidence of {a,b} → {c} is 3 / 4 = 0.75

Question 4: Give some reason(s) and explain with an example why partially-ordered sequential rules may be more interesting than traditional sequential rules for some applications.

A limitation of traditional sequential rule mining is that the definition of sequential rule requires a very strict ordering between events. As a result, rules are too specialized and not applicable in many situations. For instance, a rule

<{a}, {b,c}> → <{d}> would not match with the sequence <{a}, {a b}, {a c}, {d}> because b appears before c rather than occuring at the same time, but in fact this is just a small ordering variation, and practice it might represent the same situation..

Partially-ordered sequential rules address this issue by being less strict about the ordering between events. This allows to find rules that are more general and as a result each rule can be applied in many situations.

For instance, the rule {a,b,c} → {d} is said to appear in the sequence <{a}, {a b}, {a c}, {d}> and also in the sequence <{a}, {b,c}, {d}>

Another reason why partially-ordered sequential rules are interesting is that they can summarize multiple traditional sequential rules.
For instance, the partially-ordered sequential rule {a,b} → {c} can replace multiple traditional sequential rules such as:

<{a}, {b}> → <{c}>
<{a,b}> → <{c}>
<{b,a}> → <{c}>