Questions about Sequential Pattern Mining
(from the Pattern Mining Course)
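Is the sequence <{a},{b,c}> a subsequence of the sequence <{a},{e},{a,b,c,d}>? Yes, because {a} is included in the first itemset {a} and {b,c} is included in the later itemset {a,b,c,d}.
Is the sequence <{a},{b,c}> a subsequence of the sequence <{a},{b},{c}>? No, because the items b and c do not appear together in the same itemset.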
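The subsequence test used in these two questions can be sketched in a few lines of Python. This is only an illustration: the function name is_subsequence and the choice to represent a sequence as a list of Python sets are assumptions made for this example, not part of the course material.

```python
def is_subsequence(pattern, sequence):
    """Return True if `pattern` is a subsequence of `sequence`.

    Both arguments are lists of sets (itemsets). Each itemset of the pattern
    must be included in a distinct itemset of the sequence, in the same order.
    """
    pos = 0  # index of the next itemset of `sequence` that may still be used
    for itemset in pattern:
        # Greedily skip itemsets that do not contain the pattern itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False  # ran out of itemsets before matching the pattern
        pos += 1  # the next pattern itemset must match a strictly later itemset
    return True


pattern = [{"a"}, {"b", "c"}]
print(is_subsequence(pattern, [{"a"}, {"e"}, {"a", "b", "c", "d"}]))  # True
print(is_subsequence(pattern, [{"a"}, {"b"}, {"c"}]))                 # False
```

Matching each pattern itemset to the earliest possible itemset is enough here, because an earlier match never removes options for the itemsets that follow.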
Consider the following sequence database:
ID | Sequences |
S1 | {a}, {a b c}, {a c}, {d}, {c f} |
S2 | {a d}, {c}, {b c}, {a e} |
S3 | {e f}, {a b}, {d f}, {c}, {b} |
S4 | {e}, {g}, {a f}, {c}, {b}, {c} |
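What is the support of the pattern <{a}>? The support is 4 (the item a appears in S1, S2, S3 and S4).
What is the support of the pattern <{a},{b}>? The support is 4 (in every sequence, an itemset containing a is followed by a later itemset containing b).
What is the support of the pattern <{a,b}>? The support is 2 (only S1 and S3 have an itemset containing both a and b).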
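The three supports can be checked with a small Python sketch that counts the sequences containing a pattern. The names DATABASE, contains and support, and the representation of sequences as lists of sets, are assumptions made for this illustration.

```python
# The sequence database from the table above, one list of itemsets per sequence.
DATABASE = {
    "S1": [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    "S2": [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    "S3": [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    "S4": [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}


def contains(sequence, pattern):
    """Return True if `pattern` is a subsequence of `sequence`."""
    pos = 0
    for itemset in pattern:
        # Find the earliest remaining itemset that includes the pattern itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(pattern, database):
    """Number of sequences of `database` that contain `pattern`."""
    return sum(1 for sequence in database.values() if contains(sequence, pattern))


print(support([{"a"}], DATABASE))         # 4
print(support([{"a"}, {"b"}], DATABASE))  # 4
print(support([{"a", "b"}], DATABASE))    # 2
```

Note that a sequence is counted at most once, even if the pattern occurs in it several times.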
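The projected database of the item "f" is:
ID | Sequences |
S3 | {a b}, {d f}, {c}, {b} |
S4 | {c}, {b}, {c} |
Each projected sequence is the part of the sequence that follows the first occurrence of f. S1 contains f only as the last item of its last itemset, so nothing remains after it, and S2 does not contain f.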
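Projecting a database by a single item can be sketched as follows. The sketch assumes the usual PrefixSpan definition: the projected sequence is the suffix that follows the first occurrence of the item, items inside an itemset are kept in alphabetical order, and items left over in the matched itemset are written after a "_" marker. The names project and projected_database are hypothetical, and sequences with an empty suffix are simply dropped.

```python
# Itemsets are stored as alphabetically sorted tuples so that "the items after
# x in the same itemset" is well defined.
DATABASE = {
    "S1": [("a",), ("a", "b", "c"), ("a", "c"), ("d",), ("c", "f")],
    "S2": [("a", "d"), ("c",), ("b", "c"), ("a", "e")],
    "S3": [("e", "f"), ("a", "b"), ("d", "f"), ("c",), ("b",)],
    "S4": [("e",), ("g",), ("a", "f"), ("c",), ("b",), ("c",)],
}


def project(sequence, item):
    """Suffix of `sequence` after the first occurrence of `item`, or None."""
    for i, itemset in enumerate(sequence):
        if item in itemset:
            # Items of the matched itemset that come after `item`.
            rest = tuple(x for x in itemset if x > item)
            head = [("_",) + rest] if rest else []
            return head + list(sequence[i + 1:])
    return None  # the sequence does not contain the item


def projected_database(database, item):
    """Project every sequence by `item`, keeping only non-empty suffixes."""
    result = {}
    for sid, sequence in database.items():
        suffix = project(sequence, item)
        if suffix:  # skips both None (no occurrence) and [] (nothing left)
            result[sid] = suffix
    return result


for sid, suffix in projected_database(DATABASE, "f").items():
    print(sid, suffix)
```

Each call to project builds a new list for the suffix, which is exactly the copying cost that grows as PrefixSpan performs more and more projections.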
An optimization to reduce the memory usage of PrefixSpan is called "pseudo-projection".
It is based on the observation that making a copy of the database for each projection can take a lot of time and use a lot of memory.
The pseudo-projection optimization consists of not copying the database to do a projection. Instead, each projected sequence is represented by pointers into the original database.
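The idea can be illustrated with a minimal Python sketch in which a projected sequence is just a triple (sequence ID, itemset index, item index) pointing into the original database. This pointer layout and the names pseudo_project and read_suffix are assumptions made for the illustration; real implementations may store the pointers differently.

```python
# Itemsets are stored as alphabetically sorted tuples.
DATABASE = {
    "S1": [("a",), ("a", "b", "c"), ("a", "c"), ("d",), ("c", "f")],
    "S2": [("a", "d"), ("c",), ("b", "c"), ("a", "e")],
    "S3": [("e", "f"), ("a", "b"), ("d", "f"), ("c",), ("b",)],
    "S4": [("e",), ("g",), ("a", "f"), ("c",), ("b",), ("c",)],
}


def pseudo_project(database, item):
    """Return one pointer per sequence that contains `item`.

    A pointer is (sequence id, itemset index, item index) and designates the
    position just after the first occurrence of `item`. No itemset is copied.
    """
    pointers = []
    for sid, sequence in database.items():
        for i, itemset in enumerate(sequence):
            if item in itemset:
                pointers.append((sid, i, itemset.index(item) + 1))
                break  # only the first occurrence matters
    return pointers


def read_suffix(database, pointer):
    """Materialize the suffix behind a pointer (for display only)."""
    sid, i, j = pointer
    sequence = database[sid]
    rest = sequence[i][j:]  # items left in the matched itemset
    head = [("_",) + rest] if rest else []
    return head + list(sequence[i + 1:])


pointers = pseudo_project(DATABASE, "a")
print(pointers)
print(read_suffix(DATABASE, pointers[-1]))
```

The projection itself stores only three small values per sequence, whatever the length of the suffix. A mining algorithm would scan the original database starting from each pointer, and read_suffix is included here only to show what a pointer refers to.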