Questions about Sequential Pattern Mining

(from the Pattern Mining Course)

Question 1: Is the sequence <{a},{b,c}> a subsequence of the sequence <{a},{e},{a,b,c,d}>?

Is the sequence <{a},{b,c}> a subsequence of the sequence <{a},{b},{c}>?

Is the sequence <{a},{b,c}> a subsequence of the sequence <{a},{e},{a,b,c,d}>? Yes, because {a} is included in the first itemset and {b,c} is included in the later itemset {a,b,c,d}.

Is the sequence <{a},{b,c}> a subsequence of the sequence <{a},{b},{c}>? No, because the items b and c never appear together in the same itemset.
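
The containment test behind these two answers can be sketched in a few lines of Python. This is only an illustration: the function name is_subsequence and the representation of a sequence as a list of Python sets are assumptions made for this sketch, not part of the course material.

```python
def is_subsequence(pattern, sequence):
    """Return True if `pattern` is a subsequence of `sequence`.

    Both arguments are lists of itemsets (Python sets). Each itemset of the
    pattern must be included in a distinct itemset of the sequence, and the
    order of the itemsets must be preserved.
    """
    pos = 0  # index of the next itemset of `sequence` that may still be used
    for itemset in pattern:
        # Greedily look for the earliest itemset that includes this one.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False  # ran out of itemsets before matching the pattern
        pos += 1  # the next pattern itemset must match a strictly later itemset
    return True


pattern = [{"a"}, {"b", "c"}]
print(is_subsequence(pattern, [{"a"}, {"e"}, {"a", "b", "c", "d"}]))  # True
print(is_subsequence(pattern, [{"a"}, {"b"}, {"c"}]))                 # False
```

Matching each pattern itemset as early as possible is sufficient here, because an earlier match never removes options for the itemsets that follow.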

Question 2: Consider the following sequence database:
ID Sequences
S1 {a}, {a b c}, {a c}, {d}, {c f}
S2 {a d}, {c}, {b c}, {a e}
S3 {e f}, {a b}, {d f}, {c}, {b}
S4 {e}, {g}, {a f}, {c}, {b}, {c}

What is the support of the pattern <{a}>?
What is the support of the pattern <{a},{b}>?
What is the support of the pattern <{a,b}>?

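What is the support of the pattern <{a}>? The support is 4, because every sequence contains the item a.
What is the support of the pattern <{a},{b}>? The support is 4, because in every sequence an itemset containing a is followed by a later itemset containing b.
What is the support of the pattern <{a,b}>? The support is 2, because a and b appear together in the same itemset only in S1 and S3.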
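
These supports can be checked with a short Python sketch that counts the sequences containing a pattern. The names contains and support and the dictionary layout of the database are assumptions chosen for this illustration.

```python
# The sequence database of Question 2, one list of itemsets per sequence.
database = {
    "S1": [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    "S2": [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    "S3": [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    "S4": [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}


def contains(sequence, pattern):
    """Return True if `pattern` is a subsequence of `sequence`."""
    remaining = iter(sequence)
    # The iterator is shared across pattern itemsets, so every match has to
    # occur strictly after the previous one.
    return all(any(p <= s for s in remaining) for p in pattern)


def support(pattern, database):
    """Count the sequences of `database` that contain `pattern`."""
    return sum(contains(sequence, pattern) for sequence in database.values())


print(support([{"a"}], database))         # 4
print(support([{"a"}, {"b"}], database))  # 4
print(support([{"a", "b"}], database))    # 2
```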

Question 3: The PrefixSpan algorithm uses the concept of a projected database. Consider the database from Question 2. What is the projected database of the item "f"?

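The projected database of the item "f" contains, for each sequence where "f" appears, the suffix that follows the first occurrence of "f":

ID Sequences
S3 {a b}, {d f}, {c}, {b}
S4 {c}, {b}, {c}

S1 also contains "f", but only as the last item of its last itemset, so its suffix is empty and it is left out.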
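
A projection on a single item can be sketched as follows. The sketch assumes the usual PrefixSpan conventions: items are sorted inside each itemset, the first occurrence of the item is used, and items that remain after it in the same itemset are kept in a partial itemset marked with "_". The function name project and the list-based representation are assumptions made for this illustration.

```python
# The sequence database of Question 2, with sorted items in each itemset.
database = {
    "S1": [["a"], ["a", "b", "c"], ["a", "c"], ["d"], ["c", "f"]],
    "S2": [["a", "d"], ["c"], ["b", "c"], ["a", "e"]],
    "S3": [["e", "f"], ["a", "b"], ["d", "f"], ["c"], ["b"]],
    "S4": [["e"], ["g"], ["a", "f"], ["c"], ["b"], ["c"]],
}


def project(database, item):
    """Project `database` on the single-item prefix <{item}>.

    For each sequence containing `item`, keep the suffix that follows its
    first occurrence. Sequences whose suffix is empty are dropped.
    """
    projected = {}
    for sid, sequence in database.items():
        for i, itemset in enumerate(sequence):
            if item in itemset:
                # Items left in the same itemset after `item`, if any.
                rest = itemset[itemset.index(item) + 1:]
                suffix = ([["_"] + rest] if rest else []) + sequence[i + 1:]
                if suffix:
                    projected[sid] = suffix
                break  # only the first occurrence is used
    return projected


for item in ("f", "a"):
    print(f"Projection on {item!r}:")
    for sid, suffix in project(database, item).items():
        print(" ", sid, suffix)
```

Note that every call builds new lists for the suffixes, which is the copying cost that becomes a concern when many projections are created.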

Question 4: PrefixSpan can create many projected databases, which can consume a lot of memory. Which optimization is designed to reduce the memory usage of PrefixSpan?

The optimization that reduces the memory usage of PrefixSpan is called "pseudo-projection".

It is based on the observation that copying the database for each projection can take a lot of time and use a lot of memory.

With pseudo-projection, no copy of the database is made to perform a projection. Instead, each projected sequence is represented by a pointer into the original database, that is, the identifier of the sequence and the position where its suffix starts.
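
The idea can be illustrated with a minimal Python sketch, shown here on the item "a" of the database from Question 2. The pointer layout (sequence ID, itemset index, item index) and the names pseudo_project and read_suffix are assumptions made for this example; actual implementations may store the positions differently.

```python
# The sequence database of Question 2, with sorted items in each itemset.
database = {
    "S1": [["a"], ["a", "b", "c"], ["a", "c"], ["d"], ["c", "f"]],
    "S2": [["a", "d"], ["c"], ["b", "c"], ["a", "e"]],
    "S3": [["e", "f"], ["a", "b"], ["d", "f"], ["c"], ["b"]],
    "S4": [["e"], ["g"], ["a", "f"], ["c"], ["b"], ["c"]],
}


def pseudo_project(database, item):
    """Return one pointer (sid, itemset_index, item_index) per sequence.

    Each pointer marks the position just after the first occurrence of
    `item`. No part of the database is copied.
    """
    pointers = []
    for sid, sequence in database.items():
        for i, itemset in enumerate(sequence):
            if item in itemset:
                pointers.append((sid, i, itemset.index(item) + 1))
                break  # only the first occurrence is used
    return pointers


def read_suffix(database, pointer):
    """Build the suffix a pointer refers to (only needed for display)."""
    sid, i, j = pointer
    sequence = database[sid]
    rest = sequence[i][j:]  # items left in the itemset after the prefix item
    return ([["_"] + rest] if rest else []) + sequence[i + 1:]


pointers = pseudo_project(database, "a")
print(pointers)
# [('S1', 0, 1), ('S2', 0, 1), ('S3', 1, 1), ('S4', 2, 1)]
print(read_suffix(database, pointers[3]))
# [['_', 'f'], ['c'], ['b'], ['c']]
```

Each projected sequence costs three small values here, whatever the length of its suffix, and the mining step can scan the original sequences starting from the stored positions.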