Questions about Correlated and Statistically-Significant Patterns
(from the Pattern Mining Course)


Question 1: Consider the following transaction database, which contains five transactions and five items denoted as a, b, c, d, and e:
Transaction id   Items
t1               {a, c, d}
t2               {b, c, e}
t3               {a, b, c, e}
t4               {b, e}
t5               {a, b, c, e}

What is the bond of itemset {c}?

What is the bond of itemset {c,e}?

What is the bond of itemset {a,b,c}?

The bond of {c} is sup({c}) / dsup({c}) = 4 / 4 = 1

The bond of {c,e} is sup({c,e}) / dsup({c,e}) = 3 / 5 = 0.6

The bond of {a,b,c} is sup({a,b,c}) / dsup({a,b,c}) = 2 / 5 = 0.4
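
The bond values above can be checked with a few lines of Python. This is only a minimal sketch: the names DATABASE, support, disjunctive_support and bond are illustrative choices, not part of the course material, and sup / dsup are computed by scanning the database from Question 1.

```python
from fractions import Fraction

# Transaction database from Question 1.
DATABASE = {
    "t1": {"a", "c", "d"},
    "t2": {"b", "c", "e"},
    "t3": {"a", "b", "c", "e"},
    "t4": {"b", "e"},
    "t5": {"a", "b", "c", "e"},
}


def support(itemset, database):
    """sup(X): number of transactions containing every item of X."""
    return sum(1 for items in database.values() if itemset <= items)


def disjunctive_support(itemset, database):
    """dsup(X): number of transactions containing at least one item of X."""
    return sum(1 for items in database.values() if itemset & items)


def bond(itemset, database):
    """bond(X) = sup(X) / dsup(X), kept as an exact fraction."""
    return Fraction(support(itemset, database),
                    disjunctive_support(itemset, database))


for itemset in ({"c"}, {"c", "e"}, {"a", "b", "c"}):
    print(sorted(itemset), bond(itemset, DATABASE))
```

For the three itemsets of this question, the sketch gives 1, 3/5 and 2/5.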

Question 2: Consider the transaction database from Question 1.

What is the all-confidence of itemset {c}?

What is the all-confidence of itemset {c,e}?

What is the all-confidence of itemset {a,b,c}?

The all-confidence of {c} is allconf({c}) = 4 / 4 = 1

The all-confidence of {c,e} is allconf({c,e}) = 3 / 4 = 0.75

The all-confidence of {a,b,c} is allconf({a,b,c}) = 2 / 4 = 0.5
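
The same answers can be reproduced programmatically. The sketch below assumes the usual definition allconf(X) = sup(X) / max{sup({i}) : i in X}; the function and variable names are illustrative, not taken from the course.

```python
from fractions import Fraction

# Transaction database from Question 1.
DATABASE = [
    {"a", "c", "d"},        # t1
    {"b", "c", "e"},        # t2
    {"a", "b", "c", "e"},   # t3
    {"b", "e"},             # t4
    {"a", "b", "c", "e"},   # t5
]


def support(itemset, database):
    """sup(X): number of transactions containing every item of X."""
    return sum(1 for transaction in database if itemset <= transaction)


def all_confidence(itemset, database):
    """allconf(X) = sup(X) divided by the largest support of a single item of X."""
    max_item_support = max(support({item}, database) for item in itemset)
    return Fraction(support(itemset, database), max_item_support)


for itemset in ({"c"}, {"c", "e"}, {"a", "b", "c"}):
    print(sorted(itemset), all_confidence(itemset, DATABASE))
```

Here the largest single-item support is 4 in every case (items b, c and e each appear in four transactions), which is why all three denominators are 4.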

Question 3: Let there be two itemsets called X and Y such that TIDLIST(X) = {t1, t2, t4} and TIDLIST(Y) = {t2, t3, t4, t5}.

What is the TIDLIST of itemset Z = X ∪ Y?

The TIDLIST of itemset Z = X ∪ Y is TIDLIST(Z) = TIDLIST(X) ∩ TIDLIST(Y) = {t1, t2, t4} ∩ {t2, t3, t4, t5} = {t2, t4}
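
With TID lists stored as Python sets, this rule is a single set intersection. A minimal sketch (the variable names are illustrative):

```python
# TID lists given in Question 3.
tidlist_x = {"t1", "t2", "t4"}
tidlist_y = {"t2", "t3", "t4", "t5"}

# A transaction contains X ∪ Y only if it contains both X and Y,
# so the TID list of the union is the intersection of the two TID lists.
tidlist_z = tidlist_x & tidlist_y

# The support of Z is simply the size of its TID list.
support_z = len(tidlist_z)

print(sorted(tidlist_z), support_z)
```

The support of Z is obtained without rescanning the database, which is the main appeal of this vertical representation.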

Question 4: Let there be two itemsets called X and Y such that DTIDLIST(X) = {t1, t2, t4} and DTIDLIST(Y) = {t2, t3, t4, t5}.

What is the DTIDLIST of itemset Z = X ∪ Y?

The DTIDLIST of itemset Z = X ∪ Y is DTIDLIST(Z) = DTIDLIST(X) ∪ DTIDLIST(Y) = {t1, t2, t4} ∪ {t2, t3, t4, t5} = {t1, t2, t3, t4, t5}
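
The union rule, combined with the intersection rule of Question 3, is enough to compute the bond without scanning the database. The sketch below first checks the answer above, then applies both rules to the database of Question 1 stored vertically. ITEM_TIDS, tidlist, dtidlist and bond are illustrative names, and the sketch relies on the fact that for a single item the TIDLIST and the DTIDLIST coincide.

```python
from fractions import Fraction

# Disjunctive TID lists given in Question 4.
dtidlist_x = {"t1", "t2", "t4"}
dtidlist_y = {"t2", "t3", "t4", "t5"}
dtidlist_z = dtidlist_x | dtidlist_y  # union rule

# Vertical view of the Question 1 database: item -> transactions containing it.
ITEM_TIDS = {
    "a": {"t1", "t3", "t5"},
    "b": {"t2", "t3", "t4", "t5"},
    "c": {"t1", "t2", "t3", "t5"},
    "d": {"t1"},
    "e": {"t2", "t3", "t4", "t5"},
}


def tidlist(itemset):
    """Transactions containing all items: intersection of the item TID lists."""
    return set.intersection(*(ITEM_TIDS[item] for item in itemset))


def dtidlist(itemset):
    """Transactions containing at least one item: union of the item TID lists."""
    return set.union(*(ITEM_TIDS[item] for item in itemset))


def bond(itemset):
    """bond(X) = |TIDLIST(X)| / |DTIDLIST(X)|."""
    return Fraction(len(tidlist(itemset)), len(dtidlist(itemset)))


print(sorted(dtidlist_z))
print(bond({"c", "e"}), bond({"a", "b", "c"}))
```

For {c, e} and {a, b, c} this gives 3/5 and 2/5 respectively.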

Question 5: If we design an algorithm to mine frequent correlated itemsets using the all-confidence, what are the main properties that we can use to reduce the search space?

If we design an algorithm to mine rare correlated itemsets using the all-confidence, what are the main properties that we can use to reduce the search space?

If we design an algorithm to mine frequent correlated itemsets using the all-confidence, we can use two properties to reduce the search space:

  • The Apriori (anti-monotonicity) property of the support, i.e. the support of an itemset cannot be more than that of its subsets
  • The Apriori (anti-monotonicity) property of the all-confidence, i.e. the all-confidence of an itemset cannot be more than that of its subsets

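If we design an algorithm to mine rare correlated itemsets using the all-confidence, we can use only one property to reduce the search space, because an itemset that is not rare may still have rare supersets, so the support cannot be used for pruning: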

  • The Apriori property of the all-confidence, i.e. the all-confidence of an itemset cannot be more than that of its subsets
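
To illustrate how these properties prune the search space in the frequent case, here is a minimal level-wise (Apriori-style) sketch on the database of Question 1. It is not the algorithm from the course: the function mine_frequent_correlated and the thresholds minsup and min_allconf are assumptions made for this example.

```python
from itertools import combinations

# Transaction database from Question 1.
DATABASE = [
    {"a", "c", "d"},
    {"b", "c", "e"},
    {"a", "b", "c", "e"},
    {"b", "e"},
    {"a", "b", "c", "e"},
]


def support(itemset, database):
    return sum(1 for transaction in database if itemset <= transaction)


def all_confidence(itemset, database):
    largest = max(support({item}, database) for item in itemset)
    return support(itemset, database) / largest


def mine_frequent_correlated(database, minsup, min_allconf):
    """Return itemsets with sup >= minsup and allconf >= min_allconf."""
    items = sorted(set().union(*database))
    # Single items always have an all-confidence of 1.
    level = [frozenset([i]) for i in items if support({i}, database) >= minsup]
    results = []
    while level:
        results.extend(level)
        kept = set(level)
        # Join step: merge two kept k-itemsets that differ by one item.
        candidates = {x | y for x, y in combinations(level, 2)
                      if len(x | y) == len(x) + 1}
        level = []
        for candidate in candidates:
            # Both measures are anti-monotone, so a candidate with a discarded
            # subset cannot qualify and is pruned without counting its support.
            if any(candidate - {item} not in kept for item in candidate):
                continue
            if (support(candidate, database) >= minsup
                    and all_confidence(candidate, database) >= min_allconf):
                level.append(candidate)
    return results


found = mine_frequent_correlated(DATABASE, minsup=2, min_allconf=0.7)
print(sorted("".join(sorted(itemset)) for itemset in found))
```

With minsup = 2 and min_allconf = 0.7, the candidate {a, b, c} is pruned because {a, b} has an all-confidence of only 0.5, while {b, c, e} is kept. For rare correlated itemsets, the support test would have to be removed from the pruning condition and applied only as a final filter, leaving the all-confidence as the only pruning criterion.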

Question 6: Some correlation measures are null-invariant. Why is this a desirable property for a correlation measure?

A measure is null-invariant if its value for any itemset X is not influenced by transactions that do not contain any item of X (null transactions). In other words, if you add transactions that contain none of the items of X to a database, or remove such transactions from it, the value of the measure for X will not change. This is desirable because it ensures that the measure behaves in a more stable way. For example, if you have an itemset {apple, orange}, the bond of {apple, orange} will not be influenced by people who buy neither apples nor oranges.
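
A small experiment makes this concrete. The sketch below uses made-up shopping baskets (hypothetical data, not from the course) and compares the bond with the relative support, which is not null-invariant, before and after adding many baskets that contain neither apple nor orange.

```python
from fractions import Fraction


def support(itemset, database):
    return sum(1 for basket in database if itemset <= basket)


def disjunctive_support(itemset, database):
    return sum(1 for basket in database if itemset & basket)


def bond(itemset, database):
    return Fraction(support(itemset, database),
                    disjunctive_support(itemset, database))


def relative_support(itemset, database):
    """Fraction of all transactions containing the itemset (not null-invariant)."""
    return Fraction(support(itemset, database), len(database))


# Hypothetical baskets, invented for this illustration.
baskets = [
    {"apple", "orange"},
    {"apple", "orange", "milk"},
    {"apple"},
    {"orange", "bread"},
]
x = {"apple", "orange"}

# Null transactions with respect to x: they contain neither apple nor orange.
null_baskets = [{"milk"}, {"bread", "milk"}, {"bread"}] * 100
bigger = baskets + null_baskets

print(bond(x, baskets), bond(x, bigger))
print(relative_support(x, baskets), relative_support(x, bigger))
```

The bond stays at 1/2 in both databases, while the relative support drops from 1/2 to 1/152, even though nothing changed about how apples and oranges are bought together.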