Questions about Frequent Itemsets and Association Rules
(from the Pattern Mining Course)

Click on a question to see the answer.

Question 1: Consider the following transaction database, which contains five transactions and five items denoted as a, b, c, d, and e:
Transaction id Items
t1 {a, c, d}
t2 {b, c, e}
t3 {a, b, c, e}
t4 {b, e}
t5 {a, b, c, e}

If minsup = 2, write down all the frequent itemsets and their support.

The frequent itemsets are :

itemsets support
{a} 3
{b} 4
{c} 4
{e} 4
{a, b} 2
{a, c} 3
{a, e} 2
{b, c} 3
{b, e} 4
{c, e} 3
{a, b, c} 2
{a, b, e} 2
{a, c, e} 2
{b, c, e} 3
{a, b, c, e} 2
Question 2: Consider the following transaction database, which contains six transactions and five items denoted as a, b, c, d, and e:
Transaction id Items
t1 {a, b, d, e}
t2 {b, c, e}
t3 {a, b, d, e}
t4 {a, b, c, e}
t5 {a, b, c, d, e}
t6 {b, c, d}

What is the confidence and support of the rule {a} ==>{b, d, e}?

 What is the confidence and support of the rule {b, d, e} ==>{a}?

The support and confidence of the rule {a} ==>{b, d, e} are 3 transactions and 0.75 (75%), respectively.

The support and confidence of the rule {b, d, e} ==>{a} are 3 transaction and 1 (100%), respectively

Question 3: What is the "Apriori property" ? And why is it important?

Let there be two itemsets X and Y. If X ⊂ Y, the support of Y is less than or equal to the support of X.

This property is important in frequent itemset mining because it allows to reduce the search space, that is to find all the frequent itemsets without having to explore all the possibilities. If we know that an itemset is infrequent, it is not necessary to explore its supersets.

Question 4: Let's say that the Apriori algorithm finds that the itemsets {a, b}, {a, c}, {c,d} and {c,e} are frequent in a database. Then, what are the candidate itemsets that Apriori will create from those?

The candidates will be {a,b,c} and {c,d,e}.

Question 5: The Eclat algorithm transforms the input database into a vertical database. What is the vertical database corresponding to the database of Question 1?

The vertical database is:

Item Transaction list
a t1, t3, t5
b t2, t3, t5
c t1, t2, t3, t5
d t2, t5
e t1, t2, t3, t5
Question 6: Let's say that a transaction database contains 5 distinct items. Two of those items are infrequent and three items are frequent.

What is the minimum number of frequent itemsets in that database?

What is the maximum number of frequent itemsets in that database?

Generally, if there are n items in a database, there are at most 2^n - 1 frequent itemsets (excluding the empty itemset).

Since infrequent items cannot be part of a frequent itemset, we can ignore them for this question.

Thus, if n = 3, we have a maximum of 2^3 - 1 = 7 frequent itemsets.

Now what about the minimum? According to the question, we know that three items are frequent, so the minimum number of frequent itemsets in that database is 3.