Questions about Concise Representations of Itemsets

(from the Pattern Mining Course)

Click on a question to see the answer.

Question 1: Consider the following transaction database, which contains five transactions and five items denoted as a, b, c, d, e:

Transaction id Items
t1 {a, c, d}
t2 {b, c, e}
t3 {a, b, c, e}
t4 {b, e}
t5 {a, b, c, e}

If minsup = 2, write down all the frequent closed itemsets.

The frequent closed itemsets are:

frequent closed itemsets support
{c} 4
{a, c} 3
{b, e} 4
{b, c, e} 3
{a, b, c, e} 2
Question 2: For the transaction database of Question 1, if minsup = 2, what are the frequent maximal itemsets?

The frequent maximal itemsets are:

frequent maximal itemsets support
{a, b, c, e} 2
Question 3: For the transaction database of Question 1, if minsup = 2, what are the frequent generator itemsets?

The frequent generator itemsets are:

frequent generator itemsets support
{} the empty set 5
{a} 3
{a,b} 2
{a,e} 2
{b} 4
{b,c} 3
{c} 4
{c, e} 3
{e} 4

It can be observed that the empty set is also a frequent generator if we are very strict about the definition of generator.

Question 4: a) What is the advantage of discovering closed itemsets instead of maximal itemsets?

b) What is the advantage of discovering maximal itemsets instead of closed itemsets?

Both maximal and closed itemsets are concise representations of all itemsets.

The advantage of closed itemsets over maximal itemsets is that closed itemsets are lossless. This means that from the frequent closed itemsets, we can recover all the frequent itemsets and their support without having to read the database. Thus, we don't loose any information when we mine the closed itemsets. Differently, the maximal itemsets are not lossess. From the maximal itemsets, we can recover all the frequent itemsets but we don't know their support values. To find the support values, we need to read the database to calculate the support values. This is a disadvantage of maximal itemsets.

The main advantage of maximal itemsets over closed itemsets is that the set of maximal itemsets is a subset of that of closed itemsets. Thus, the number of maximal itemsets can be smaller than that of closed itemsets.

Question 5: Let's say that someone applies an algorithms to find frequent maximal itemsets in a database and he finds only one frequent maximal itemset : {a,b,c}. Write down all the frequent itemsets.

{a}, {b}, {c}, {a,b}, {b,c}, {a,c}, {a, b, c}

Question 6: Let's say that someone applies an algorithms to find frequent closed itemsets in a database and he finds only two frequent closed itemset : {a,b,c} with a support of 3 and {a,b} with a support of 4.
Write down all the frequent itemsets and their support.

{a}, support = 4

{b}, support = 4

{c}, support = 3

{a,b}, support = 4

{b,c}, support = 3

{a,c}, support = 3

{a, b, c} support = 3

Question 7: What is the key property that algorithms for discovering frequent generator itemsets use to reduce the search space?

An itemset can only be a generator if all its subsets are also generators.