Questions about High Utility Itemset Mining

(from the Pattern Mining Course)


Question 1: High Utility Itemset Mining was proposed as a generalization of Frequent Itemset Mining to address some of its limitations. What are these limitations?

The main limitations of frequent itemset mining that are addressed by high utility itemset mining are:

  • All items are viewed as being equally important. For example, selling one bread and selling one diamond are considered equally important in frequent itemset mining.
  • Transactions contain no purchase quantities in frequent itemset mining. For example, buying one bread or ten breads in a transaction is viewed as the same thing.

To address these limitations, high utility itemset mining introduces the concept of utility:

  • For each item, an external utility value (also called unit profit) indicates the relative importance of the item. For instance, selling one bread may yield a profit of 1 $, while selling one diamond may yield a profit of 500 $.
  • In each transaction, an internal utility value (also called quantity) indicates how many units of each item were purchased. For instance, a transaction may record that a customer bought five breads.
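
To make the two kinds of utility concrete, here is a minimal Python sketch assuming the standard definitions: the utility of an item in a transaction is its quantity multiplied by its unit profit, and the utility of an itemset in a transaction is the sum of the utilities of its items (0 if the transaction does not contain all of them). The names `unit_profit`, `item_utility` and `itemset_utility_in_transaction` are hypothetical, and the transaction containing a diamond is made up for illustration.

```python
# Hypothetical unit profits (external utilities), using the bread/diamond example.
unit_profit = {"bread": 1, "diamond": 500}


def item_utility(item, transaction, profits):
    """Utility of one item in one transaction: quantity x unit profit."""
    return transaction[item] * profits[item]


def itemset_utility_in_transaction(itemset, transaction, profits):
    """Sum of the item utilities, or 0 if the transaction lacks an item of the itemset."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(item_utility(item, transaction, profits) for item in itemset)


# A transaction maps each purchased item to its quantity (internal utility).
transaction = {"bread": 5, "diamond": 1}  # hypothetical purchase
print(itemset_utility_in_transaction({"bread"}, transaction, unit_profit))             # 5
print(itemset_utility_in_transaction({"bread", "diamond"}, transaction, unit_profit))  # 505
```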

Question 2: Is it possible to run a high utility itemset mining algorithm such as HUI-Miner or FHM to find frequent itemsets? If so, how?

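Yes, it is possible, and it is simple to do. We set the purchase quantity of every item that appears in a transaction to 1 (an item is then simply present or absent), and we set the unit profit of all items to the same value, 1. The utility of an itemset X in a transaction that contains it is then equal to |X|, the number of items in X, so the utility of X in the whole database is u(X) = |X| × sup(X), where sup(X) is the support count of X. We then apply a high utility itemset mining algorithm with minutil = minsup (minsup being expressed as a number of transactions). Every frequent itemset is found, because u(X) ≥ sup(X) ≥ minsup. However, because of the factor |X|, some infrequent itemsets containing several items may also be output, so we keep only the itemsets X such that u(X) / |X| ≥ minsup. The remaining itemsets are exactly the frequent itemsets, and u(X) / |X| gives their support.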
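
This reduction can be checked on a toy example. The sketch below is hypothetical: `brute_force_huis` enumerates every itemset and is only a stand-in for a real algorithm such as HUI-Miner or FHM (it is exponential and suitable only for tiny data), and the three transactions are made up. With quantity 1 and unit profit 1, u(X) = |X| × sup(X), so {b, c}, which appears in a single transaction, still reaches minutil = 2 with a utility of 2 × 1; dividing the utility by |X| recovers the support and removes it.

```python
from itertools import combinations


def brute_force_huis(db, profits, minutil):
    """Naive stand-in for HUI-Miner/FHM: tests every itemset (toy data only)."""
    items = sorted(profits)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            # Sum the utility of x over the transactions that contain all of its items.
            u = sum(sum(t[i] * profits[i] for i in x) for t in db if x.issubset(t))
            if u >= minutil:
                result[x] = u
    return result


def frequent_itemsets_via_huim(transactions, minsup):
    """Mine frequent itemsets (absolute minsup) with a utility-based miner."""
    # Quantity 1 for every item present in a transaction, unit profit 1 for every item.
    db = [{item: 1 for item in t} for t in transactions]
    profits = {item: 1 for t in transactions for item in t}
    huis = brute_force_huis(db, profits, minutil=minsup)
    # Here u(X) = |X| * sup(X): recover the support and drop infrequent itemsets.
    return {x: u // len(x) for x, u in huis.items() if u // len(x) >= minsup}


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]  # hypothetical toy data
print(frequent_itemsets_via_huim(transactions, minsup=2))
```

The result maps each frequent itemset to its support count; {b, c} and {a, b, c} are returned by `brute_force_huis` but filtered out.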

Question 3:

Suppose that we have a customer transaction database containing five distinct items: a, b, c, d, and e.

The database has eight transactions, named T1, T2, ..., T8. In each transaction, the number in parentheses after an item is its purchase quantity:

Transaction   Items
T1            b(2), c(2), e(1)
T2            b(4), c(3), d(2), e(1)
T3            b(2), c(2), e(1)
T4            a(2), b(10), c(2), d(10), e(2)
T5            a(2), c(6), e(2)
T6            b(4), c(3), e(1)
T7            a(2), c(2), d(2)
T8            a(2), c(6), e(2)

We also have information about the unit profit of each item:

Item   Unit profit
a      5 $
b      2 $
c      1 $
d      2 $
e      3 $

If we set minutil = 60 $, what are the high utility itemsets?

The high utility itemsets are:

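  • {b, e}: utility 62 $
  • {a, c, e}: utility 62 $
  • {b, d, e}: utility 61 $
  • {b, c, d, e}: utility 66 $
  • {b, c, e}: utility 74 $

For example, {b, c, e} appears in T1, T2, T3, T4 and T6, so its utility is (4 + 2 + 3) + (8 + 3 + 3) + (4 + 2 + 3) + (20 + 2 + 6) + (8 + 3 + 3) = 9 + 14 + 9 + 28 + 14 = 74 $.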
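
The answer can be checked with a short brute-force Python sketch that encodes the two tables above, computes the utility of each of the 31 non-empty itemsets over a, b, c, d and e, and keeps those reaching minutil. It is only a verification aid, not HUI-Miner, FHM or any other efficient algorithm (those avoid enumerating all itemsets); the names `database`, `unit_profit`, `utility` and `high_utility_itemsets` are hypothetical.

```python
from itertools import combinations

# Question 3 database: each transaction maps an item to its purchase quantity.
database = {
    "T1": {"b": 2, "c": 2, "e": 1},
    "T2": {"b": 4, "c": 3, "d": 2, "e": 1},
    "T3": {"b": 2, "c": 2, "e": 1},
    "T4": {"a": 2, "b": 10, "c": 2, "d": 10, "e": 2},
    "T5": {"a": 2, "c": 6, "e": 2},
    "T6": {"b": 4, "c": 3, "e": 1},
    "T7": {"a": 2, "c": 2, "d": 2},
    "T8": {"a": 2, "c": 6, "e": 2},
}
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}


def utility(itemset, db, profits):
    """Utility of an itemset, summed over the transactions that contain all its items."""
    total = 0
    for transaction in db.values():
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * profits[item] for item in itemset)
    return total


def high_utility_itemsets(db, profits, minutil):
    """Enumerate every non-empty itemset and keep those with utility >= minutil."""
    items = sorted(profits)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = utility(itemset, db, profits)
            if u >= minutil:
                result[itemset] = u
    return result


for itemset, u in high_utility_itemsets(database, unit_profit, minutil=60).items():
    print("{" + ", ".join(itemset) + "}", u, "$")
# {b, e} 62 $
# {a, c, e} 62 $
# {b, c, e} 74 $
# {b, d, e} 61 $
# {b, c, d, e} 66 $
```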