Discovery of High Utility Itemsets Using an Artificial Bee Colony Algorithm with the HUIM-ABC algorithm (SPMF documentation)

This example explains how to run the HUIM-ABC algorithm using the SPMF open-source data mining library.

How to run this example?

What is HUIM-ABC?

HUIM-ABC is an algorithm for discovering high utility itemsets (HUIs) in a transaction database, that is, itemsets whose utility is no less than a minimum utility threshold. The HUIM-ABC algorithm discovers HUIs using an artificial bee colony (ABC) optimization algorithm. It was proposed by Wei Song et al. at PAKDD 2018.

What is the input?

HUIM-ABC takes as input a transaction database with utility information. Let's consider the following database consisting of 7 transactions (t1, t2, ..., t7) and 5 items (1, 2, 3, 4, 5). This database is provided in the text file "contextHUIM.txt" in the package ca.pfv.spmf.tests of the SPMF distribution.

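Transaction   Items        Transaction utility   Item utilities for this transaction
t1            2 3 4        9                     2 2 5
t2            1 2 3 4 5    18                    4 2 3 5 4
t3            1 3 4        11                    4 2 5
t4            3 4 5        11                    2 5 4
t5            1 2 4 5      22                    5 4 5 8
t6            1 2 3 4      17                    3 8 1 5
t7            4 5          9                     5 4

Each line of the database gives the items of a transaction, the transaction utility, and the utility of each item in that transaction.

Note that the transaction utility of each line is the sum of the item utilities on that line.
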
What are real-life examples of such a database? There are several applications in real life. One application is a customer transaction database. Imagine that each transaction represents the items purchased by a customer. The first customer named "t1" bought items 2, 3 and 4. The amount of money spent for each item is respectively 2 $, 2 $ and 5 $. The total amount of money spent in this transaction is 2 + 2 + 5 = 9 $.

What is the output?

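The output of HUIM-ABC is the set of high utility itemsets. The utility of an itemset is the sum of the utilities of its items in all the transactions that contain the itemset. An itemset X in a database D is a high-utility itemset (HUI) if and only if its utility is no less than the minimum utility threshold. For example, if we run HUIM-ABC and set the minimum utility threshold to 40, we may obtain the following 2 high utility itemsets:

Itemset      Utility
{4, 5}       40
{1, 2, 4}    41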
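
To make the definition concrete, the following Python sketch enumerates every itemset of the example database by brute force and keeps those whose utility reaches the threshold. This is not the HUIM-ABC algorithm, which searches heuristically instead of exhaustively; it only illustrates how the utility of an itemset is computed. The names DATABASE, utility and brute_force_huis are chosen for this illustration and are not part of SPMF.

```python
from itertools import combinations

# Example database (illustrative encoding, not an SPMF structure):
# one dict per transaction, mapping item -> utility of that item in the transaction.
DATABASE = [
    {2: 2, 3: 2, 4: 5},               # t1
    {1: 4, 2: 2, 3: 3, 4: 5, 5: 4},   # t2
    {1: 4, 3: 2, 4: 5},               # t3
    {3: 2, 4: 5, 5: 4},               # t4
    {1: 5, 2: 4, 4: 5, 5: 8},         # t5
    {1: 3, 2: 8, 3: 1, 4: 5},         # t6
    {4: 5, 5: 4},                     # t7
]


def utility(itemset, database):
    """Sum the utilities of the itemset's items over all transactions containing the whole itemset."""
    return sum(
        sum(transaction[item] for item in itemset)
        for transaction in database
        if all(item in transaction for item in itemset)
    )


def brute_force_huis(database, min_utility):
    """Return {itemset: utility} for every itemset whose utility is at least min_utility."""
    items = sorted({item for transaction in database for item in transaction})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, database)
            if u >= min_utility:
                result[itemset] = u
    return result


print(brute_force_huis(DATABASE, 40))
```

With a threshold of 40, this prints {(4, 5): 40, (1, 2, 4): 41}. Exhaustive enumeration is only practical for tiny databases such as this one, since the number of candidate itemsets doubles with every additional item.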







Note that the HUIM-ABC algorithm also has an optional BucketNum parameter, which influences the search for high utility itemsets. It should be set to a small value such as 2 for datasets containing few items, as in this example (where a value of 2 gives results quickly), and to a value such as 10 for datasets containing numerous items. This parameter can have a fairly large influence on performance, so it is important to set it to a proper value. The default value is 10.

Input file format

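The input file format of high utility itemsets is defined as follows. It is a text file. Each line represents a transaction. Each line is composed of three sections, separated by the symbol ":". First, the items contained in the transaction are listed, each represented by an integer and separated by single spaces. Second, the transaction utility appears. Third, the utility of each item in this transaction is listed, in the same order as the items and separated by single spaces.

For the previous example, the input file is defined as follows: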
2 3 4:9:2 2 5
1 2 3 4 5:18:4 2 3 5 4
1 3 4:11:4 2 5
3 4 5:11:2 5 4
1 2 4 5:22:5 4 5 8
1 2 3 4:17:3 8 1 5
4 5:9:5 4

Consider the first line. It means that the transaction {2, 3, 4} has a total utility of 9 and that items 2, 3 and 4 respectively have a utility of 2, 2 and 5 in this transaction. The following lines follow the same format.
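
As an illustration of this format, here is a minimal Python sketch that splits a line into its three sections and checks that the transaction utility equals the sum of the item utilities. The function name parse_transaction and the variable names are hypothetical and are not part of SPMF, which reads this format with its own Java code.

```python
def parse_transaction(line):
    """Split 'items:transaction_utility:item_utilities' into its three sections."""
    items_part, tu_part, utilities_part = line.strip().split(":")
    items = [int(token) for token in items_part.split()]
    transaction_utility = int(tu_part)
    utilities = [int(token) for token in utilities_part.split()]
    if len(items) != len(utilities):
        raise ValueError("each item needs exactly one utility value: " + line)
    if sum(utilities) != transaction_utility:
        raise ValueError("transaction utility is not the sum of item utilities: " + line)
    return items, transaction_utility, utilities


# Content of the example input file shown above.
EXAMPLE_FILE = """\
2 3 4:9:2 2 5
1 2 3 4 5:18:4 2 3 5 4
1 3 4:11:4 2 5
3 4 5:11:2 5 4
1 2 4 5:22:5 4 5 8
1 2 3 4:17:3 8 1 5
4 5:9:5 4
"""

transactions = [
    parse_transaction(line) for line in EXAMPLE_FILE.splitlines() if line.strip()
]
for items, transaction_utility, utilities in transactions:
    print(items, transaction_utility, utilities)
```

The two checks are a design choice of this sketch: they reject malformed lines early instead of silently producing wrong utilities later.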

Output file format

The output file format of high utility itemsets is defined as follows. It is a text file in which each line represents a high utility itemset. On each line, the items of the itemset are listed first. Each item is represented by an integer, followed by a single space. After all the items, the keyword " #UTIL: " appears and is followed by the utility of the itemset. For example, we show below an output file that may be obtained for this example.
4 5 #UTIL: 40
1 2 4 #UTIL: 41

For example, the first line indicates that the itemset {4, 5} is a high utility itemset with a utility of 40. The following lines follow the same format.
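
The output lines can be read back in a similar way. The Python sketch below assumes the keyword appears exactly as "#UTIL:" as in the two example lines above; the function name parse_output_line and the constants are hypothetical and are not part of SPMF.

```python
def parse_output_line(line):
    """Return (itemset, utility) from a line such as '4 5 #UTIL: 40'."""
    items_part, utility_part = line.split("#UTIL:")
    itemset = tuple(int(token) for token in items_part.split())
    return itemset, int(utility_part)


# The two example output lines shown above.
OUTPUT_LINES = ["4 5 #UTIL: 40", "1 2 4 #UTIL: 41"]
MIN_UTILITY = 40  # threshold used in this example

parsed = [parse_output_line(line) for line in OUTPUT_LINES]
for itemset, itemset_utility in parsed:
    # Every reported itemset must reach the minimum utility threshold.
    assert itemset_utility >= MIN_UTILITY
    print(itemset, itemset_utility)
```

The utility of {4, 5} can also be checked by hand: the itemset appears in transactions t2, t4, t5 and t7, which contribute 5 + 4, 5 + 4, 5 + 8 and 5 + 4, for a total of 40.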

Implementation details

This is the original implementation of HUIM-ABC. Note that the input format may not be exactly the same as the one described in the original article, but it is equivalent.

Where can I get more information about the HUIM-ABC algorithm?

This is the reference of the article describing the HUIM-ABC algorithm:

Song, W., & Huang, C. (2018). Discovering high utility itemsets based on the artificial bee colony algorithm. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 3-14). Springer, Cham.

Besides, for a general overview of high utility itemset mining, you may read this survey paper.