This example explains how to run the VME algorithm using the SPMF open-source data mining library.
How to run this example?
What is the VME algorithm?
VME (Deng & Xu, 2010) is an algorithm for mining erasable itemsets from a product database with profit information.
What is the input?
VME takes as input a product database and a threshold (a value between 0 and 100%). A product is defined as a set of items (parts) that are used to assemble the product. Moreover, each product is annotated with a profit (a positive integer) that indicates how much money this product generates for the company. For example, consider the following product database, consisting of 6 products and 7 items (this example is taken from the article of Deng & Xu, 2010). The first line indicates that product 1 generates a total profit of 50$ for the company and that its assembly requires parts 2, 3, 4 and 6. This product database is provided in the file "contextVME.txt" of the SPMF distribution:
product | profit | items |
product1 | 50$ | {2, 3, 4, 6} |
product2 | 20$ | {2, 5, 7} |
product3 | 50$ | {1, 2, 3, 5} |
product4 | 800$ | {1, 2, 4} |
product5 | 30$ | {6, 7} |
product6 | 50$ | {3, 4} |
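To make the structure of the input concrete, here is a minimal Java sketch of how such a product database could be held in memory. The names ProductDatabaseExample, Product and distinctItemCount are hypothetical, chosen for illustration only; they are not SPMF classes, and the data is simply hard-coded from the table.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ProductDatabaseExample {
    // Hypothetical representation of one product (not an SPMF class):
    // its name, its profit in dollars and the items needed to assemble it.
    record Product(String name, int profit, List<Integer> items) {}

    // The example database from the table (same data as contextVME.txt).
    static List<Product> exampleDatabase() {
        return List.of(
            new Product("product1", 50, List.of(2, 3, 4, 6)),
            new Product("product2", 20, List.of(2, 5, 7)),
            new Product("product3", 50, List.of(1, 2, 3, 5)),
            new Product("product4", 800, List.of(1, 2, 4)),
            new Product("product5", 30, List.of(6, 7)),
            new Product("product6", 50, List.of(3, 4)));
    }

    // Number of distinct items used by at least one product.
    static int distinctItemCount(List<Product> db) {
        Set<Integer> all = new HashSet<>();
        for (Product p : db) {
            all.addAll(p.items());
        }
        return all.size();
    }

    public static void main(String[] args) {
        List<Product> db = exampleDatabase();
        System.out.println(db.size() + " products, " + distinctItemCount(db) + " distinct items");
        for (Product p : db) {
            System.out.println(p.name() + " | " + p.profit() + "$ | " + p.items());
        }
    }
}
```

Running it should report 6 products and 7 distinct items, matching the description of the example database.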
What is the output?
The output is the set of erasable itemsets, that is, the itemsets whose loss of profit is less than or equal to the user-specified threshold. The idea is to discover items that the company could stop manufacturing while minimizing the profit lost from no longer being able to build some products.
To explain more formally what an erasable itemset is, it is necessary to review some definitions. An itemset is an unordered set of distinct items. The loss of profit generated by an itemset is the sum of the profits of all products containing at least one item from this itemset. For example, the loss of profit of itemset {5, 6} is the sum of the profits of the products containing 5 and/or 6 (products 1, 2, 3 and 5): 50$ + 20$ + 50$ + 30$ = 150$. The loss of profit can also be expressed as a percentage of the total profit of the database. In this database, the total profit is 50 + 20 + 50 + 800 + 30 + 50 = 1000$. Therefore, the loss of profit of the itemset {5, 6} can be expressed as 15% (150 / 1000 × 100).
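The definition of the loss of profit can be checked with a short Java sketch that recomputes the {5, 6} example. The class LossOfProfit and its methods are hypothetical illustration names, not part of SPMF; the database is hard-coded from the example.

```java
import java.util.Set;

class LossOfProfit {
    // PROFITS[i] is the profit of product i+1; ITEMS[i] lists its items.
    static final int[] PROFITS = {50, 20, 50, 800, 30, 50};
    static final int[][] ITEMS = {
        {2, 3, 4, 6}, {2, 5, 7}, {1, 2, 3, 5}, {1, 2, 4}, {6, 7}, {3, 4}
    };

    // Loss of profit of an itemset: sum of the profits of every product
    // that contains at least one item of the itemset.
    static int loss(Set<Integer> itemset) {
        int sum = 0;
        for (int i = 0; i < PROFITS.length; i++) {
            for (int item : ITEMS[i]) {
                if (itemset.contains(item)) {
                    sum += PROFITS[i];
                    break; // count each product at most once
                }
            }
        }
        return sum;
    }

    // Total profit of the database.
    static int totalProfit() {
        int sum = 0;
        for (int p : PROFITS) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        int loss = loss(Set.of(5, 6));
        double percent = 100.0 * loss / totalProfit();
        System.out.println("loss({5, 6}) = " + loss + "$ = " + percent + "% of " + totalProfit() + "$");
    }
}
```

The `break` matters: a product that contains several items of the itemset (for example, product 3 contains both 3 and 5) is lost only once, so its profit must be counted only once.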
By running VME with a threshold of 15%, we obtain 8 erasable itemsets (having a loss of profit less than or equal to 15% × 1000$ = 150$):
erasable itemsets | loss of profit ("gain") |
{3} | 150 |
{5} | 70 |
{6} | 80 |
{7} | 50 |
{5 6} | 150 |
{5 7} | 100 |
{6 7} | 100 |
{5 6 7} | 150 |
This means that if the items of any one of these erasable itemsets are no longer manufactured, the loss of profit will be less than or equal to 15% of the total profit.
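As a sanity check of the definition (and not as a description of how VME works), the following Java sketch enumerates every non-empty subset of the 7 items, computes its loss of profit and keeps those within the threshold. The class name BruteForceErasable is hypothetical. This naive enumeration is exponential in the number of items, so it is only usable on toy data such as this example.

```java
import java.util.ArrayList;
import java.util.List;

class BruteForceErasable {
    static final int[] PROFITS = {50, 20, 50, 800, 30, 50};
    static final int[][] ITEMS = {
        {2, 3, 4, 6}, {2, 5, 7}, {1, 2, 3, 5}, {1, 2, 4}, {6, 7}, {3, 4}
    };
    static final int MAX_ITEM = 7;

    // Encode a list of items as a bitmask: bit (item - 1) is set.
    static int mask(int[] items) {
        int m = 0;
        for (int item : items) {
            m |= 1 << (item - 1);
        }
        return m;
    }

    // Returns all erasable itemsets for a threshold given in percent.
    static List<List<Integer>> erasable(int thresholdPercent) {
        int total = 0;
        for (int p : PROFITS) {
            total += p;
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int candidate = 1; candidate < (1 << MAX_ITEM); candidate++) {
            int loss = 0;
            for (int i = 0; i < PROFITS.length; i++) {
                // A product is lost if it uses any item of the candidate.
                if ((mask(ITEMS[i]) & candidate) != 0) {
                    loss += PROFITS[i];
                }
            }
            // Integer form of: loss <= thresholdPercent% of total.
            if (loss * 100 <= thresholdPercent * total) {
                List<Integer> itemset = new ArrayList<>();
                for (int item = 1; item <= MAX_ITEM; item++) {
                    if ((candidate & (1 << (item - 1))) != 0) {
                        itemset.add(item);
                    }
                }
                result.add(itemset);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> found = erasable(15);
        System.out.println(found.size() + " erasable itemsets: " + found);
    }
}
```

With a threshold of 15%, it finds the same 8 itemsets as listed in the table, although in a different order.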
Input file format
The input file format of VME is defined as follows. It is a text file in which each line represents a transaction (a product). Each line is composed of two sections:
- First, the profit of the transaction is indicated by an integer number, followed by a single space.
- Second, the items in the transaction are listed. An item is represented by a positive integer. Each item is separated from the following item by a space. It is assumed that items are sorted according to a total order and that no item can appear twice in the same transaction.
For instance, the input file for the previous example is:
50 2 3 4 6
20 2 5 7
50 1 2 3 5
800 1 2 4
30 6 7
50 3 4
Consider the first line. It means that the transaction {2, 3, 4, 6} has a profit of 50 and contains the items 2, 3, 4 and 6. The following lines follow the same format.
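A reader for this format only needs to split each line on whitespace: the first token is the profit and the remaining tokens are the items. The sketch below is hypothetical (VmeInputParser and Transaction are illustration names, not SPMF's own parser) and, for simplicity, does not check that items are sorted or unique.

```java
import java.util.ArrayList;
import java.util.List;

class VmeInputParser {
    // Hypothetical representation of one line of the input file.
    record Transaction(int profit, List<Integer> items) {}

    // First integer = profit; remaining integers = items.
    static Transaction parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        int profit = Integer.parseInt(tokens[0]);
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            items.add(Integer.parseInt(tokens[i]));
        }
        return new Transaction(profit, items);
    }

    // Parses all non-blank lines into a list of transactions.
    static List<Transaction> parse(List<String> lines) {
        List<Transaction> db = new ArrayList<>();
        for (String line : lines) {
            if (!line.isBlank()) {
                db.add(parseLine(line));
            }
        }
        return db;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("50 2 3 4 6", "20 2 5 7", "50 1 2 3 5",
                                     "800 1 2 4", "30 6 7", "50 3 4");
        for (Transaction t : parse(lines)) {
            System.out.println("profit=" + t.profit() + " items=" + t.items());
        }
    }
}
```

To read the actual file, the lines could be obtained with `Files.readAllLines(Path.of("contextVME.txt"))` from `java.nio.file` and passed to `parse`.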
Output file format
The output file format of VME is defined as follows. It is a text file, where each line represents an erasable itemset. On each line, the items of the itemset are listed first, each represented by an integer followed by a single space. After all the items, the keyword "#LOSS:" appears, followed by an integer value indicating the loss of profit for that itemset. For example, the output file for the previous example is:
3 #LOSS: 150
5 #LOSS: 70
6 #LOSS: 80
7 #LOSS: 50
5 6 #LOSS: 150
5 7 #LOSS: 100
6 7 #LOSS: 100
5 6 7 #LOSS: 150
For example, the first line indicates that the itemset {3} would generate a loss of profit of 150. The following lines follow the same format.
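Writing and reading this format is straightforward. The Java sketch below shows one way to do it, under the assumption that every itemset is non-empty. The helper names format, parseItems and parseLoss are hypothetical illustration names, not SPMF methods.

```java
import java.util.ArrayList;
import java.util.List;

class VmeOutputFormat {
    static final String KEYWORD = "#LOSS:";

    // Each item followed by a single space, then "#LOSS: <loss>".
    static String format(List<Integer> itemset, int loss) {
        StringBuilder sb = new StringBuilder();
        for (int item : itemset) {
            sb.append(item).append(' ');
        }
        return sb.append(KEYWORD).append(' ').append(loss).toString();
    }

    // Items are the integers before the keyword (itemset assumed non-empty).
    static List<Integer> parseItems(String line) {
        String itemPart = line.substring(0, line.indexOf(KEYWORD)).trim();
        List<Integer> items = new ArrayList<>();
        for (String token : itemPart.split("\\s+")) {
            items.add(Integer.parseInt(token));
        }
        return items;
    }

    // The loss is the integer after the keyword.
    static int parseLoss(String line) {
        int pos = line.indexOf(KEYWORD);
        return Integer.parseInt(line.substring(pos + KEYWORD.length()).trim());
    }

    public static void main(String[] args) {
        System.out.println(format(List.of(5, 6, 7), 150));
        String line = "3 #LOSS: 150";
        System.out.println(parseItems(line) + " -> " + parseLoss(line));
    }
}
```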
Performance
The VME algorithm is Apriori-based. It is not the fastest algorithm for this problem, but it is the only one available in SPMF because this problem has received relatively little attention. For more efficient algorithms, you can search for other publications by the same authors, who have proposed a few improved algorithms for this problem.
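What makes an Apriori-style search possible here is monotonicity: adding items to an itemset can only increase its loss of profit, so every subset of an erasable itemset is also erasable. A level-wise search can therefore build candidates of size k+1 only from erasable itemsets of size k. The Java sketch below illustrates this generic idea under simplifying assumptions (each item's set of products is stored as a bitmask; the names LevelWiseErasable and mine are hypothetical). It is not the published VME algorithm or SPMF's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LevelWiseErasable {
    static final int[] PROFITS = {50, 20, 50, 800, 30, 50};
    static final int[][] ITEMS = {
        {2, 3, 4, 6}, {2, 5, 7}, {1, 2, 3, 5}, {1, 2, 4}, {6, 7}, {3, 4}
    };

    // Sum of the profits of the products whose bit is set in productMask.
    static int loss(int productMask) {
        int sum = 0;
        for (int i = 0; i < PROFITS.length; i++) {
            if ((productMask & (1 << i)) != 0) {
                sum += PROFITS[i];
            }
        }
        return sum;
    }

    // Returns every erasable itemset (sorted item lists) with its loss.
    static Map<List<Integer>, Integer> mine(int maxLoss) {
        // Bitmask of the products that contain each item (sorted by item).
        TreeMap<Integer, Integer> itemMasks = new TreeMap<>();
        for (int i = 0; i < ITEMS.length; i++) {
            for (int item : ITEMS[i]) {
                itemMasks.merge(item, 1 << i, (m1, m2) -> m1 | m2);
            }
        }
        // Level 1: erasable single items.
        Map<List<Integer>, Integer> level = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : itemMasks.entrySet()) {
            if (loss(e.getValue()) <= maxLoss) {
                level.put(List.of(e.getKey()), e.getValue());
            }
        }
        Map<List<Integer>, Integer> result = new LinkedHashMap<>();
        while (!level.isEmpty()) {
            for (Map.Entry<List<Integer>, Integer> e : level.entrySet()) {
                result.put(e.getKey(), loss(e.getValue()));
            }
            List<List<Integer>> keys = new ArrayList<>(level.keySet());
            Map<List<Integer>, Integer> next = new LinkedHashMap<>();
            for (int a = 0; a < keys.size(); a++) {
                for (int b = a + 1; b < keys.size(); b++) {
                    List<Integer> x = keys.get(a);
                    List<Integer> y = keys.get(b);
                    int k = x.size();
                    // Join step: same first k-1 items, different last item.
                    if (!x.subList(0, k - 1).equals(y.subList(0, k - 1))) {
                        continue;
                    }
                    List<Integer> cand = new ArrayList<>(x);
                    cand.add(y.get(k - 1));
                    Collections.sort(cand);
                    // Prune step: every k-subset must already be erasable.
                    boolean ok = true;
                    for (int r = 0; r < cand.size() && ok; r++) {
                        List<Integer> sub = new ArrayList<>(cand);
                        sub.remove(r); // remove by index
                        ok = level.containsKey(sub);
                    }
                    // Products lost by the candidate = union of both product sets.
                    int mask = level.get(x) | level.get(y);
                    if (ok && loss(mask) <= maxLoss) {
                        next.put(List.copyOf(cand), mask);
                    }
                }
            }
            level = next;
        }
        return result;
    }

    public static void main(String[] args) {
        int total = 0;
        for (int p : PROFITS) {
            total += p;
        }
        // Losses are integers, so rounding the threshold down is safe.
        int maxLoss = 15 * total / 100;
        for (Map.Entry<List<Integer>, Integer> e : mine(maxLoss).entrySet()) {
            System.out.println(e.getKey() + " #LOSS: " + e.getValue());
        }
    }
}
```

On the example database with a 15% threshold (150$), this search finds the same 8 erasable itemsets and losses as listed earlier. Items 1, 2 and 4 are discarded at the first level, and no candidate containing them is ever generated.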
Where can I get more information about the VME algorithm?
Here is an article describing the VME algorithm:
Z. Deng, X. Xu: An Efficient Algorithm for Mining Erasable Itemsets. ADMA (1) 2010: 214-225.
For a good overview of frequent itemset mining algorithms, you may read this survey paper.
Copyright © 2008-2020 Philippe Fournier-Viger. All rights reserved.