Deriving Frequent Itemsets from Frequent Closed Itemsets using the LevelWise Algorithm (SPMF documentation)
This example explains how to run the LevelWise algorithm using the SPMF open-source data mining library.
How to run this example?
- If you are using the graphical interface, (1) choose the "LevelWise" algorithm, (2) select the input file "contextMushroom_FCI90.txt", (3) set the output file name (e.g. "output.txt") and (4) click "Run algorithm".
- If you want to execute this example from the command line,
then execute this command:
java -jar spmf.jar run LevelWise contextMushroom_FCI90.txt output.txt in a folder containing spmf.jar and the example input file contextMushroom_FCI90.txt. - If you are using the source code version of SPMF, launch the file "MainTestLevelWise_saveToFile.java" in the package ca.pfv.SPMF.tests.
What is LevelWise?
LevelWise is the first algorithm to derive frequent itemsets from frequent closed itemsets. This method was proposed by Nicolas Pasquier et al. (1999).
What is the input of the LevelWise algorithm?
The input of LevelWise is a FCI database.
A FCI database is a set of frequent closed itemsets. For example, consider the following FCI database. It contains 6 frequent closed itemsets and support number of each itemset. This database is provided as the file contextMushroom_FCI90.txt. This input file was obtained by applying the Charm algorithm (proposed by Zaki) on the Mushroom.txt dataset with 90% as the minsup threshold.
frequent closed itemsets |
support |
{36 90 97} |
7576 |
{90 97} |
7768 |
{36 90 94} |
8192 |
{36 90} |
8200 |
{90 94} |
8216 |
{90} |
8416 |
What is the output of the LevelWise algorithm?
LevelWise is an algorithm for deriving frequent itemsets from frequent closed itemsets. A frequent itemset is an itemset which appears in at least minsup transactions from the transaction database. And a frequent closed itemset is a frequent itemset that none of its immediate supersets have the same support number as itself.
For example, if LevelWise is run on the previous FCI database, LevelWise produces the following result:
frequent itemsets |
support |
{36 90 94} |
8192 |
{36 90 97} |
7576 |
{36 94} |
8192 |
{36 90} |
8200 |
{90 97} |
7768 |
{36 97} |
7576 |
{90 94} |
8216 |
{36} |
8200 |
{97} |
7768 |
{94} |
8216 |
{90} |
8416 |
How should I interpret the results?
In the result, each frequent itemset is annotated with its support number. The support number of an frequent itemset is how many times the itemset appears in the original transaction database from which we obtain the frequent closed itemsets. For example, the itemset {36 90} has a support of 8200 which is a frequent itemset because its support number is higher or equal to the minsup threshold.
Input file format
#SUP: 7576
90 97 #SUP: 7768
36 90 94 #SUP: 8192
36 90 #SUP: 8200
90 94 #SUP: 8216
90 #SUP: 8416
Note that it is also possible to use the ARFF format as an alternative to the default input format. The specification of the ARFF format can be found here. Most features of the ARFF format are supported except that (1) the character "=" is forbidden and (2) escape characters are not considered. Note that when the ARFF format is used, the performance of the data mining algorithms will be slightly less than if the native SPMF file format is used because a conversion of the input file will be automatically performed before launching the algorithm and the result will also have to be converted. This cost however should be small.
Output file format
The output file format is defined as follows. It is a text file, where each line represents a frequent itemset. On each line, the items of the itemset are first listed. Each item is represented by an integer and it is followed by a single space. After, all the items, the keyword "#SUP:" appears, which is followed by an integer indicating the support of the itemset, expressed as a number of transactions. For example, here is the input file for this example. The first line indicates the frequent itemset consisting of the item {36 90 94} and it indicates that this itemset has a support of 8192 transactions.
36 90 94 #SUP: 8192
36 90 97 #SUP: 7576
36 94 #SUP: 8192
36 90 #SUP: 8200
90 97 #SUP: 7768
36 97 #SUP: 7576
90 94 #SUP: 8216
36 #SUP: 8200
97 #SUP: 7768
94 #SUP: 8216
90 #SUP: 8416
Note that if the ARFF format is used as input instead of the default input format, the output format will be the same except that items will be represented by strings instead of integers.
Where can I get more information about the LevelWise algorithm?
This is the conference article describing LevelWise:
Nicolas Pasquier, Yves Bastide, Rafik Taouil and Lotfi Lakhal: Efficient mining of association rules using closed itemset lattices. Information Systems, Vol. 24, No. 1, pp. 25–46, 1999