Run experiments to compare the performance of one or more algorithms on a dataset (one parameter is varied) (SPMF documentation)

This example explains how to to use the ExperimenterForParameterChange tool of SPMF, which allows to compare the performance of one or more algorithms on a dataset, when a parameter is varied.

This tool runs algorithms automatically and save the results to files. The results include a comparison of runtime, memory usage and output file size. These results are in a tab-separated format, which can be easily imported into Excel or other spreadsheet software to draw charts.

There is also an option to output results as PGFPlots charts that can be inserted directly in Latex documents such as research papers. This tool is was designed to speed-up the redundant task of running experiments to compare algorithms when writing research papers.

I will first explain how to use this tool from the user interface of SPMF, and then how the main functionnalities can be used from the source code version of SPMF.

How to start this tool?

To run this example, if you are using the graphical user interface of SPMF, (1) select the algorithm "Performance_experiment_one_parameter_varied" and then (2) click "Run algorithm". This will open the tool, which provides a simple user interface..

Alternatively, if you want to run this tool from the source code of SPMF (e.g. in Eclipse), without using the graphical user interface of SPMF, you can execute the same tool by running the file "MainTestExperimenterForParameterChange", located in the package "ca.pfv.spmf.test". However, this is less user-friendly as there is no user interface. Thus, it is recommended to use the graphical user interface of SPMF.

How to use the tool?

After you run the algorithm "Performance_experiment_one_parameter_varied" in the user interface of SPMF, the following window will open, which let you indicate what kind of experiments should be run.

experiment with spmf

I will explain step-by-step how to use this tool:

If you have followed the above instructions, the window will look like this:

all parameters are filled for the experiments

Then, we can click on the "Run the experiment" button K) to start the experiments!

During the experiments, the status of experiments will be displayed in the text area L) as follows:

console output from the tool

And finally, when all experiments are done, the results of the experiments will be displayed in the text area L) as follows:

********************************************
***** RESULTS *****
********************************************
INPUT FILE: D:\SPMF_ALL\SPMF 6\NEW TEST FILES 2020\NEW TEST FILES\chess.txt (the input file)
PARAMETERS: [##] (the parameter(s) used in this experiment)

TIME (S)
parameter 0,95 0,94 0.93. 0,92 0,91 0,9 0,89 0,88 0,87 0,86 0,85 0,84 0,83 0,82 0,81 0,8
Apriori 0,39 0,4 0,23 0,48 0,52 0,59 0,67 0,75 0,93 1,25 1,52 1,94 2,33 2,48 2,87 3,43
Eclat 0,53 0,61 0,32 0,58 0,63 0,71 0,75 0,81 0,88 0,86 0,92 1,07 1,11 1,27 1,41 1,47
FPGrowth_itemsets 0,77 0,59 0,39 0,55 0,57 0,53 0,6 0,65 0,63 0,54 0,55 0,62 0,6 0,6 0,62 0,69
FPClose 0,61 0,57 0,41 0,67 0,63 0,66 0,65 0,54 0,62 0,59 0,72 0,64 0,63 0,72 0,69 0,73
FPMax 0,67 0,51 0,33 0,5 0,59 0,56 0,58 0,59 0,56 0,58 0,59 0,59 0,55 0,72 0,68 0,58

(The above lines provide the results for the runtime comparison. The first line is the values of the varied parameter. The second line is the runtimes for Apriori for these parameter values. The following lines show the runtimes for the other algorithms, namely Eclat, FPGrowth, FPClose and FPMax, for these parameter values. For example, the second line means that Apriori took 0.39 seconds when the parameter was set to 0.95, and Apriori took 0.4 seconds when the parameter was set to 0.94, etc.)

MEMORY (MB)
parameter 0,95 0,94 0.93. 0,92 0,91 0,9 0,89 0,88 0,87 0,86 0,85 0,84 0,83 0,82 0,81 0,8
Apriori 18,88 18,88 18,88 18,88 18,88 19,32 19,76 19,76 19,76 20 20,44 21,32 21,32 22 3,34 3,36
Eclat 19,92 27,93 27,93 26,76 11,96 38,9 72,43 110,26 154,03 94,25 141,23 91,27 176,8 71,23 15,29 172,98
FPGrowth_itemsets 6,75 7,43 7,43 7,43 7,44 7,44 8,11 8,1 8,78 9,44 10,1 10,77 11,44 12,78 14,12 15,45
FPClose 6,76 7,43 7,43 7,43 7,43 7,43 8,1 8,77 8,76 9,43 10,1 11,43 12,11 12,1 13,45 15,45
FPMax 6,76 7,44 7,44 7,42 7,43 7,43 7,44 7,43 8,1 8,1 8,77 9,44 9,43 9,44 10,11 10,78

(The above lines provide the results for the memory comparison. The first line is the values of the varied parameter. The second line is the maximum memory usage for Apriori for these parameter values. The following lines show the maximum memory usage for the other algorithms, namely Eclat, FPGrowth, FPClose and FPMax, for these parameter values. For example, the second line means that Apriori used 18.88 megabytes of memory when the parameter was set to 0.95, and Apriori took also 18.88 megabytes of memory when the parameter was set to 0.94, etc.)

OUTPUT_SIZE (LINES)
parameter 0,95 0,94 0.93. 0,92 0,91 0,9 0,89 0,88 0,87 0,86 0,85 0,84 0,83 0,82 0,81 0,8
Apriori 77 139 -1 305 421 622 883 1195 1553 1987 2669 3484 4243 5312 6656 8227
Eclat 77 139 -1 305 421 622 883 1195 1553 1987 2669 3484 4243 5312 6656 8227
FPGrowth_itemsets 77 139 -1 305 421 622 883 1195 1553 1987 2669 3484 4243 5312 6656 8227
FPClose 73 124 -1 269 362 498 689 922 1183 1457 1885 2400 2883 3487 4216 5083
FPMax 11 18 -1 26 30 34 51 75 69 89 119 145 133 163 221 226

(The above lines compares the output size of the algorithms when the parameter is varied. The first line is the values of the varied parameter. The second line is the output size (here, number of patterns) for Apriori for these parameter values. The following lines show the output sizes for the other algorithms, namely Eclat, FPGrowth, FPClose and FPMax, for these parameter values. For example, the second line means that Apriori found 77 patterns when the parameter was set to 0.95, and Apriori found 139 patterns when the parameter was set to 0.94, etc.)

All the above results are also saved in a file called EXPERIMENT_RESULT.txt in the output directory.

Importing the results into Excel

The results from the file EXPERIMENT_RESULT.txt are tab-separated. This means that they can be easily imported into Excel to create charts. For this, just copy the above results into Excel. For example, here we copy the runtime results:

excel results

Then, we can select these results to generate charts:

chart excel

As can be seen in this picture, this chart is not perfect. In particular, the X axis is incorrect. This can be fixed by right-clicking on the chart in Excel and choosing "Select data" to set up the X axis properly. Besides, it is possible to adjust other aspects of the chart to make it look more pretty.

Importing the results into a Latex document

If you have selected the option to generate a latex file with PGFplot, a file called PGPLOT_FIGURES.tex will have been created in the output directory. This file can be compiled using Latex to generate charts, which can also be included in latex documents.

In the above example, the following charts have been created automatically:

performance results pgfplots

The Result Files

After running experiments, a set of files are created in the output directory. Here is an overview of those files.

Here is an overview of all the files created in the output directory for this example:

files from experiment

How to run experiments from the source code of SPMF?

Above, I have explained how to use the graphical interface of SPMF to run automated performance experiments. I will now explain how the same functionnality can be used from the source code of SPMF. This is suited for those who are familiar with Java and wish for example to run experiments without passing through the graphical user interface of SPMF.

There is an example of how to do this, in the SPMF source code. It is the file MainTestExperimenterForParameterChange, which is located in the package ca.pfv.spmf.experiments.oneparametervaried of SPMF.

The main code from that example is shown below. In that example, we will run a performance comparison of two algorithms from SPMF called "FPGrowth_association_rules" and "Apriori_association_rules. The input file is the file "ContextPasquier99.txt", which is located in the same folder, and results of the experiments will be saved in a new folder called "EXPERIMENTS". To run the experiments, we indicate a path to the file spmf.jar on the local computer. This is because we will be calling the spmf.jar file to run the algorithms.. In this example, we assume that spmf.jar has been downloaded to the computer and saved in C://Users//Philippe/Desktop/spmf.jar but other location could work as well.
We will also indicate that there are two parameters for those algorithms and that the first one will be varied using the special code : ## and that the second parameter will be fixed to 0.6. Then we will indicate that the first parameter will be varied using the values 0.01, 0.02, 0.03, 0.04, and 0.05. We will also indicate that each algorithm cannot run for more than 100000 s. We can also set additional options such as to generate PGFPlots figures to be used with Latex.

The important part of the code for this example is below:

// The input file String inputFile = fileToPath("contextPasquier99.txt");

// The directory where results from the experiments will be stored.
String outputDirectory = "EXPERIMENTS";

 // Create the tool to run experiment
   ExperimenterForParameterChange experimenter = new ExperimenterForParameterChange();
   
   // Here we must give a path to the spmf.jar file on your computer
   //  If you did not download spmf.jar, you need to download it
   // from the download page on the SPMF website
   experimenter.setSPMFJarFilePath("C:\\\\Users\\\\philippe\\\\Desktop\\\\spmf.jar");
   
   // We indicate that if an algorithm run out of time, 
   // the result should be indicated using the "-" string
   experimenter.setTimeoutCodeS("-");
 // A list of algorithm to be compared.
   // They must have the same parameters
   String algorithmNames[] = new String[] { "FPGrowth_association_rules", "Apriori_association_rules" };
 // The default list of parameter values
   // The code ## should be used to indicate the parameter that will be varied
   // in the experiment.
   // Below, I have indicated that I want to vary the first parameter.
   String params[] = new String[] { "##", "0.6" };
   
   // The values for the parameter that is varied
   // Below, it means that we will vary the first parameter using the 
   // following values
   String varyingParameterValues[] = new String[] { "0.01", "0.02", "0.03", "0.04", "0.05" };
   
   // The name of the varied parameter (just for  printing purpose
   String variedParameterName = "minsup";
   
   // The maximum time to run each algorithm
   // If an algorithm exceed that time limit, it will be stopped.
   int timeoutInMilliseconds = 100000;
   
   // If true, the output file sizes will be compared in the results
   boolean compareOutputSizes = true;
   
   // If true, the command for running each algorithm will be shown in
   // the console. This is useful for debugging but not very important.
   boolean showCommand = false;
   
   // If true, latex figures will be generated
   boolean latexFigures = true;
 // Run the experiments.
   experimenter.runAnAlgorithmAndVaryParameter(algorithmNames, params, varyingParameterValues, inputFile,
   outputDirectory, timeoutInMilliseconds, compareOutputSizes, showCommand, latexFigures,variedParameterName);

If we run this example, the output in the console will be like this:

********************************************
***** RUNNING EXPERIMENTS *****
********************************************
INPUT: /D:/workspace/SPMF_2019_for_release/bin/ca/pfv/spmf/experiments/oneparametervaried/contextPasquier99.txt
OUTPUT DIRECTORY: EXPERIMENTS

***** EXPERIMENT 1
ALGORITHM: FPGrowth_association_rules minsup= 0,01 *****
TIME: 0,29 s MEMORY: 7,77 MB
OUTPUT: FPGrowth_association_rules_0.01.txt OUTPUT_SIZE: 42 lines

***** EXPERIMENT 2
ALGORITHM: FPGrowth_association_rules minsup= 0,02 *****
TIME: 0,26 s MEMORY: 7,77 MB
OUTPUT: FPGrowth_association_rules_0.02.txt OUTPUT_SIZE: 42 lines

***** EXPERIMENT 3
ALGORITHM: FPGrowth_association_rules minsup= 0,03 *****
TIME: 0,24 s MEMORY: 7,77 MB
OUTPUT: FPGrowth_association_rules_0.03.txt OUTPUT_SIZE: 42 lines

***** EXPERIMENT 4
ALGORITHM: FPGrowth_association_rules minsup= 0,04 *****
TIME: 0,27 s MEMORY: 7,77 MB
OUTPUT: FPGrowth_association_rules_0.04.txt OUTPUT_SIZE: 42 lines

***** EXPERIMENT 5
ALGORITHM: FPGrowth_association_rules minsup= 0,05 *****
TIME: 0,24 s MEMORY: 7,77 MB
OUTPUT: FPGrowth_association_rules_0.05.txt OUTPUT_SIZE: 42 lines

***** EXPERIMENT 6
ALGORITHM: Apriori_association_rules minsup= 0,01 *****
TIME: 0,26 s MEMORY: 7,76 MB
OUTPUT: Apriori_association_rules_0.01.txt OUTPUT_SIZE: 42 lines

***** EXPERIMENT 7
ALGORITHM: Apriori_association_rules minsup= 0,02 *****
TIME: 0,27 s MEMORY: 7,76 MB
OUTPUT: Apriori_association_rules_0.02.txt OUTPUT_SIZE: 42 lines

***** EXPERIMENT 8
ALGORITHM: Apriori_association_rules minsup= 0,03 *****
TIME: 0,28 s MEMORY: 7,76 MB
OUTPUT: Apriori_association_rules_0.03.txt OUTPUT_SIZE: 42 lines

***** EXPERIMENT 9
ALGORITHM: Apriori_association_rules minsup= 0,04 *****
TIME: 0,3 s MEMORY: 7,76 MB
OUTPUT: Apriori_association_rules_0.04.txt OUTPUT_SIZE: 42 lines

***** EXPERIMENT 10
ALGORITHM: Apriori_association_rules minsup= 0,05 *****
TIME: 0,32 s MEMORY: 7,76 MB
OUTPUT: Apriori_association_rules_0.05.txt OUTPUT_SIZE: 42 lines

********************************************
***** RESULTS *****
********************************************
INPUT FILE: /D:/workspace/SPMF_2019_for_release/bin/ca/pfv/spmf/experiments/oneparametervaried/contextPasquier99.txt
PARAMETERS: [##, 0.6]

TIME (S)
minsup 0,01 0,02 0,03 0,04 0,05
FPGrowth_association_rules 0,29 0,26 0,24 0,27 0,24
Apriori_association_rules 0,26 0,27 0,28 0,3 0,32

MEMORY (MB)
minsup 0,01 0,02 0,03 0,04 0,05
FPGrowth_association_rules 7,77 7,77 7,77 7,77 7,77
Apriori_association_rules 7,76 7,76 7,76 7,76 7,76

OUTPUT_SIZE (LINES)
minsup 0,01 0,02 0,03 0,04 0,05
FPGrowth_association_rules 42 42 42 42 42
Apriori_association_rules 42 42 42 42 42

If we want we can copy the above results in green color to Excel for generating charts from these results.

Besides, in the EXPERIMENTS/ directory that will have been created, we can find the detailed results in files, including figures to be used in Latex.

Now, having said that an interesting question is : What if I want to test some algorithms that are not in SPMF.jar ? At this moment, the tool is only designed to compare algorithms from SPMF.jar but it is not difficult to add new algorithms to SPMF.Jar. I will try to add more explanations about this in the future.