Comparing Several Sequence Prediction Models (SPMF documentation)

This example explains how to compare sequence prediction models using the SPMF open-source data mining library.

How to run this example?

This example illustrates how to automatically compare the accuracy, coverage, training time, and prediction time of several sequence prediction models on several datasets. This capability was used, for example, to generate the experimental results reported in the CPT+ paper (Gueniche et al., 2015).
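As a rough illustration of what accuracy and coverage capture, the sketch below assumes one common pair of definitions: accuracy is the fraction of test sequences whose next item is predicted correctly, and coverage is the fraction of test sequences for which the model makes any prediction at all. The class PredictionMetrics and its Outcome record are hypothetical and not part of SPMF. The exact formulas used by SPMF's Evaluator may differ, so the CPT+ paper remains the reference for the official definitions. The record syntax requires Java 16 or later.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper (not part of SPMF). It uses assumed definitions of
 * accuracy and coverage for next-item prediction.
 */
public class PredictionMetrics {

    /** One test case: the predicted item (empty if the model made no prediction) and the true next item. */
    public record Outcome(Optional<Integer> predicted, int actual) {}

    /** Assumed accuracy: correct predictions divided by the number of test sequences. */
    public static double accuracy(List<Outcome> outcomes) {
        if (outcomes.isEmpty()) return 0.0;
        long correct = outcomes.stream()
                .filter(o -> o.predicted().isPresent() && o.predicted().get() == o.actual())
                .count();
        return (double) correct / outcomes.size();
    }

    /** Assumed coverage: test sequences for which any prediction was made, divided by their total number. */
    public static double coverage(List<Outcome> outcomes) {
        if (outcomes.isEmpty()) return 0.0;
        long answered = outcomes.stream().filter(o -> o.predicted().isPresent()).count();
        return (double) answered / outcomes.size();
    }

    public static void main(String[] args) {
        List<Outcome> outcomes = List.of(
                new Outcome(Optional.of(3), 3),    // correct prediction
                new Outcome(Optional.of(5), 7),    // wrong prediction
                new Outcome(Optional.empty(), 2),  // no prediction made
                new Outcome(Optional.of(1), 1));   // correct prediction
        System.out.println("accuracy = " + accuracy(outcomes));
        System.out.println("coverage = " + coverage(outcomes));
    }
}
```

With these definitions, a model can reach high coverage with low accuracy by always guessing, which is why the two measures are usually reported together.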

To understand this example, you should open the file "MainTestCompareSequencePredictionModels.java" in the package "ca.pfv.SPMF.tests".

The first line creates an instance of the Evaluator class, which automatically compares several sequence prediction models. Its constructor takes as parameter the path of a folder on your computer where the datasets are stored. For example, assume that the datasets are located in a folder called "/home/ted/java/IPredict/datasets" on the hard drive of the local computer. Note that the datasets are not included in the SPMF source code because some of them are large, but they can be downloaded from the dataset page of the SPMF website. To run this example, create a folder, download the datasets into it, and then replace "/home/ted/java/IPredict/datasets" with the path of your folder in the following line:

Evaluator evaluator = new Evaluator("/home/ted/java/IPredict/datasets");
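Because the Evaluator reads the datasets from this folder, it can be useful to check the path before starting a long experiment. The following sketch is a hypothetical pre-flight check written with the standard java.nio.file API; the class DatasetFolderCheck is not part of SPMF, and it only verifies that the folder exists and lists the files it contains, not that they are valid datasets.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical pre-flight check (not part of SPMF) for the folder passed to the Evaluator. */
public class DatasetFolderCheck {

    /** Returns the sorted names of the regular files in the folder; throws if the folder does not exist. */
    public static List<String> listDatasetFiles(String folder) throws IOException {
        Path dir = Paths.get(folder);
        if (!Files.isDirectory(dir)) {
            throw new IOException("Dataset folder not found: " + dir.toAbsolutePath());
        }
        // Files.list must be closed, hence try-with-resources
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // Use the same path that is passed to new Evaluator(...)
        String folder = args.length > 0 ? args[0] : "/home/ted/java/IPredict/datasets";
        try {
            for (String name : listDatasetFiles(folder)) {
                System.out.println(name);
            }
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
```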

The next lines indicate which datasets should be used for the experiments. For example, the following lines load the BMS.dat and SIGN.dat datasets and use, respectively, the first 5000 and the first 1000 lines of these datasets.

evaluator.addDataset("BMS", 5000);
evaluator.addDataset("SIGN", 1000);
...
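To make the line limit concrete, here is a sketch of reading at most N lines from a dataset, assuming one sequence per line. The FirstLines class is hypothetical and is not SPMF's loader; the toy input uses SPMF's sequence format (items separated by spaces, -1 ending an itemset, -2 ending a sequence) purely for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not SPMF code) showing what a line limit such as 5000 means. */
public class FirstLines {

    /** Reads at most maxLines lines; stops early if the input is shorter. */
    public static List<String> readFirstLines(BufferedReader reader, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        // The size check comes first, so no extra line is consumed once the limit is reached
        while (lines.size() < maxLines && (line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Four toy sequences in SPMF's sequence format
        String data = "1 -1 2 -1 3 -1 -2\n4 -1 5 -1 -2\n1 -1 3 -1 -2\n2 -1 4 -1 -2\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            List<String> firstTwo = readFirstLines(reader, 2);
            System.out.println(firstTwo.size() + " sequences kept: " + firstTwo);
        }
    }
}
```

For a real file, the reader could come from Files.newBufferedReader(Paths.get(folder, "BMS.dat")), where folder is the dataset folder. Stopping at the limit avoids reading the rest of a large file.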

The next lines specify which sequence prediction models should be compared, together with their parameters. For example, the following lines compare DG, TDAG, and CPT+. The look-ahead parameter of DG is set to 4, and the CCF and CBS parameters of CPT+ are set to true.

evaluator.addPredictor(new DGPredictor("DG", "lookahead:4"));
evaluator.addPredictor(new TDAGPredictor());
evaluator.addPredictor(new CPTPlusPredictor("CPT+", "CCF:true CBS:true"));
...
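Each predictor above receives its parameters as a single string, such as "lookahead:4" or "CCF:true CBS:true". The sketch below assumes, based only on these two examples, that such a string is a whitespace-separated list of key:value pairs. ParamString is a hypothetical parser and not the code SPMF uses internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser (not SPMF's own) for strings such as "CCF:true CBS:true". */
public class ParamString {

    /** Splits on whitespace, then splits each token at its first ':' into a key and a value. */
    public static Map<String, String> parse(String params) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps the original parameter order
        if (params == null || params.isBlank()) return result;
        for (String token : params.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            if (colon <= 0 || colon == token.length() - 1) {
                throw new IllegalArgumentException("Expected key:value but got '" + token + "'");
            }
            result.put(token.substring(0, colon), token.substring(colon + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> dg = parse("lookahead:4");
        Map<String, String> cptPlus = parse("CCF:true CBS:true");
        int lookahead = Integer.parseInt(dg.get("lookahead"));
        boolean ccf = Boolean.parseBoolean(cptPlus.get("CCF"));
        System.out.println("lookahead=" + lookahead + ", CCF=" + ccf + ", CPT+ params=" + cptPlus);
    }
}
```

Values are kept as strings and converted by the caller, since each model expects different types (an integer for DG's look-ahead, booleans for CPT+'s options).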

Then, the next line runs the experiment using k-fold cross-validation with k = 14 and prints the results, the dataset statistics, and the execution statistics.

// Start the experiment
StatsLogger results = evaluator.Start(Evaluator.KFOLD, 14, true, true, true);
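In k-fold cross-validation, the sequences are divided into k groups (folds); each fold is used once as the test set while the model is trained on the other k - 1 folds, and the k results are then averaged. The generic sketch below shows the splitting step for k = 14. The KFold class is hypothetical, and the way SPMF's Evaluator assigns sequences to folds may differ (for instance, it may use contiguous blocks rather than the round-robin assignment used here).

```java
import java.util.ArrayList;
import java.util.List;

/** Generic k-fold split (hypothetical sketch, not the fold assignment used by SPMF's Evaluator). */
public class KFold {

    /** Returns k folds; fold i receives the elements whose index modulo k equals i. */
    public static <T> List<List<T>> split(List<T> data, int k) {
        if (k < 2) throw new IllegalArgumentException("k must be at least 2");
        List<List<T>> folds = new ArrayList<>();
        for (int i = 0; i < k; i++) folds.add(new ArrayList<>());
        for (int i = 0; i < data.size(); i++) folds.get(i % k).add(data.get(i));
        return folds;
    }

    /** Training set for one round: every fold except the one used for testing. */
    public static <T> List<T> trainingSet(List<List<T>> folds, int testFold) {
        List<T> train = new ArrayList<>();
        for (int i = 0; i < folds.size(); i++) {
            if (i != testFold) train.addAll(folds.get(i));
        }
        return train;
    }

    public static void main(String[] args) {
        List<Integer> sequenceIds = new ArrayList<>();
        for (int id = 0; id < 1000; id++) sequenceIds.add(id); // e.g. the first 1000 SIGN sequences
        List<List<Integer>> folds = split(sequenceIds, 14);
        for (int round = 0; round < folds.size(); round++) {
            List<Integer> train = trainingSet(folds, round);
            List<Integer> test = folds.get(round);
            // A predictor would be trained on 'train' and evaluated on 'test' here
            System.out.println("round " + round + ": train=" + train.size() + ", test=" + test.size());
        }
    }
}
```

Round-robin assignment keeps fold sizes within one of each other (72 or 71 sequences for 1000 sequences and k = 14). If the order of the input is meaningful, shuffling the list first, for example with Collections.shuffle, is common practice.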

When this example is run, it will show a comparison of the performance of the various sequence prediction models.