Comparing Several Sequence Prediction Models

This example explains how to compare sequence prediction models using the SPMF open-source data mining library.

How to run this example?

This example illustrates how to automatically compare the accuracy, coverage, training time and prediction time of several sequence prediction models on several datasets. This capability was used, for example, to generate the experimental results reported in the CPT+ paper (Gueniche et al., 2015).

To follow this example, open the file "MainTestCompareSequencePredictionModels.java" in the package "ca.pfv.SPMF.tests".

The first line creates an instance of the Evaluator class, which automatically compares several sequence prediction models. Its constructor takes as parameter the path of the folder where the datasets are stored. In this example, the datasets are assumed to be located in the folder "/home/ted/java/IPredict/datasets" on the local computer. Note that the datasets are not included in the SPMF source code because some of them are large, but they can be downloaded from the dataset page of the SPMF website.

Evaluator evaluator = new Evaluator("/home/ted/java/IPredict/datasets");
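Because the datasets must be downloaded separately, a wrong folder path or a missing file is an easy mistake to make. The following minimal sketch checks the folder before an experiment is launched. It is not part of SPMF: the class `DatasetFolderCheck` and its method `missingDatasets` are hypothetical, and the sketch assumes that each dataset is stored as `<name>.dat` directly in that folder, as the BMS.dat and SIGN.dat file names mentioned below suggest.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SPMF.
class DatasetFolderCheck {

    /**
     * Returns the dataset names whose "<name>.dat" file is missing from the folder.
     * Throws IOException if the folder itself does not exist.
     */
    static List<String> missingDatasets(String folder, List<String> names) throws IOException {
        Path dir = Paths.get(folder);
        if (!Files.isDirectory(dir)) {
            throw new IOException("Dataset folder not found: " + dir.toAbsolutePath());
        }
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            // Assumed layout: one file per dataset, named <name>.dat (e.g. BMS.dat).
            if (!Files.isRegularFile(dir.resolve(name + ".dat"))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Same folder as in the example; pass another path as the first argument if needed.
        String folder = args.length > 0 ? args[0] : "/home/ted/java/IPredict/datasets";
        List<String> missing = missingDatasets(folder, Arrays.asList("BMS", "SIGN"));
        if (missing.isEmpty()) {
            System.out.println("All datasets found.");
        } else {
            System.out.println("Missing datasets: " + missing);
        }
    }
}
```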

The next lines indicate which datasets are used in the experiments. For example, the following lines load the BMS.dat and SIGN.dat datasets and use, respectively, the first 5000 and the first 1000 lines of these datasets.

evaluator.addDataset("BMS", 5000);
evaluator.addDataset("SIGN", 1000);
...
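To make the second argument of `addDataset` concrete, the sketch below reads at most N lines from a dataset file, which is what "use the first 5000 lines" amounts to. It is an illustration only, not the code used by the Evaluator class: `FirstLinesReader` and `readFirstLines` are hypothetical names, and the `<folder>/<name>.dat` layout is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SPMF.
class FirstLinesReader {

    /** Reads at most maxLines lines from the file, stopping early at end of file. */
    static List<String> readFirstLines(Path file, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            // The size check comes first so that no extra line is consumed once the limit is reached.
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : "/home/ted/java/IPredict/datasets");
        // Mirrors addDataset("BMS", 5000) and addDataset("SIGN", 1000), assuming <name>.dat files.
        System.out.println("BMS: " + readFirstLines(folder.resolve("BMS.dat"), 5000).size() + " lines");
        System.out.println("SIGN: " + readFirstLines(folder.resolve("SIGN.dat"), 1000).size() + " lines");
    }
}
```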

The next lines specify which sequence prediction models are compared and set their parameters. For example, the following lines compare DG, TDAG and CPT+. Moreover, the look-ahead parameter of DG is set to 4, and the CCF and CBS parameters of CPT+ are set to true.

evaluator.addPredictor(new DGPredictor("DG", "lookahead:4"));
evaluator.addPredictor(new TDAGPredictor());
evaluator.addPredictor(new CPTPlusPredictor("CPT+", "CCF:true CBS:true"));
...
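The parameters of each predictor are passed as a string such as "lookahead:4" or "CCF:true CBS:true". Judging from these two examples, the format appears to be whitespace-separated `key:value` pairs. The hypothetical parser below illustrates that reading; it is an assumption about the format, not SPMF's own parsing code, which may accept other forms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser illustrating the assumed "key:value key:value" format; not SPMF code.
class ParameterStringParser {

    /**
     * Parses a string such as "CCF:true CBS:true" into a map that keeps the original order.
     * Tokens are separated by whitespace; each token is split at its first ':'.
     */
    static Map<String, String> parse(String params) {
        Map<String, String> result = new LinkedHashMap<>();
        String trimmed = params.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        for (String token : trimmed.split("\\s+")) {
            int colon = token.indexOf(':');
            if (colon <= 0 || colon == token.length() - 1) {
                throw new IllegalArgumentException("Expected key:value but got '" + token + "'");
            }
            result.put(token.substring(0, colon), token.substring(colon + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("lookahead:4"));       // {lookahead=4}
        System.out.println(parse("CCF:true CBS:true")); // {CCF=true, CBS=true}
        int lookahead = Integer.parseInt(parse("lookahead:4").get("lookahead"));
        System.out.println("DG look-ahead = " + lookahead);
    }
}
```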

Then, the next line runs the experiment using k-fold cross-validation with k = 14 and prints the results, the dataset statistics and the execution statistics.

// Start the experiment
StatsLogger results = evaluator.Start(Evaluator.KFOLD, 14, true, true, true);
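In k-fold cross-validation, the sequences are divided into k folds of roughly equal size; each fold serves once as the test set while the other k-1 folds are used for training, and the measures are typically averaged over the k rounds. The sketch below shows only this splitting step, for k = 14. The class `KFoldSplit` is hypothetical, and it assigns contiguous blocks of sequences to folds without shuffling; whether SPMF's Evaluator shuffles the data is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of k-fold splitting; not the Evaluator's implementation.
class KFoldSplit {

    /** One train/test split of the sequence indices. */
    static final class Fold {
        final List<Integer> train = new ArrayList<>();
        final List<Integer> test = new ArrayList<>();
    }

    /**
     * Splits indices 0..n-1 into k folds whose sizes differ by at most one.
     * In round f, fold f is the test set and the other k-1 folds form the training set,
     * so every index is tested exactly once.
     */
    static List<Fold> split(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Fold> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            // Contiguous block [start, end) used as the test set of round f.
            int start = (int) ((long) f * n / k);
            int end = (int) ((long) (f + 1) * n / k);
            Fold fold = new Fold();
            for (int i = 0; i < n; i++) {
                if (i >= start && i < end) {
                    fold.test.add(i);
                } else {
                    fold.train.add(i);
                }
            }
            folds.add(fold);
        }
        return folds;
    }

    public static void main(String[] args) {
        // 1000 sequences (e.g. the first 1000 lines of SIGN) with k = 14.
        List<Fold> folds = split(1000, 14);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("round " + f + ": train=" + folds.get(f).train.size()
                    + " test=" + folds.get(f).test.size());
        }
    }
}
```

With 1000 sequences and k = 14, each test fold holds 71 or 72 sequences and the matching training set holds the remaining 929 or 928.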

When this example is run, it shows a comparison of the selected sequence prediction models on each dataset in terms of accuracy, coverage, training time and prediction time.


Copyright © 2008-2020 Philippe Fournier-Viger. All rights reserved.