Developers' guide - SPMF 0.92c (old version)

This webpage provides information about how the source code is organized, to make it easier to understand the source code, modify it and reuse it in other projects.

Compiling the source code

The source code of SPMF is designed to be easily reusable. To compile it, the only requirement is a recent version of the Java JDK (>=1.7).

Because there are many classes and packages, it is more convenient to compile the source code with an integrated development environment (IDE). I recommend using an IDE such as Eclipse or NetBeans for compiling and modifying the source code. The instructions for installing the source code of SPMF in Eclipse, NetBeans or IntelliJ and compiling it can be found here: how_to_install.txt

Source code organization - packages

The source code is organized into a package hierarchy (each package corresponds to a folder). Here is a description of the main packages in SPMF.

Source code organization - algorithms

In general, each algorithm has its own package containing its files (except for some algorithms that share files with other algorithms). For example, the SPAM algorithm is located in the package ca/pfv/spmf/algorithms/sequentialpatterns/spam/, which is a subpackage of ca/pfv/spmf/algorithms/sequentialpatterns/ because SPAM is a sequential pattern mining algorithm.
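In Java, a package name is the folder path relative to the source root, with "/" replaced by ".". The folder ca/pfv/spmf/algorithms/sequentialpatterns/spam/ therefore holds classes whose package statement reads ca.pfv.spmf.algorithms.sequentialpatterns.spam. The helper below is a hypothetical illustration of this mapping only; PackageNames is not a class of SPMF.

```java
/** Hypothetical helper (not part of SPMF): maps a source folder path to a Java package name. */
class PackageNames {
    static String folderToPackage(String folder) {
        // Accept both Unix ("/") and Windows ("\") separators
        String path = folder.replace('\\', '/');
        // Drop leading and trailing separators
        while (path.startsWith("/")) {
            path = path.substring(1);
        }
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        // Each folder level becomes one segment of the package name
        return path.replace('/', '.');
    }
}
```

For example, folderToPackage("ca/pfv/spmf/algorithms/sequentialpatterns/spam/") returns "ca.pfv.spmf.algorithms.sequentialpatterns.spam".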

Each algorithm has a main class whose name starts with "Algo" and a method "runAlgorithm()" for running the algorithm. For example, the main file for the SPAM algorithm is AlgoSPAM.java, and its runAlgorithm() method takes three parameters as input: (1) an input file, (2) an output file and (3) a minsup threshold.
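To illustrate this convention, here is a minimal, hypothetical class written in the same style. The class name AlgoFrequentItems, the input format (one record per line, items separated by spaces, negative numbers treated as separators) and the "#SUP:" output format are assumptions made for this sketch, not SPMF code. It only counts how many records contain each item, which is far simpler than SPAM, but its entry points runAlgorithm(input, output, minsup) and printStatistics() mirror the ones used to run SPAM.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical class (not part of SPMF) following the "Algo..." convention:
 * it counts in how many records each item appears and keeps the frequent items.
 */
class AlgoFrequentItems {
    private long startTime;
    private long endTime;
    private int patternCount;

    /**
     * Runs the algorithm.
     *
     * @param input  path of the input file (one record per line, items separated by spaces)
     * @param output path of the output file
     * @param minsup minimum support, as a fraction of the number of records (e.g. 0.5)
     * @throws IOException if the input cannot be read or the output cannot be written
     */
    public void runAlgorithm(String input, String output, double minsup) throws IOException {
        startTime = System.currentTimeMillis();
        List<String> lines = Files.readAllLines(Paths.get(input), StandardCharsets.UTF_8);

        // Support of an item = number of non-empty lines containing it at least once
        Map<Integer, Integer> support = new TreeMap<Integer, Integer>();
        int recordCount = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            recordCount++;
            Set<Integer> seenInRecord = new HashSet<Integer>();
            for (String token : trimmed.split("\\s+")) {
                int item = Integer.parseInt(token);
                // Negative values are treated as separators, not items (assumption)
                if (item >= 0 && seenInRecord.add(item)) {
                    Integer count = support.get(item);
                    support.put(item, count == null ? 1 : count + 1);
                }
            }
        }

        // Relative threshold -> number of records, rounded up
        int minsupAbsolute = (int) Math.ceil(minsup * recordCount);
        patternCount = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(output), StandardCharsets.UTF_8)) {
            for (Map.Entry<Integer, Integer> entry : support.entrySet()) {
                if (entry.getValue() >= minsupAbsolute) {
                    writer.write(entry.getKey() + " #SUP: " + entry.getValue());
                    writer.newLine();
                    patternCount++;
                }
            }
        }
        endTime = System.currentTimeMillis();
    }

    /** Prints the number of frequent items found and the running time. */
    public void printStatistics() {
        System.out.println("Frequent items: " + patternCount);
        System.out.println("Total time: " + (endTime - startTime) + " ms");
    }
}
```

A caller uses it the same way as the SPAM example: create an instance, call runAlgorithm(input, output, 0.5), then printStatistics().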

To see an example of how to run the SPAM algorithm from the source code, go to the ca/pfv/spmf/test/ folder, where all the example files for developers are located. In this folder, each algorithm has an example file whose name starts with "MainTest" followed by the name of the algorithm. For example, the file showing how to execute the SPAM algorithm is "MainTestSPAM_saveToFile.java". If we open this file, we see the following code:

...

// Load a sequence database
String input = fileToPath("contextPrefixSpan.txt");
String output = "C://patterns//sequential_patterns_SPAM.txt";

// Create an instance of the algorithm
AlgoSPAM algo = new AlgoSPAM();

// execute the algorithm with minsup = 2 sequences (50 %)
algo.runAlgorithm(input, output, 0.5);
algo.printStatistics();

...

This example corresponds to the example for the SPAM algorithm in the documentation section of the website. The input file is set to "contextPrefixSpan.txt", which is the example database used in the documentation. This file can be found in ca/pfv/spmf/test/.
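The helper fileToPath() used in the example is defined in the test file itself, and its body is not shown above. A plausible minimal version, assuming it resolves the file name relative to the test class with Class.getResource(), is sketched below. The class name ResourcePaths is hypothetical, and the actual helper in SPMF may be written differently.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;

/** Hypothetical helper, similar in purpose to fileToPath() in the test files. */
class ResourcePaths {
    /**
     * Returns the file-system path of a file stored in the same package folder
     * as the given class, or null if no such file is found.
     */
    static String fileToPath(Class<?> anchor, String filename) {
        // A relative name is resolved against the package of the anchor class
        URL url = anchor.getResource(filename);
        return url == null ? null : urlToPath(url);
    }

    /** Converts a file: URL to a path, decoding escapes such as %20 for spaces. */
    static String urlToPath(URL url) {
        try {
            return Paths.get(url.toURI()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }
}
```

Because Paths.get(URI) only handles file: URLs here, this sketch works when the example files sit in a folder on disk (as when running from an IDE) and would fail for a file packed inside a JAR.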

The output file path is set to "C://patterns//sequential_patterns_SPAM.txt". You can replace it with any path you like, as long as the folder exists on your computer.
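If you prefer not to create the output folder by hand, a small hypothetical helper can create it before the algorithm runs. It relies only on the JDK method Files.createDirectories; the name OutputPaths.prepareOutput is an illustration, not part of SPMF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of SPMF): makes sure the folder of an output file exists. */
class OutputPaths {
    /**
     * Creates the missing parent folders of the output file, if any,
     * and returns its absolute path as a String, ready to pass to runAlgorithm().
     */
    static String prepareOutput(String outputFile) throws IOException {
        Path path = Paths.get(outputFile).toAbsolutePath();
        Path parent = path.getParent();
        if (parent != null) {
            // Does nothing if the folder already exists
            Files.createDirectories(parent);
        }
        return path.toString();
    }
}
```

For instance, OutputPaths.prepareOutput(System.getProperty("user.home") + "/patterns/sequential_patterns_SPAM.txt") would create a "patterns" folder in the user's home directory if it is missing.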

Finally, the call to runAlgorithm() launches the algorithm. In this example, the third parameter (the minsup threshold) is set to 0.5. To see the meaning of this parameter for the SPAM algorithm, refer again to the documentation on the website.
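The comment in the example states that minsup = 0.5 means 2 sequences, which implies that contextPrefixSpan.txt contains 4 sequences (0.5 * 4 = 2). The sketch below shows this conversion, assuming the relative threshold is rounded up with Math.ceil; whether AlgoSPAM rounds in exactly this way should be checked in its source code. MinsupConverter is a hypothetical name.

```java
/** Hypothetical helper (not part of SPMF): converts a relative minsup to a number of sequences. */
class MinsupConverter {
    /**
     * @param relativeMinsup threshold between 0 and 1, e.g. 0.5 for 50 %
     * @param sequenceCount  number of sequences in the database
     * @return the smallest number of sequences that reaches the threshold
     */
    static int toAbsoluteMinsup(double relativeMinsup, int sequenceCount) {
        if (relativeMinsup < 0 || relativeMinsup > 1) {
            throw new IllegalArgumentException("minsup must be between 0 and 1: " + relativeMinsup);
        }
        // Rounding up: a pattern must appear in at least minsup * sequenceCount sequences
        return (int) Math.ceil(relativeMinsup * sequenceCount);
    }

    public static void main(String[] args) {
        // 4 sequences and minsup = 0.5 -> 2 sequences
        System.out.println(toAbsoluteMinsup(0.5, 4)); // prints 2
    }
}
```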

Copying the source code of an algorithm in another Java project

Reusing the source code of a single algorithm from SPMF in another Java project is easy, since algorithms in SPMF are generally separated into their own packages, except for some data structures and classes that are shared by a few algorithms.

If you want to reuse a single algorithm from SPMF without copying all of SPMF, you should first determine which classes are necessary. For example, to copy the source code of the SPAM algorithm into another project, first locate the package for the SPAM algorithm: ca/pfv/spmf/algorithms/sequentialpatterns/spam/. Then check the import statements at the top of each Java file to see whether additional files are required. In the case of SPAM, no additional files are required. Therefore, we only need the files located in the package ca/pfv/spmf/algorithms/sequentialpatterns/spam/, namely AlgoSPAM.java, Bitmap.java, Itemset.java and Prefix.java.
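The import check can be partly automated. The hypothetical helper below lists, for every .java file in one package folder, the imports that do not refer to the JDK (java.* or javax.*); anything it reports would also have to be copied. It is only a rough sketch: it misses classes referenced by their fully qualified name without an import, and classes of the same package need no import at all, so the whole package folder should be copied anyway.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not part of SPMF): finds non-JDK imports in one package folder. */
class ImportScanner {
    static Set<String> externalImports(Path packageFolder) throws IOException {
        Set<String> result = new TreeSet<String>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(packageFolder, "*.java")) {
            for (Path file : files) {
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    String trimmed = line.trim();
                    if (!trimmed.startsWith("import ")) {
                        continue;
                    }
                    // Keep the text between "import" and ";", then drop a "static" modifier
                    int end = trimmed.indexOf(';');
                    String name = trimmed.substring("import ".length(), end >= 0 ? end : trimmed.length()).trim();
                    if (name.startsWith("static ")) {
                        name = name.substring("static ".length()).trim();
                    }
                    // JDK classes are available in any Java project
                    if (!name.startsWith("java.") && !name.startsWith("javax.")) {
                        result.add(name);
                    }
                }
            }
        }
        return result;
    }
}
```

If ImportScanner.externalImports(Paths.get("ca/pfv/spmf/algorithms/sequentialpatterns/spam")) returns an empty set, the package can be copied on its own.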

Note that each Java file has a package statement on its first line. If you move the files to another package, you should change the package statements accordingly.
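Editing the package statements can also be scripted. The hypothetical helper below replaces the package declaration in the text of a Java file, assuming the declaration sits at the start of a line. Any other class that imports the moved classes by their old package name would still need its import statements updated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of SPMF): replaces the package declaration of a Java source. */
class PackageRewriter {
    // Matches a declaration such as "package ca.pfv.spmf.algorithms.sequentialpatterns.spam;"
    private static final Pattern PACKAGE_DECLARATION =
            Pattern.compile("^[ \\t]*package[ \\t]+[\\w.]+[ \\t]*;", Pattern.MULTILINE);

    static String changePackage(String source, String newPackage) {
        Matcher matcher = PACKAGE_DECLARATION.matcher(source);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No package declaration found");
        }
        // Plain concatenation avoids special treatment of "$" in regex replacements
        return source.substring(0, matcher.start())
                + "package " + newPackage + ";"
                + source.substring(matcher.end());
    }
}
```

For example, changePackage(source, "com.example.spam") turns the line "package ca.pfv.spmf.algorithms.sequentialpatterns.spam;" into "package com.example.spam;" and leaves the rest of the file unchanged.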

Also note that the source code of SPMF is distributed under the GPL license. If you include the code in another project, that project must comply with the GPL license (see license). Please also do not delete the copyright statement at the top of each file.

Naming conventions

In SPMF, the source code generally follows the Code Conventions for the Java Programming Language, so that it is easy for Java programmers to understand. Also, the Javadoc convention is used for documenting the code.
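The following hypothetical class (not taken from SPMF) illustrates these conventions: a class name in UpperCamelCase, method and variable names in lowerCamelCase, a constant in UPPER_SNAKE_CASE, and Javadoc comments (/** ... */ with @param and @return tags) on the class and its members. The -1 separator is only part of this illustration.

```java
/**
 * Hypothetical class showing the naming and Javadoc conventions.
 * It counts the itemsets of a sequence encoded as an array of integers.
 */
class SequenceStatistics {
    /** Value marking the end of an itemset in this illustrative encoding. */
    static final int ITEMSET_SEPARATOR = -1;

    /**
     * Counts the itemsets in a sequence.
     *
     * @param sequence the items of the sequence, with ITEMSET_SEPARATOR after each itemset
     * @return the number of itemsets in the sequence
     */
    static int countItemsets(int[] sequence) {
        int itemsetCount = 0;
        for (int value : sequence) {
            if (value == ITEMSET_SEPARATOR) {
                itemsetCount++;
            }
        }
        return itemsetCount;
    }
}
```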

How to generate the jar file for the GUI version of SPMF

If you want to regenerate the jar file for the GUI version of SPMF, you can do as follows in Eclipse:

  1. Before starting, make sure that you have run "Main.java", which is in the package ca.pfv.spmf.gui, at least once. This will create a launch configuration for Main.java in Eclipse that you will need later.
  2. Right-click on your project containing the source code of SPMF.
  3. Select "Export", select "Runnable JAR file" and click "Next".
  4. Under "Launch configuration", select the launch configuration of the class that should be launched by the JAR file, which is Main.java. Then indicate where you want to export the JAR file on your computer by clicking "Browse...", and click "Finish".

If you have performed these steps correctly, the JAR file should have been written to the location that you chose in Step 4.
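To check which class an exported JAR will launch, you can read its manifest. The hypothetical helper below uses java.util.jar.JarFile to return the Main-Class attribute of META-INF/MANIFEST.MF. As an assumption to verify, when Eclipse packages libraries inside the JAR it may set Main-Class to its own loader and record the selected class under Rsrc-Main-Class, so the sketch checks that attribute first.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper (not part of SPMF): reports which class a runnable JAR launches. */
class JarInspector {
    static String launchedClass(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null; // the archive has no META-INF/MANIFEST.MF
            }
            Attributes mainAttributes = manifest.getMainAttributes();
            // Attribute possibly written by Eclipse's jar-in-jar loader option (assumption)
            String rsrcMainClass = mainAttributes.getValue("Rsrc-Main-Class");
            return rsrcMainClass != null
                    ? rsrcMainClass
                    : mainAttributes.getValue(Attributes.Name.MAIN_CLASS);
        }
    }
}
```

For a JAR exported as described above, launchedClass(...) would be expected to return ca.pfv.spmf.gui.Main.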

Running the GUI from the source code

To run the graphical user interface from the source code, just run the file "MainWindow.java" in the package ca/pfv/spmf/gui/.

Other questions

If you have other questions about the source code, don't hesitate to post them in the forum, or to contact me directly if your question is confidential.