What is the Workflow Editor?

The SPMF Workflow Editor is a graphical tool for creating, editing, validating, and executing workflows of data mining algorithms. A workflow is a set of algorithms where the output of one algorithm may become the input of subsequent algorithms. This allows you to chain multiple algorithms together to perform complex data mining tasks automatically.

How to run this example?

If you are using the graphical interface of SPMF, (1) choose the "SPMF_workflow_editor" algorithm and (2) click "Run algorithm".

This will open the SPMF workflow editor, which is displayed below:

workflow editor of SPMF

The workflow editor window is divided into five areas.

How to create a workflow?

Creating a workflow can be done by following these steps:

Step 1: Click the Add algorithm button to add the first algorithm node to the workflow.

add algorithm button

This will create a new node:

add algorithm node

Step 2: Select the algorithm node by clicking on it. The information panel on the right will display the properties of the selected node.

information panel for algorithm in workflow editor

Step 3: In the information panel, configure the algorithm node by selecting an algorithm.

First, click the Select button next to "Algorithm:" to choose an algorithm from the algorithm browser. In this example, we will choose the Apriori algorithm, and then click the OK button.

workflow editor: select an algorithm

Alternatively, you can click the Recent ▼ button to select an algorithm that was used recently.

In this example, after the Apriori algorithm is selected, the workflow view on the left is updated to show that this algorithm has an input and will produce an output. Moreover, the information panel on the right is updated to show that this algorithm has two parameters: the minimum support (Minsup) and the maximum pattern length (optional).

apriori node in workflow editor from spmf

In this example, we will set the Minsup parameter to 0.4 by entering the value in the parameter table and then pressing Enter:

workflow parameter for apriori
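To make the meaning of Minsup = 0.4 concrete, here is a small Python sketch (not part of SPMF) that turns a relative minimum support into a minimum number of transactions and counts the support of two itemsets. It assumes that the threshold is rounded up with ceil, that the input follows the SPMF transaction format (one transaction per line, items as integers separated by spaces), and that lines starting with '#', '%' or '@' are metadata. The five-transaction database and all function names are hypothetical.

```python
import math

def min_support_count(minsup, num_transactions):
    """Relative minsup (e.g. 0.4) -> minimum number of transactions.
    Assumption: the threshold is rounded up, as Apriori-style miners commonly do."""
    return math.ceil(minsup * num_transactions)

def parse_transactions(text):
    """Parse one transaction per line, items as integers separated by spaces.
    Assumption: lines starting with '#', '%' or '@' are metadata and are skipped."""
    transactions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#%@":
            continue
        transactions.append({int(token) for token in line.split()})
    return transactions

def support(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# Hypothetical five-transaction database, used only for illustration.
DATA = """1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5"""

db = parse_transactions(DATA)
threshold = min_support_count(0.4, len(db))  # ceil(0.4 * 5) = 2
print(threshold, support(db, {2, 5}), support(db, {1, 4}))  # 2 4 1
```

With this database, an itemset is frequent when it appears in at least 2 transactions, so {2, 5} (4 transactions) is frequent while {1, 4} (1 transaction) is not.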

Then, we can select the input file for this algorithm. This is done by a) clicking on the green input node in the workflow and b) clicking on the "Select" button to select an input file. Alternatively, the "Recent" button can be used to select a recently used file.

configure input file in spmf workflow editor

In this example, we will choose the file contextPasquier99.txt, which is provided with SPMF.

selecting the input file in the workflow editor

Then, we can set the output file for this algorithm. This is done by a) clicking on the green output node in the workflow and b) clicking on the "Select" button to choose where the output file produced by the algorithm will be written. Alternatively, the "Recent" button can be used to select a recently used file location.

choose output of apriori in workflow editor of spmf

In this example we will set the name of the output file to "frequent_itemsets.txt".

frequent itemsets output in the workflow editor of spmf

After completing this step, we have a complete workflow with a single algorithm.

We can click the "Validate the workflow" button to check that we have made no mistake, such as forgetting to set a mandatory parameter.

validate the workflow
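The exact checks performed by "Validate the workflow" are not listed here, so the following Python sketch only illustrates the kind of rules such a check might apply: an algorithm is selected, every mandatory parameter has a value, root nodes have an input file, and every node has an output file. The Node class, its fields and the error messages are hypothetical, not SPMF code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    node_id: int
    algorithm: Optional[str] = None   # None until an algorithm is selected
    mandatory_params: int = 0         # hypothetical: how many leading parameters are required
    params: List[Optional[str]] = field(default_factory=list)  # None means "not set"
    input_file: Optional[str] = None  # required only for root nodes in this sketch
    output_file: Optional[str] = None
    is_root: bool = True

def validate(nodes):
    """Return a list of human-readable problems; an empty list means the workflow looks valid."""
    errors = []
    for n in nodes:
        if n.algorithm is None:
            errors.append(f"Node {n.node_id}: no algorithm selected")
            continue  # the remaining checks depend on the chosen algorithm
        for i in range(n.mandatory_params):
            if i >= len(n.params) or n.params[i] is None:
                errors.append(f"Node {n.node_id}: mandatory parameter {i} is not set")
        if n.is_root and not n.input_file:
            errors.append(f"Node {n.node_id}: no input file")
        if not n.output_file:
            errors.append(f"Node {n.node_id}: no output file")
    return errors

# The single-node workflow from this tutorial, before and after setting Minsup.
apriori = Node(1, "Apriori", mandatory_params=1, params=[None],
               input_file="contextPasquier99.txt", output_file="frequent_itemsets.txt")
print(validate([apriori]))  # ['Node 1: mandatory parameter 0 is not set']
apriori.params[0] = "0.4"
print(validate([apriori]))  # []
```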

We can also click on "Run the workflow" to execute the workflow.

workflow editor run the workflow

This will run the algorithm. Information about the execution is displayed in the console of the Workflow Editor, and the output is saved to the file "frequent_itemsets.txt".

workflow editor console panel

This is useful, but we may prefer to view the result with one of the tools that SPMF offers for viewing frequent itemsets. To do this, we will add another algorithm to the workflow.

We will click on the green output node in the workflow editor, and click "Add algorithm".

add another algorithm to workflow editor

This will add another algorithm:

adding a second algorithm in the workflow editor

Then, we will click on the "Select" button in the right panel to choose the Visualize_frequent_itemsets algorithm. In the algorithm browser, a) we can use the search bar to quickly find this algorithm, then b) we click on it, and then c) we click the "OK" button:

workflow editor select tool to visualize frequent itemsets

Now the workflow will appear like this:

workflow editor after adding an algorithm for viewing frequent itemsets

We can then click on "Run the workflow" to execute the workflow.

workflow editor run the workflow

This will execute the Apriori algorithm and then open the tool for visualizing frequent itemsets.

viewing the frequent itemsets
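Conceptually, running this workflow means executing each node after its parent, with the parent's output file passed as the child's input. The Python sketch below does this sequentially and depth-first; the node dictionary layout and the run_algorithm callback are hypothetical, and the editor's actual scheduling (which may run independent branches in parallel) could differ.

```python
from collections import defaultdict

def run_workflow(nodes, run_algorithm):
    """Run every node after its parent, feeding the parent's output file to the child.

    nodes: dict node id -> {"parent": id or 0 for a root, "algorithm": str,
           "params": list, "output": path or None, "input": path (roots only)}
    run_algorithm(algorithm, input_path, output_path, params) is a hypothetical
    callback standing in for the real execution of one algorithm.
    """
    children = defaultdict(list)
    for node_id in sorted(nodes):
        children[nodes[node_id]["parent"]].append(node_id)

    def visit(node_id, input_path):
        node = nodes[node_id]
        run_algorithm(node["algorithm"], input_path, node["output"], node["params"])
        for child in children[node_id]:
            visit(child, node["output"])  # the parent's output becomes the child's input

    for root in children[0]:
        visit(root, nodes[root]["input"])

# The two-node workflow built above: Apriori followed by the itemset viewer.
workflow = {
    1: {"parent": 0, "algorithm": "Apriori", "params": ["0.4"],
        "input": "contextPasquier99.txt", "output": "frequent_itemsets.txt"},
    2: {"parent": 1, "algorithm": "Visualize_frequent_itemsets", "params": [],
        "output": None},  # no output file is set for the viewer in this tutorial
}

calls = []
run_workflow(workflow, lambda *args: calls.append(args))
for call in calls:
    print(call)
```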

Now, if we are satisfied with this workflow, we can save the workflow to a file using the save workflow menu:

save workflow menu

The saved workflow can later be reopened in the Workflow Editor to modify it. The Workflow menu can also be used to export the workflow as a script (a BAT file for Windows or an SH file for Linux).

How to create more complex workflows?

Since SPMF 2.66, it is possible to create workflows with branches. To do this, click on a file node in the workflow view and then click the "Add algorithm" button.
For example, continuing the previous example, we can a) click on the "frequent_itemsets.txt" node and then b) click the "Add algorithm" button:

add a third algorithm to the workflow editor

This will create a new algorithm node:

workflow editor after adding the third algorithm

We can now click on the "Select" button to open the algorithm browser and select an algorithm.
For example, a) we click on "Select", b) choose the "Open_text_file_with_SPMF_text_editor" algorithm and c) click "OK":

add a text editor to the workflow editor to view the output

The result is then like this:

adding a third algorithm

We could then run the workflow by clicking the "Run the workflow" button. Apriori would be executed first, then the Visualize_frequent_itemsets tool would open, and then the SPMF text editor would open to display the result.

Now, let's say that we want to add another algorithm that runs separately from Apriori, in a different sequence. To do this, a) we click on the topmost node of the workflow and then b) click "Add algorithm".

another branch in workflow editor

The result will be like this:

workflow editor add a branch
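The "Add algorithm" behaviour described in this section can be modelled as a tree in which every node stores the id of its parent, with 0 standing for the topmost node, so that a node added there starts a new branch. This mirrors the parentId field of the workflow file format, but the WorkflowModel class and its methods are a hypothetical Python illustration, not SPMF code.

```python
class WorkflowModel:
    """Hypothetical workflow tree: each node stores only its parent id,
    where 0 stands for the topmost node (so children of 0 are separate branches)."""

    def __init__(self):
        self.parent_of = {}   # node id -> parent id
        self.algorithm = {}   # node id -> algorithm name, or None if not chosen yet
        self._next_id = 1

    def add_algorithm(self, parent_id=0, algorithm=None):
        """Mimic "Add algorithm": attach a new node under the selected node."""
        if parent_id != 0 and parent_id not in self.parent_of:
            raise KeyError(f"unknown parent node {parent_id}")
        node_id = self._next_id
        self._next_id += 1
        self.parent_of[node_id] = parent_id
        self.algorithm[node_id] = algorithm
        return node_id

    def children(self, node_id):
        """Direct children, in creation order."""
        return [n for n, p in self.parent_of.items() if p == node_id]

    def branches(self):
        """Roots of the independent sequences started from the topmost node."""
        return self.children(0)

# Rebuild the workflow from this section.
wf = WorkflowModel()
apriori = wf.add_algorithm(0, "Apriori")
viewer = wf.add_algorithm(apriori, "Visualize_frequent_itemsets")
editor = wf.add_algorithm(apriori, "Open_text_file_with_SPMF_text_editor")
second = wf.add_algorithm(0)  # new branch; its algorithm has not been chosen yet
print(wf.branches(), wf.children(apriori))  # [1, 4] [2, 3]
```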

After that we can continue adding algorithms in the same way to build a larger workflow. For example, after a few steps, it may look like this:

workflow editor complex workflow

Other features

Besides the features explained above, it is also possible to remove an algorithm by clicking on its node in the workflow view and then clicking the "Remove algorithm" button.

It is also possible to right-click on nodes from the workflow editor to remove or duplicate algorithms. Here is a picture:

workflow editor popup menu

What is the workflow file format?

Workflow files use a simple line-oriented plain text format. Each line contains a keyword followed by space-separated key=value pairs.

Every workflow file must begin with this header line:

@FILETYPE="WORKFLOW"

After the header, the file contains four types of record lines:

NODE record

A NODE record defines one algorithm step in the workflow:

NODE id=<integer> parentId=<integer> algorithm=<name> showInput=<boolean> showOutput=<boolean>

INPUT record

An INPUT record defines the input file for a root node. This record is only present when showInput=true:

INPUT id=<integer> file=<path> name=<display_name>

OUTPUT record

An OUTPUT record defines the output file for an algorithm node:

OUTPUT id=<integer> file=<path> name=<display_name>

PARAM record

A PARAM record defines one parameter value for an algorithm. There is one PARAM record for each non-null parameter:

PARAM id=<integer> index=<integer> value=<string>
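Because every record is a keyword followed by space-separated key=value fields, one line can be split and rebuilt with a few lines of Python. This sketch assumes that values contain no raw whitespace (they are percent-encoded, as described in the escaping rules) and that a value may itself contain '='; parse_record and format_record are hypothetical helper names.

```python
def parse_record(line):
    """Split one workflow line into its keyword and a dict of key=value fields.
    Values are returned still percent-encoded; decoding is a separate step."""
    keyword, *tokens = line.split()
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"malformed field {token!r} in {keyword} record")
        fields[key] = value
    return keyword, fields

def format_record(keyword, fields):
    """Inverse of parse_record: rebuild a line from a keyword and ordered fields."""
    return " ".join([keyword] + [f"{k}={v}" for k, v in fields.items()])

kw, f = parse_record("PARAM id=1 index=0 value=0.8")
print(kw, f)  # PARAM {'id': '1', 'index': '0', 'value': '0.8'}
```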

Escaping rules

Since the format uses whitespace to separate fields, any characters that would break token splitting are percent-encoded (for example, a space is written as %20).

Example: The Windows file path C:\Users\Phil\Desktop\DATASETS\chess.txt is stored verbatim because backslashes do not need encoding:

C:\Users\Phil\Desktop\DATASETS\chess.txt

A path or value that contains spaces, such as C:\My Documents\chess.txt, is stored as:

C:\My%20Documents\chess.txt

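When the workflow is loaded, all percent-escape sequences are automatically decoded back to the original characters. For instance, %20 is read back as a space and %5C as a backslash, which is why paths written as C:%5CUsers%5C... in the example file below are also loaded correctly.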
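The escaping and decoding steps can be sketched in Python with urllib.parse.unquote, which decodes every %XX sequence. Which characters the SPMF writer escapes is not stated here beyond spaces, so the escape table below (the percent sign plus common whitespace) is an assumption. Escaping '%' itself keeps a literal "%20" in a file name from being decoded as a space.

```python
from urllib.parse import unquote

# Assumption: only '%' and whitespace are escaped; the real writer may escape more.
_ESCAPE = {"%": "%25", " ": "%20", "\t": "%09", "\n": "%0A", "\r": "%0D"}

def encode_value(text):
    """Percent-encode the characters that would break whitespace token splitting."""
    return "".join(_ESCAPE.get(ch, ch) for ch in text)

def decode_value(text):
    """Decode every %XX sequence, so %20 -> ' ' and %5C -> '\\'."""
    return unquote(text)

path = r"C:\My Documents\chess.txt"
encoded = encode_value(path)
print(encoded)  # C:\My%20Documents\chess.txt
print(decode_value("C:%5CUsers%5CPhil%5CDesktop%5CDATASETS%5Cchess.txt"))
```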

An example of workflow file

This example shows a workflow file that runs two different frequent itemset mining algorithms (Eclat and Carpenter) on the same input file, then visualizes each result. The Eclat output is also opened in a text editor.

Workflow file content (workflow_eclat.txt)

@FILETYPE="WORKFLOW"
NODE id=1 parentId=0 algorithm=Eclat showInput=true showOutput=true
INPUT id=1 file=C:%5CUsers%5CPhil%5CDesktop%5CDATASETS%5Cchess.txt name=chess.txt
OUTPUT id=1 file=Output1.txt name=Output1.txt
PARAM id=1 index=0 value=0.8
NODE id=2 parentId=1 algorithm=Visualize_Frequent_itemsets showInput=false showOutput=false
OUTPUT id=2 file=Output2.txt name=Output2.txt
NODE id=3 parentId=1 algorithm=Open_text_file_with_SPMF_text_editor showInput=false showOutput=false
OUTPUT id=3 file=Output3.txt name=Output3.txt
NODE id=4 parentId=0 algorithm=Carpenter showInput=true showOutput=true
INPUT id=4 file=C:%5CUsers%5CPhil%5CDesktop%5CDATASETS%5Cchess.txt name=chess.txt
OUTPUT id=4 file=Output4.txt name=Output4.txt
PARAM id=4 index=0 value=0.7
NODE id=5 parentId=4 algorithm=Visualize_Frequent_closed_itemsets showInput=false showOutput=false
OUTPUT id=5 file=Output5.txt name=Output5.txt

This file represents the following workflow:

workflow eclat
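which has the following structure (each indented algorithm takes its parent's output file as input):

Eclat (node 1): input chess.txt, output Output1.txt, parameter 0 = 0.8
    Visualize_Frequent_itemsets (node 2)
    Open_text_file_with_SPMF_text_editor (node 3)
Carpenter (node 4): input chess.txt, output Output4.txt, parameter 0 = 0.7
    Visualize_Frequent_closed_itemsets (node 5)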
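The structure of this workflow can also be recovered programmatically by grouping records by id and linking nodes through parentId. The Python sketch below loads the example file above and prints the tree. load_workflow and print_tree are hypothetical names, and the parser assumes a well-formed file in which every id has a NODE record.

```python
from urllib.parse import unquote

WORKFLOW = """@FILETYPE="WORKFLOW"
NODE id=1 parentId=0 algorithm=Eclat showInput=true showOutput=true
INPUT id=1 file=C:%5CUsers%5CPhil%5CDesktop%5CDATASETS%5Cchess.txt name=chess.txt
OUTPUT id=1 file=Output1.txt name=Output1.txt
PARAM id=1 index=0 value=0.8
NODE id=2 parentId=1 algorithm=Visualize_Frequent_itemsets showInput=false showOutput=false
OUTPUT id=2 file=Output2.txt name=Output2.txt
NODE id=3 parentId=1 algorithm=Open_text_file_with_SPMF_text_editor showInput=false showOutput=false
OUTPUT id=3 file=Output3.txt name=Output3.txt
NODE id=4 parentId=0 algorithm=Carpenter showInput=true showOutput=true
INPUT id=4 file=C:%5CUsers%5CPhil%5CDesktop%5CDATASETS%5Cchess.txt name=chess.txt
OUTPUT id=4 file=Output4.txt name=Output4.txt
PARAM id=4 index=0 value=0.7
NODE id=5 parentId=4 algorithm=Visualize_Frequent_closed_itemsets showInput=false showOutput=false
OUTPUT id=5 file=Output5.txt name=Output5.txt"""

def load_workflow(text):
    """Group NODE/INPUT/OUTPUT/PARAM records by node id, decoding %XX escapes."""
    lines = [l for l in text.splitlines() if l.strip()]
    if lines[0].strip() != '@FILETYPE="WORKFLOW"':
        raise ValueError("missing workflow header")
    nodes = {}
    for line in lines[1:]:
        keyword, *tokens = line.split()
        fields = {k: unquote(v) for k, _, v in (t.partition("=") for t in tokens)}
        node = nodes.setdefault(int(fields["id"]), {"params": {}})
        if keyword == "NODE":
            node["parent"] = int(fields["parentId"])
            node["algorithm"] = fields["algorithm"]
        elif keyword == "INPUT":
            node["input"] = fields["file"]
        elif keyword == "OUTPUT":
            node["output"] = fields["file"]
        elif keyword == "PARAM":
            node["params"][int(fields["index"])] = fields["value"]
    return nodes

def print_tree(nodes, parent=0, depth=0):
    """Print each node under its parent, indented by depth."""
    for node_id in sorted(nodes):
        node = nodes[node_id]
        if node["parent"] == parent:
            print("  " * depth + f"{node['algorithm']} (node {node_id})")
            print_tree(nodes, node_id, depth + 1)

nodes = load_workflow(WORKFLOW)
print_tree(nodes)
```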

How to export workflows as scripts?

Workflows can be exported as executable scripts (a BAT file for Windows or an SH file for Linux) for batch processing on the command line.

These scripts call java -jar spmf.jar run <algorithm> <input> <output> <parameters> once for each algorithm in the workflow, following breadth-first order.
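As an illustration of the breadth-first export, the Python sketch below produces one java -jar spmf.jar run command per node. It assumes that a child node receives its parent's output file as input, that parameters are appended in index order, and that arguments are quoted with POSIX shell rules (so it only models the SH variant). export_commands is a hypothetical name, and the real exported script may differ in its details.

```python
import shlex
from collections import deque

def export_commands(nodes):
    """Return one 'java -jar spmf.jar run ...' command per node, in breadth-first order.

    nodes: dict node id -> {"parent": id or 0, "algorithm": str, "output": path,
           "params": list of str, "input": path (roots only)}.
    Assumption: a child's input is its parent's output file.
    """
    children = {}
    for node_id in sorted(nodes):
        children.setdefault(nodes[node_id]["parent"], []).append(node_id)

    commands = []
    queue = deque((root, nodes[root]["input"]) for root in children.get(0, []))
    while queue:
        node_id, input_path = queue.popleft()
        node = nodes[node_id]
        args = ["java", "-jar", "spmf.jar", "run", node["algorithm"],
                input_path, node["output"], *node["params"]]
        commands.append(" ".join(shlex.quote(a) for a in args))  # POSIX quoting (SH only)
        for child in children.get(node_id, []):
            queue.append((child, node["output"]))
    return commands

# A reduced version of the Eclat/Carpenter example, with relative paths.
nodes = {
    1: {"parent": 0, "algorithm": "Eclat", "input": "chess.txt",
        "output": "Output1.txt", "params": ["0.8"]},
    2: {"parent": 1, "algorithm": "Visualize_Frequent_itemsets",
        "output": "Output2.txt", "params": []},
    4: {"parent": 0, "algorithm": "Carpenter", "input": "chess.txt",
        "output": "Output4.txt", "params": ["0.7"]},
    5: {"parent": 4, "algorithm": "Visualize_Frequent_closed_itemsets",
        "output": "Output5.txt", "params": []},
}
cmds = export_commands(nodes)
print("\n".join(cmds))
```

Both roots are emitted before any of their children, so this sketch runs Eclat, then Carpenter, and then the two visualization steps, one after the other.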

Note: Script export does not preserve parallelism. All algorithms are executed sequentially in the order they appear in the script. If you need parallel execution, use the Workflow Editor's Run command instead.

Where can I get more information?

For information about specific algorithms that can be used in workflows, see the SPMF documentation.

For general questions about SPMF, visit the SPMF website.