Converting a Sequence Database to SPMF Format (SPMF documentation)

This example explains how to convert a sequence database to SPMF format using the SPMF open-source data mining library.

How to run this example?

The tool for converting a sequence database to SPMF format takes three parameters as input:

The algorithm outputs a sequence database in SPMF format.

The CSV_INTEGER format is defined as follows:

For example, the following sequence database is in CSV_INTEGER format and contains four sequences:

1,2,3,4
5,6,7,8
5,6,7
1,2,3
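
To make the conversion concrete, here is a minimal Python sketch (not the SPMF implementation). It assumes that each line of a CSV_INTEGER file is one sequence of comma-separated integers, that each integer becomes its own itemset, and that the SPMF output marks the end of an itemset with -1 and the end of a sequence with -2. The function name csv_integer_to_spmf is hypothetical.

```python
def csv_integer_to_spmf(lines):
    """Convert CSV_INTEGER lines to SPMF-style lines.

    Assumption: every comma-separated integer is a single-item itemset.
    """
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        items = [int(token) for token in line.split(",")]
        # Each item is followed by -1 (end of itemset);
        # the whole sequence is closed with -2.
        parts = [f"{item} -1" for item in items]
        converted.append(" ".join(parts) + " -2")
    return converted


database = ["1,2,3,4", "5,6,7,8", "5,6,7", "1,2,3"]
for row in csv_integer_to_spmf(database):
    print(row)
```

Under these assumptions, the first sequence is printed as "1 -1 2 -1 3 -1 4 -1 -2".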

The Kosarak format is defined as follows:

For example, the following sequence database is in Kosarak format and contains four sequences:

1 2 3 4
5 6 7 8
5 6 7
1 2 3

The IBMGenerator format is the format used by the IBM Data Quest Generator. The format is defined as follows:

For example, the following sequence database is in IBMGenerator format and contains four sequences:

1 -1 2 -1 3 -1 4 -1 -2
5 -1 6 -1 7 -1 8 -1 -2
5 -1 6 -1 7 -1 -2
1 -1 2 -1 3 -1 -2

The Snake format is defined as follows:

For example, the following sequence database is in Snake format and contains four sequences:

ABCD
ABAB
CACD
ADAC
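
As an illustration, the following Python sketch turns Snake lines into integer sequences. It assumes that each line is one sequence and each uppercase letter is one item. The mapping A=1, B=2, and so on is an assumption made for this example only; the identifiers actually assigned by SPMF may differ. The function name snake_to_sequences is hypothetical.

```python
def snake_to_sequences(lines):
    """Map each line of uppercase letters to a list of integer items.

    Assumption: A -> 1, B -> 2, ... (illustrative mapping only).
    """
    sequences = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sequences.append([ord(ch) - ord("A") + 1 for ch in line])
    return sequences


database = ["ABCD", "ABAB", "CACD", "ADAC"]
for seq in snake_to_sequences(database):
    print(seq)
```

With this mapping, the sequence ABCD becomes [1, 2, 3, 4] and ADAC becomes [1, 4, 1, 3].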

The BMS format is defined as follows:

For example, the following sequence database is in BMS format and contains four sequences with the ids 10, 20, 30 and 40, respectively:

10 1
10 2
10 3
10 4
20 5
20 6
20 7
20 8
30 5
30 6
30 7
40 1
40 2
40 3
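
The BMS example above can be read back into sequences with a short Python sketch. It assumes, based on the example, that each line holds a sequence id followed by one item, separated by whitespace, and that lines sharing an id belong to the same sequence in file order. The function name bms_to_sequences is hypothetical and this is not the SPMF implementation.

```python
def bms_to_sequences(lines):
    """Group "sequence-id item" lines into one item list per sequence id."""
    sequences = {}  # dicts keep insertion order, so ids stay in file order
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        seq_id, item = int(fields[0]), int(fields[1])
        sequences.setdefault(seq_id, []).append(item)
    return sequences


database = [
    "10 1", "10 2", "10 3", "10 4",
    "20 5", "20 6", "20 7", "20 8",
    "30 5", "30 6", "30 7",
    "40 1", "40 2", "40 3",
]
for seq_id, items in bms_to_sequences(database).items():
    print(seq_id, items)
```

For the example database this yields four sequences: 10 with items 1, 2, 3, 4; 20 with items 5, 6, 7, 8; 30 with items 5, 6, 7; and 40 with items 1, 2, 3.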