Generating a Synthetic Sequence Database (SPMF documentation)

This example explains how to generate a synthetic sequence database using the SPMF open-source data mining library.

How to run this example?

What is this tool?

This tool is a random generator of sequence databases. It can be used to generate synthetic sequence databases for comparing the performance of data mining algorithms that take a sequence database as input.

Synthetic databases are often used in the data mining literature to evaluate algorithms. In particular, they are useful for comparing the scalability of algorithms. For example, one can generate sequence databases of various sizes and observe how the execution time and memory usage of each algorithm change as the database grows.
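The kind of scalability comparison described above can be sketched as follows. This is only an illustration in Python, not part of SPMF: make_database, count_item_support and benchmark are hypothetical names, the item range and sequence shape are arbitrary, and the item-counting function merely stands in for a real mining algorithm.

```python
import random
import time
import tracemalloc
from collections import Counter

def make_database(sequence_count, rng):
    # Hypothetical stand-in generator: 4 itemsets of 2 items each,
    # with items drawn from 1..50 (arbitrary choices for illustration).
    return [[[rng.randint(1, 50) for _ in range(2)] for _ in range(4)]
            for _ in range(sequence_count)]

def count_item_support(database):
    # Placeholder "algorithm": counts in how many sequences each item occurs.
    support = Counter()
    for sequence in database:
        support.update({item for itemset in sequence for item in itemset})
    return support

def benchmark(sizes, seed=0):
    # Measures wall-clock time and peak traced memory for each database size.
    rng = random.Random(seed)
    results = []
    for size in sizes:
        database = make_database(size, rng)
        tracemalloc.start()
        start = time.perf_counter()
        count_item_support(database)
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append((size, elapsed, peak_bytes))
    return results

results = benchmark([100, 1000, 10000])
for size, elapsed, peak_bytes in results:
    print(f"{size} sequences: {elapsed:.4f} s, peak {peak_bytes} bytes")
```

The measured values depend on the machine, so only the trend across sizes is meaningful.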

What is the input?

The tool for generating a sequence database takes four parameters as input:

1) the number of sequences to be generated (an integer >= 1)

2) the maximum number of distinct items that the database should contain (an integer >= 1)

3) the number of items that each itemset should contain (an integer >= 1)

4) the number of itemsets that each sequence should contain (an integer >= 1)
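To make the role of the four parameters concrete, here is a minimal sketch in Python of a generator that accepts them. It is not the SPMF implementation: the function name generate_sequence_database, the seed argument, the use of integers 1..N as items, and the requirement that items within an itemset be distinct are all assumptions made for this illustration.

```python
import random

def generate_sequence_database(sequence_count, max_distinct_items,
                               items_per_itemset, itemsets_per_sequence,
                               seed=None):
    # All four parameters must be integers >= 1, as stated above.
    params = {
        "sequence_count": sequence_count,
        "max_distinct_items": max_distinct_items,
        "items_per_itemset": items_per_itemset,
        "itemsets_per_sequence": itemsets_per_sequence,
    }
    for name, value in params.items():
        if value < 1:
            raise ValueError(f"{name} must be >= 1")
    # Assumption of this sketch: items in one itemset are distinct,
    # so an itemset cannot hold more items than exist.
    if items_per_itemset > max_distinct_items:
        raise ValueError("items_per_itemset cannot exceed max_distinct_items")

    rng = random.Random(seed)  # seed is optional, for repeatable output
    items = range(1, max_distinct_items + 1)
    database = []
    for _ in range(sequence_count):
        sequence = [sorted(rng.sample(items, items_per_itemset))
                    for _ in range(itemsets_per_sequence)]
        database.append(sequence)
    return database

# 5 sequences, at most 10 distinct items, 2 items per itemset,
# 3 itemsets per sequence.
db = generate_sequence_database(5, 10, 2, 3, seed=42)
for sequence in db:
    print(sequence)
```

Each sequence is represented here as a list of itemsets, and each itemset as a sorted list of integers.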

What is the output?

The tool outputs a sequence database that respects these parameters. The content of the database is generated using a random number generator.
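As an illustration of what such an output can look like as text, the sketch below writes a small hand-made database in the text format commonly used for SPMF sequence databases, where items are separated by spaces, -1 ends an itemset and -2 ends a sequence. That this tool writes exactly this format is an assumption here, and format_sequence is a hypothetical helper, not an SPMF function.

```python
def format_sequence(sequence):
    # Items are separated by spaces; -1 closes each itemset,
    # and -2 closes the whole sequence.
    tokens = []
    for itemset in sequence:
        tokens.extend(str(item) for item in itemset)
        tokens.append("-1")
    tokens.append("-2")
    return " ".join(tokens)

# Hand-written example: 2 sequences, 2 itemsets each, 2 items per itemset.
example_database = [
    [[1, 2], [3, 4]],
    [[2, 5], [1, 4]],
]

lines = [format_sequence(sequence) for sequence in example_database]
for line in lines:
    print(line)
# prints:
# 1 2 -1 3 4 -1 -2
# 2 5 -1 1 4 -1 -2
```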