Generating a Synthetic Sequence Database with Timestamps (SPMF documentation)

This example explains how to generate a synthetic sequence database with timestamps using the SPMF open-source data mining library.

How to run this example?

What is this tool?

This tool is a random generator of sequence databases with timestamps. It can be used to generate synthetic databases for comparing the performance of data mining algorithms that take a sequence database with timestamps as input.

Synthetic databases are often used in the data mining literature to evaluate algorithms. In particular, they are useful for comparing the scalability of algorithms. For example, one can generate sequence databases of various sizes and observe how the execution time and memory usage of each algorithm change with the database size.
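The scalability comparison described above can be sketched as a small measurement loop. This is only an illustration in Python, not part of SPMF: make_database, count_item_support and measure are hypothetical names, the "algorithm" is a trivial stand-in that counts in how many sequences each item appears, and tracemalloc only tracks Python allocations, so the numbers say nothing about the Java algorithms in SPMF.

```python
import random
import time
import tracemalloc


def make_database(sequence_count, rng, max_item=20,
                  items_per_itemset=2, itemsets_per_sequence=5):
    """Hypothetical helper: build a random database in memory.

    Each sequence is a list of (timestamp, items) pairs.
    """
    return [
        [(t, sorted(rng.sample(range(1, max_item + 1), items_per_itemset)))
         for t in range(itemsets_per_sequence)]
        for _ in range(sequence_count)
    ]


def count_item_support(database):
    """Stand-in algorithm: number of sequences containing each item."""
    support = {}
    for sequence in database:
        seen = {item for _, items in sequence for item in items}
        for item in seen:
            support[item] = support.get(item, 0) + 1
    return support


def measure(algorithm, database):
    """Return (elapsed seconds, peak traced bytes) for one run."""
    tracemalloc.start()
    start = time.perf_counter()
    algorithm(database)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for size in (100, 1000, 10000):
        elapsed, peak = measure(count_item_support, make_database(size, rng))
        print(f"{size} sequences: {elapsed:.4f} s, peak {peak} bytes")
```

Keeping the other three parameters fixed while varying only the number of sequences makes the effect of database size easier to isolate.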

What is the input?

The tool for generating sequence databases with timestamps takes four parameters as input:

1) the number of sequences to be generated (an integer >= 1),

2) the maximum number of distinct items that the database should contain (an integer >= 1),

3) the number of items that each itemset should contain (an integer >= 1),

4) the number of itemsets that each sequence should contain (an integer >= 1).

What is the output?

The tool outputs a sequence database with timestamps that respects these parameters. The content of the database is produced by a random number generator.
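To make the role of the four parameters concrete, here is a minimal sketch of such a generator in Python. It is not SPMF's implementation, and several choices are assumptions made for illustration: the function names are hypothetical, timestamps are consecutive integers starting at 0, items within an itemset are distinct (which requires the itemset size not to exceed the number of distinct items), and each sequence is written on one line with the timestamp in angle brackets, -1 closing an itemset and -2 closing the sequence.

```python
import random


def generate_sequence_database(sequence_count, max_distinct_items,
                               items_per_itemset, itemsets_per_sequence,
                               seed=None):
    """Return a list of sequences; each is a list of (timestamp, items)."""
    params = {
        "sequence_count": sequence_count,
        "max_distinct_items": max_distinct_items,
        "items_per_itemset": items_per_itemset,
        "itemsets_per_sequence": itemsets_per_sequence,
    }
    for name, value in params.items():
        if value < 1:
            raise ValueError(f"{name} must be an integer >= 1")
    # Assumption of this sketch: items in an itemset are distinct.
    if items_per_itemset > max_distinct_items:
        raise ValueError("items_per_itemset cannot exceed max_distinct_items")

    rng = random.Random(seed)
    database = []
    for _ in range(sequence_count):
        sequence = []
        # Assumption: timestamps are 0, 1, 2, ... within each sequence.
        for timestamp in range(itemsets_per_sequence):
            items = sorted(rng.sample(range(1, max_distinct_items + 1),
                                      items_per_itemset))
            sequence.append((timestamp, items))
        database.append(sequence)
    return database


def format_sequence(sequence):
    """Format one sequence as a text line (assumed layout, see lead-in)."""
    parts = [f"<{timestamp}> " + " ".join(map(str, items)) + " -1"
             for timestamp, items in sequence]
    return " ".join(parts) + " -2"


if __name__ == "__main__":
    for sequence in generate_sequence_database(3, 10, 2, 4, seed=42):
        print(format_sequence(sequence))
```

Passing a seed makes the generated database reproducible, which is convenient when the same synthetic data must be reused across several experiments.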