SeqDBGen: a Sequence Database Generator

Introduction

SeqDBGen is a simple Java program for generating sequence databases, which are useful for testing sequential pattern mining algorithms.

SeqDBGen generates datasets randomly according to four user-specified parameters: (1) the number of sequences, (2) the maximum number of different items in the sequence database, (3) the number of items contained in each itemset and (4) the number of itemsets per sequence.

For example, if you run SeqDBGen with the parameters 3, 10, 3, 5, you could obtain the following sequence database containing 3 sequences of 5 itemsets:

<0> 5 0 8 -1 <1> 3 7 1 -1 <2> 4 9 5 -1 <3> 5 8 2 -1 <4> 1 5 7 -1 -2
<0> 0 5 4 -1 <1> 5 7 4 -1 <2> 4 7 9 -1 <3> 5 4 2 -1 <4> 6 8 0 -1 -2
<0> 7 1 3 -1 <1> 9 1 4 -1 <2> 2 7 5 -1 <3> 1 2 5 -1 <4> 2 4 0 -1 -2

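The format of a sequence database is the following: each line is one sequence. Each itemset starts with its position in the sequence between angle brackets (<0>, <1>, ...), followed by its items as integers separated by single spaces, and ends with -1. Each sequence ends with -2. With a maximum of 10 different items, as in the example above, the items range from 0 to 9.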
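To illustrate how a line of this output could be read back, here is a minimal sketch of a parser. It assumes the conventions visible in the example above and in the generator code: a token such as <0> marks the position of an itemset, -1 closes an itemset and -2 closes the sequence. The class SequenceLineParser and its method parseLine are hypothetical names introduced for this illustration; they are not part of SeqDBGen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SeqDBGen: parses one line of generator output.
class SequenceLineParser {

    /** Parses a line such as "<0> 5 0 8 -1 <1> 3 7 1 -1 -2" into a list of itemsets. */
    static List<List<Integer>> parseLine(String line) {
        List<List<Integer>> sequence = new ArrayList<>();
        List<Integer> itemset = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty() || token.startsWith("<")) {
                continue; // skip empty input and the "<j>" itemset position marker
            }
            int value = Integer.parseInt(token);
            if (value == -2) {
                break; // end of the sequence
            } else if (value == -1) {
                sequence.add(itemset); // end of the current itemset
                itemset = new ArrayList<>();
            } else {
                itemset.add(value); // a regular item
            }
        }
        return sequence;
    }

    public static void main(String[] args) {
        // Prints [[5, 0, 8], [3, 7, 1]]
        System.out.println(parseLine("<0> 5 0 8 -1 <1> 3 7 1 -1 -2"));
    }
}
```

The sketch ignores the position markers because the order of the itemsets on the line already carries the same information.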

Source Code

The Java source code is provided below:

package ca.pfv.seqDBGen;

import java.util.Random;

public class GenerateSequenceDatabase {

    // Seeded with the current time, so each run produces a different database
    private static final Random random = new Random(System.currentTimeMillis());

    public static void main(String[] arg) {
        // Generate a sequence database: 3 sequences, 10 different items,
        // 3 items per itemset, 5 itemsets per sequence
        generateDatabase(3, 10, 3, 5);
    }

    /**
     * Generates a random sequence database and prints it to standard output.
     *
     * @param sSize      number of sequences
     * @param iSize      maximum number of different items (items range from 0 to iSize - 1)
     * @param xSize      number of items per itemset
     * @param nbItemsets number of itemsets per sequence
     */
    private static void generateDatabase(int sSize, int iSize, int xSize, int nbItemsets) {
        // Generate some random sequences, one per line
        for (int i = 0; i < sSize; i++) {
            for (int j = 0; j < nbItemsets; j++) {
                // Position of the itemset in the sequence
                System.out.print("<" + j + "> ");
                for (int k = 0; k < xSize; k++) {
                    int item = random.nextInt(iSize);
                    System.out.print(item + " ");
                }
                // End of the itemset
                System.out.print("-1 ");
            }
            // End of the sequence
            System.out.println("-2 ");
        }
    }
}
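
When testing a mining algorithm, it can help to regenerate exactly the same database from one run to the next. The following is a minimal sketch of one way to do that, assuming a caller-supplied seed and a String return value instead of printing; the class SeededSequenceGenerator, its method generate and the parameter names are hypothetical and are not part of the original program. Unlike the original, each line ends with -2 without a trailing space.

```java
import java.util.Random;

// Hypothetical variant, not the original SeqDBGen class: same generation loop,
// but seeded by the caller and returning the database as a String.
class SeededSequenceGenerator {

    static String generate(long seed, int sequenceCount, int maxItems,
                           int itemsPerItemset, int itemsetsPerSequence) {
        Random random = new Random(seed); // same seed, same database
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sequenceCount; i++) {
            for (int j = 0; j < itemsetsPerSequence; j++) {
                out.append('<').append(j).append("> "); // itemset position
                for (int k = 0; k < itemsPerItemset; k++) {
                    out.append(random.nextInt(maxItems)).append(' ');
                }
                out.append("-1 "); // end of the itemset
            }
            out.append("-2\n"); // end of the sequence
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The seed 42 is arbitrary; any fixed value gives a repeatable database.
        System.out.print(generate(42L, 3, 10, 3, 5));
    }
}
```

Returning a String rather than printing directly also makes it straightforward to write the database to a file or to compare two generated databases in a unit test.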

License

The SeqDBGen software is free to use. If you find it useful, you can link to this webpage or mention it in your publications.

Copyright © 2008-2009 Philippe Fournier-Viger. All rights reserved.