SeqDBGen is a simple Java program that generates sequence databases, which are useful for testing sequential pattern mining algorithms.
SeqDBGen generates datasets randomly according to four user-specified parameters: (1) the number of sequences, (2) the maximum number of different items in the sequence database, (3) the number of items contained in each itemset and (4) the number of itemsets per sequence.
For example, if you run SeqDBGen with the parameters 3, 10, 3, 5, you could obtain the following sequence database containing 3 sequences of 5 itemsets:
<0> 5 0 8 -1 <1> 3 7 1 -1 <2> 4 9 5 -1 <3> 5 8 2 -1 <4> 1 5 7 -1 -2
<0> 0 5 4 -1 <1> 5 7 4 -1 <2> 4 7 9 -1 <3> 5 4 2 -1 <4> 6 8 0 -1 -2
<0> 7 1 3 -1 <1> 9 1 4 -1 <2> 2 7 5 -1 <3> 1 2 5 -1 <4> 2 4 0 -1 -2
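The way the four parameters shape the output can be illustrated with a short Java sketch. This is not the original GenerateSequenceDatabase class: the class name SeqDbGenSketch, the generate method and its parameter names are hypothetical, and the sketch assumes that items are numbered from 0 and that an item does not repeat within an itemset, which is consistent with the example above but not stated by it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; not the original SeqDBGen source. */
final class SeqDbGenSketch {

    /**
     * Returns one text line per sequence, laid out like the example above.
     * Assumption: items are drawn from 0..maxItems-1 and are distinct within an itemset.
     */
    static List<String> generate(int sequenceCount, int maxItems,
                                 int itemsPerItemset, int itemsetsPerSequence,
                                 Random random) {
        if (itemsPerItemset > maxItems) {
            throw new IllegalArgumentException("itemsPerItemset must not exceed maxItems");
        }
        // Pool of all possible items; reshuffled for every itemset.
        List<Integer> pool = new ArrayList<>();
        for (int item = 0; item < maxItems; item++) {
            pool.add(item);
        }
        List<String> lines = new ArrayList<>();
        for (int s = 0; s < sequenceCount; s++) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < itemsetsPerSequence; i++) {
                Collections.shuffle(pool, random);
                line.append('<').append(i).append("> ");   // itemset position marker
                for (int k = 0; k < itemsPerItemset; k++) {
                    line.append(pool.get(k)).append(' ');  // first k items of the shuffled pool
                }
                line.append("-1 ");                         // end of itemset
            }
            line.append("-2");                              // end of sequence
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same parameters as the example: 3 sequences, 10 items, 3 items per itemset, 5 itemsets.
        for (String line : generate(3, 10, 3, 5, new Random())) {
            System.out.println(line);
        }
    }
}
```

Passing a seeded Random (for instance new Random(42)) makes the output reproducible, which can be convenient when comparing mining algorithms on the same data.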
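The format of the sequence database is the following: each line is one sequence. As the example above shows, each itemset starts with its position in angle brackets (<0>, <1>, ...), followed by its items as integers separated by spaces, and ends with -1. The sequence itself ends with -2.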
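A line of such a database can be read back with a small parser. The sketch below is only an illustration: the class SeqDbReaderSketch and the method parseSequence are hypothetical names, and it assumes, going by the example above, that a token in angle brackets marks the start of an itemset, that -1 closes an itemset and that -2 closes the sequence.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a reader; not part of the original SeqDBGen source. */
final class SeqDbReaderSketch {

    /** Parses one line of the database into a list of itemsets. */
    static List<List<Integer>> parseSequence(String line) {
        List<List<Integer>> itemsets = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty() || token.startsWith("<")) {
                continue;                      // skip blanks and position markers such as <0>
            }
            if (token.equals("-1")) {
                itemsets.add(current);         // assumed end-of-itemset marker
                current = new ArrayList<>();
            } else if (token.equals("-2")) {
                break;                         // assumed end-of-sequence marker
            } else {
                current.add(Integer.parseInt(token));
            }
        }
        return itemsets;
    }

    public static void main(String[] args) {
        // First two itemsets of the first example sequence.
        System.out.println(parseSequence("<0> 5 0 8 -1 <1> 3 7 1 -1 -2"));
        // prints [[5, 0, 8], [3, 7, 1]]
    }
}
```

The position markers are discarded here because, in the example, they simply count itemsets from 0; a reader that needs them could store the number alongside each itemset instead.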
The Java source code is provided below:
package ca.pfv.seqDBGen;
import java.util.Random;
public class GenerateSequenceDatabase {
The SeqDBGen software is free to use. If you find it useful, you can link to this webpage or mention it in your publications.
Copyright © 2008-2009 Philippe Fournier-Viger. All rights reserved.