Resize a database in SPMF format (a text file) (SPMF documentation)

This example explains how to resize a database in SPMF format (a text file) using the SPMF open-source data mining library.

How to run this example?

What is this tool?

This tool is a small program that resizes a database by keeping X% of the transactions of an original database. The tool takes as input an original database and a percentage X, and outputs a new file containing the first X% of the lines of data from the original database. For example, if a database contains 100,000 transactions and this tool is used with a percentage of 75 %, the output will be a database containing the first 75,000 transactions from the original database. This program is designed to work with any database file in SPMF format (text file). This tool is useful for performing scalability experiments when comparing algorithms. For example, one may want to observe the behavior of some algorithms when using 25%, 50%, 75% and 100% of the database.

What is the input?

The input is a text file in SPMF format. It could be, for example, a transaction database, a sequence database, or another type of database used by the algorithms offered in SPMF. The user must also specify a percentage X.

What is the output?

The output is a new file containing X% of the lines of data from the input file.

Example

For example, suppose the user applies the tool with X = 70 % to the following file (DB_UtilityPerHUIs.txt in this example):

3 1:6:1 5
5:3:3
3 5 1 2 4:25:1 3 5 10 6
3 5 2 4:20:3 3 8 6
3 1 4:8:1 5 2
3 5 1:22:6 6 10
3 5 2:9:2 3 4

The output is a new file (output.txt in this example) containing the first 5 transactions (because 70 % of 7 transactions is 4.9, which is rounded to 5 transactions):

3 1:6:1 5
5:3:3
3 5 1 2 4:25:1 3 5 10 6
3 5 2 4:20:3 3 8 6
3 1 4:8:1 5 2
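
The behavior shown in this example can be approximated with a short script. The following is a minimal Python sketch, not the SPMF implementation: the function names resize_lines and resize_file are hypothetical, every non-empty line is assumed to be one line of data, and the line count is rounded up, which is one rounding rule consistent with the example above (the exact rule used by the tool is not stated here).

```python
import math


def resize_lines(lines, percentage):
    """Return the first `percentage` percent of the data lines, in order."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Assumption: every non-empty line is one record (transaction, sequence, ...).
    data = [line for line in lines if line.strip()]
    # Assumption: round up, so 70 % of 7 lines (4.9) keeps 5 lines.
    keep = math.ceil(len(data) * percentage / 100)
    return data[:keep]


def resize_file(input_path, output_path, percentage):
    """Write the first `percentage` percent of input_path to output_path."""
    with open(input_path, encoding="utf-8") as src:
        kept = resize_lines(src.read().splitlines(), percentage)
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.writelines(line + "\n" for line in kept)
    return len(kept)  # number of lines written
```

Under these assumptions, calling resize_file("DB_UtilityPerHUIs.txt", "output.txt", 70) would write the first five lines of the input file. For a scalability experiment, the same function could be called with 25, 50, 75 and 100 to produce four databases of increasing size.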