Mining Multiple Fuzzy Frequent Itemsets in a quantitatve transaction database using the MFFI-Miner algorithm (SPMF documentation)

This example explains how to run the MFFI-Miner algorithm using the SPMF open-source data mining library.

How to run this example?

What is MFFI-Miner?

MFFI-Miner is an algorithm for mining fuzzy frequent itemsets in quantitative transactional database. In simple words, a quantitative transactional database is a database where items have quantities.

 What is the input?

MFFI-Miner takes as input (1) a transaction database with quantity information and a minimum support threshold minSupport (a positive integer). Let's consider the following database consisting of 8 transactions (t1, t2, ..., t8) and 5 items (1, 2, 3, 4, 5). This database is provided in the text file “contextMFFIMiner.txt ” in the package ca.pfv.spmf.tests of the SPMF distribution.

Moreover, consider the membership function, shown below, which defines three ranges (low, medium, high).

Transaction ID

Items

Quantitates

t1

3 4 5

3 2 1

t2

2 3 4

1 2 1

t3

2 3 5

3 3 1

t4

1 3 4

3 5 3

t5

1 2 3 4

1 1 2 1

t6

2 4 5

1 1 2

t7

1 2 4 5

4 3 5 3

t8

2 3 4

1 2 1

Fuzzy membership function

Why MFFI-Miner is useful?

Previous work on fuzzy frequent itemset mining used the maximum scalar cardinality to mine fuzzy frequent itemsets (FFIs), in which at most, only one linguistic term was used to represent the item in the databases. Although this process can reduce the amount of computation for mining FFIs, the discovered information may be invalid or incomplete. A gradual data-reduction approach (GDF) for mining multiple fuzzy frequent itemsets (MFFIs).The tree-based algorithm UBMFFP-tree suffered from building a huge tree structure. It mines FFIs with multiple fuzzy regions based on an Apriori-like mechanism. The MFFI-Miner algorithm efficiently mines MFFIs without candidate generation based on the designed fuzzy-list structure. This approach can be used to reduce the amount of computation and avoid using a generate-candidate-and-test approach with a level-wise exploration of the search space

Input file format

The input file format of MFFI-Miner is defined as follows. It is a text file. Each lines represents a transaction. Each line is composed of three sections, as follows.

For example, for the previous example, the input file is defined as follows:

3 4 5:6:3 2 1
2 3 4:4:1 2 1
2 3 5:7:3 3 1
1 3 4:11:3 5 3
1 2 3 4:5:1 1 2 1
2 4 5:4:1 1 2
1 2 4 5:15:4 3 5 3
2 3 4:4:1 2 1

Consider the first line. It means that the transaction {3, 4, 5} has a total quantity of 6 and that items 3, 4 and 5 respectively have a quantity of 3, 2 and 1 in this transaction. The following lines follow the same format.

Output file format

The output file format of MFFI-Miner is defined as follows. It is a text file, where each line represents a frequent itemset. On each line, the items of the itemset are first listed. Each item is represented by an integer, followed by the letter "L", "M" or "H", and a single space. The letters L, M and H indicate if an item is in the Low, Medium or High Range of the fuzzy membership function, respectively. After, all the items, the keyword "#SUP:" appears, which is followed by a integer value indicating the support of that itemset.
3.H #SUP: 2.0
2.L #SUP: 4.0
2.L 3.M #SUP: 2.0
2.L 3.M 4.L #SUP: 2.0
2.L 4.L #SUP: 4.0
3.M #SUP: 3.3333335
3.M 4.L #SUP: 2.5000002
5.L #SUP: 2.5
4.L #SUP: 4.5

For example, the first line indicates that the itemset 3 in the high range of the fuzzy membership function (H) has a fuzzy value of 2.0. The other lines follows the same format.

Performance

MFFI-Miner is a very efficient algorithm. It uses a designed fuzzy-list structure to identify unpromising candidates early, and thus speed up the discovery of fuzzy itemsets.

Where can I get more information about the algorithm?

This is the article describing the MFFI-Miner algorithm:

Jerry Chun-Wei Lin, Ting Li, Philippe Fournier-Viger, Tzung-Pei Hong, Jimmy     Ming-Thai Wu, and Justin Zhan. Efficient Mining of Multiple Fuzzy Frequent Itemsets[J]. International Journal of Fuzzy Systems, 2016:1-9.

Some other related papers:

T. P. Hong, G. C. Lan, Y. H. Lin, and S. T. Pan, An effective gradual data-reduction strategy for fuzzy itemset mining, International Journal of Fuzzy Systems, Vol. 15(2), pp.170-181, 2013. (GDF)

J. C. W. Lin, T. P. Hong, T. C. Lin, and S. T. Pan, An UBMFFP tree for mining multiple fuzzy frequent itemsets, International Journal of Uncertainty, Fuzziness and Knowledge- Based Systems, Vol. 23(6), pp. 861-879, 2015. (UBMFFP-tree)

For a good overview of frequent itemset mining algorithms, you may read this survey paper.