This example explains how to run the **CloStream algorithm** using the SPMF open-source data mining library.

How to run this example?

**This example is not available in the release version of SPMF.****To run this example with the source code version of SPMF,**launch the file**"MainTestCloStream.java"**in the package**ca.pfv.SPMF.tests**.

What is CloStream?

CloStreamis an algorithm forincrementally miningclosed itemsetsfrom a data stream. It was proposed by Yen et al. (2009).Why is it useful? Because most closed itemset mining algorithms such as Charm, DCI_Closed and AprioriClose are batch algorithms. This means that if the transaction database is updated, we need to run the algorithms again to update the set of closed itemsets. If there is constant insertion of new transactions and the results need to be updated often, it may become very costly to use these algorithms. A

stream mining algorithmlikeCloStreamis specially designed to handle this situation. It assumes that each transaction in a database can only be read once and that new transaction appears regularly. Every time that a new transaction appear, the result is updated by CloStream.

What is the input of CloStream?

The input of

CloStreamis a stream of transactions. Eachtransactionis a set of items (symbols). For example, consider the following five transactions (t1, t2, ..., t5) and 5 items (1,2, 3, 4, 5). For example, the first transaction represents the set of items 1, 2 and 4. It is important to note that an item is not allowed to appear twice in the same transaction and that items are assumed to be sorted by lexicographical order in a transaction.CloStreamis an algorithm for processing a stream. This means that CloStream is allowed to read each transaction only once because a stream is assumed to be potentially infinite and coming at high speed.

Transaction ID |
Items |

t1 |
{1,2 4} |

t2 |
{2, 3, 5} |

t3 |
{1, 2, 3, 5} |

t4 |
{2, 5} |

t5 |
{2, 3, 5} |

What is the output of CloStream?

CloStreamproduces as output the set ofclosed itemsets contained in the transactions that it has seen until now. An itemset is an unordered set of distinct items. Thesupport of an itemsetis the number of transactions that contains the itemset. For example, the itemset {1, 2, 4} has a support of 1 because it only appear in t1. Aclosed itemsetis an itemset that is not included in another itemset having the same support. For example, if we applyCloStreamto the five following transactions, the final result is:

closed itemsets |
support |

{} | 5 |

{3} | 4 |

{1, 3} | 3 |

{1, 3, 4} | 1 |

{2, 5} | 4 |

{2, 3, 5} | 3 |

{1, 2, 3, 5} | 2 |

For example, the itemset {2, 3, 5} has a support of 3 because it appears in transactions t2, t4 and t5. It is a closed itemset because it has no proper superset having the same support.

Input and output file format

This is not applicable for this algorithm since it is designed for a stream of data (see the source code example referenced above to understand how to use this algorithm).

Performance

CloStreamis a reasonably efficient algorithm. A limitation of this algorithm is that it is not possible to set a minimum support threshold. Therefore, if the number of closed itemsets is large, this algorithm may use too much memory. However, CloStream has the advantage of being very simple an easy to implement.

Where can I get more information about this algorithm?

The

CloStreamalgorithm is described in this paper:Show-Jane Yen, Yue-Shi Lee, Cheng-Wei Wu, Chin-Lin Lin: An Efficient Algorithm for Maintaining Frequent Closed Itemsets over Data Stream. IEA/AIE 2009: 767-776.

For a good overview of

frequent itemset mining algorithms, you may read this survey paper.

<< Return to table of contents of SPMF documentation

Copyright © 2008-2020 Philippe Fournier-Viger. All rights reserved.