Download
There are two versions of SPMF:
 The source code version includes all the algorithms. It requires prior experience with Java for compiling the source code and running the examples.
 The release version provides a graphical user interface and a command line interface, and it is easy to use. It offers all of the algorithms, with a few exceptions.
Source code version (262 algorithms)
 1) Download spmf.zip
 2) Read the instructions for installing and running the source code: how_to compile source code and run it
 3) After you have installed the source code, if you intend to modify the source code and/or reuse it in other Java projects, you may want to read the developer's guide, which provides information about the source code organization.
 Algorithms included: all the algorithms
Release version (238 algorithms)
 1) Download spmf.jar and the sample data files test_files.zip
 2) If you want to use the graphical interface, follow these instructions: how_to_run_the graphical_interface. If you want to use the command line interface, follow the instructions for the command line interface.
 Algorithms included: all except CloStream, estDec, estDec+, ItemsetTree, MemoryEfficient Itemset Tree, ID3, and a few others
If you have any questions, you may first have a look at the FAQ, and then ask your question in the data mining forum. If the question has to be private, you can send me an email.
If you want to use SPMF from programs written in other languages such as Python and R, you may call SPMF from the command line or use some unofficial wrappers for SPMF. A limitation of wrappers is that they may not support all algorithms from SPMF.
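As a rough illustration of the command line route, the sketch below calls the release version from Python with the standard subprocess module. It assumes that spmf.jar is in the working directory, that Java is on the PATH, and that the command has the form "java -jar spmf.jar run <algorithm> <input> <output> <parameters>". The helper names, the algorithm, the file names and the parameter value are placeholders chosen for illustration, so check the documentation of the algorithm you want to run.

```python
import subprocess


def build_spmf_command(algorithm, input_file, output_file, *params, jar="spmf.jar"):
    """Build the argument list for one SPMF command line run (assumed syntax)."""
    return ["java", "-jar", jar, "run", algorithm, input_file, output_file,
            *[str(p) for p in params]]


def run_spmf(algorithm, input_file, output_file, *params, jar="spmf.jar"):
    """Run SPMF as an external process and return its console output."""
    command = build_spmf_command(algorithm, input_file, output_file, *params, jar=jar)
    # check=True raises CalledProcessError if Java exits with an error code.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical example: the file names and the 40% threshold are placeholders.
    print(build_spmf_command("Apriori", "input.txt", "output.txt", "40%"))
```

Building the argument list separately from running it makes the command easy to inspect or log before Java is launched.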
Release notes
 v 2.62
 15 new tool(s) to calculate statistics about datasets:
 A new tool to calculate statistics about a product transaction database
 A new tool to calculate statistics about a sequence database with cost and binary utility
 A new tool to calculate statistics about a sequence database with cost and numeric utility
 A new tool to calculate statistics about a sequence database with utility
 A new tool to calculate statistics about a time-extended sequence database
 A new tool to calculate statistics about a transaction database with cost and utility
 A new tool to calculate statistics about a transaction database with utility and period information
 A new tool to calculate statistics about a transaction database with utility and timestamps
 A new tool to calculate statistics about an event sequence
 A new tool to calculate statistics about an interval sequence database
 A new tool to calculate statistics about a multidimensional sequence database
 A new tool to calculate statistics about a multidimensional sequence database with timestamps
 A new tool to calculate statistics about an uncertain transaction database
 A new tool to calculate statistics about a file with double vectors (instances) for clustering
 A new tool to calculate statistics about time series
 Improvements to existing features
 Workflow editor: added a menu for exporting a workflow as a BAT or SH script for Windows and Linux, respectively.
 Log console in the SPMF GUI: modified the contextual menu to add options for changing the font size and saving the current log to a file.
 Modified the progress bar of the GUI to display the time elapsed since an algorithm was launched.
 Datasets:
 Added sequences of API calls of malware programs to the "Datasets" page of the SPMF website.
 Added four customer shopping datasets: instacart_prior, insta_cart_train, TH1 and TH2. TH2 is the largest with about 26 million transactions. The Instacart datasets are interesting as they also include a taxonomy with categories of items, and item names.
 Bug fix(es)
 Fixed a bug in the tool for calculating stats about a sequence database such that it did not handle files containing lines starting with the character @.
 Fixed a bug in the progress bar of the developer tool for checking broken links from the SPMF documentation.
 Fixed a bug with the choice "Don't open" for opening the output file in the user interface of SPMF. It was displaying an error.
 Fixed a bug in VertTIRP and FastTIRP in the use of the maxGap constraint.
 Fixed a bug in the new ConsolePanel of the GUI of SPMF such that the console output was sometimes not displayed after using the tool to run experiments. Also fixed a bug such that when an algorithm was run as an external process, the console output was not shown in the console panel.
 Fixed some display issues for low-resolution screens.
 Fixed the console of SPMF so that it auto-scrolls to the last line.
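To illustrate the first fix in this list, here is a minimal Python sketch of a reader that tolerates metadata lines. It assumes the usual SPMF sequence format, where items are integers, -1 ends an itemset and -2 ends a sequence, and it assumes that lines starting with @, # or % carry metadata or comments and can be skipped. The function name and the sample lines are hypothetical, and this is not the code of the SPMF tool.

```python
def read_sequence_database(lines):
    """Parse sequences, skipping blank lines and lines starting with @, # or %."""
    sequences = []
    for line in lines:
        line = line.strip()
        if not line or line[0] in "@#%":
            continue  # metadata or comment line (assumption, see above)
        sequence, itemset = [], []
        for token in line.split():
            value = int(token)
            if value == -1:      # end of the current itemset
                sequence.append(itemset)
                itemset = []
            elif value == -2:    # end of the sequence
                break
            else:
                itemset.append(value)
        sequences.append(sequence)
    return sequences


# Made-up sample: two metadata lines followed by two sequences.
sample = [
    "@CONVERTED_FROM_TEXT",
    "@ITEM=1=apple",
    "1 -1 2 3 -1 -2",
    "2 -1 -2",
]
print(read_sequence_database(sample))
```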
 Code improvements:
 Fixed some warnings, some potential resource leaks, and rewrote some code that was using deprecated methods.
 Known issues:
 There is a bug in the UPSpan implementation such that it can miss some patterns (thanks to Acquah Hackman for reporting it). For example, on contextHUE_Span.txt with minutil = 0.45 and maxWindow = 2, the pattern 2 1 1 3 1 #UTIL: 10 is missing from the results of UPSpan. However, this pattern is found by HUESPAN, and checking by hand confirms that it exists in the database. Note that to compare the results of HUESPAN with UPSpan, it is necessary to use maxWindow+1 for HUESPAN and to set UseTraditionalUtility = true in the user interface or checkMaximumUtility = false in the source code version.
 There is a bug in the BIDE+ implementation such that some incorrect results may be obtained for sequences with multiple items per itemset.
 There is a bug in the implementation of CHUIMiner(max) such that results may be incorrect on some datasets. A bug fix will be made soon.
 v 2.60 - 2024-4-20 - major version - 18 new algorithms, 21 new viewer tools, 8 bug fixes, several graphical user interface improvements, new tools, etc.
 New algorithm(s)
 the MRICE algorithm (2024) for mining minimal rare itemsets using the cross-entropy method (thanks to Song et al. for providing the source code of the original implementation)
 the FastTIRP algorithm (2022) for discovering frequent time-interval related patterns in sequences of events described using time intervals.
 the VertTIRP algorithm (2021) for discovering frequent time-interval related patterns in sequences of events described using time intervals.
 the KRIMP (2011) algorithm for discovering compressing itemsets in a transaction database.
 the SLIM (2012) algorithm for discovering compressing itemsets in a transaction database.
 the MAPD algorithm (Wu, Y. et al., 2014) for mining frequent sequential patterns with periodic wildcard gaps in a sequence of characters (thanks to Wu et al. for providing a Java conversion of the original code)
 the OWSPMiner algorithm (Wu, Y. et al., 2022) for mining self-adaptive one-off weak-gap strong sequential patterns in a sequence of characters (thanks to Wu et al. for providing a Java conversion of the original code)
 the KMean++ algorithm for discovering clusters.
 the FPGrowth(topk) algorithm, which is a variant of the SPMF version of FPGrowth to find the top-k frequent itemsets.
 the Apriori(topk) algorithm, which is a variant of the SPMF version of Apriori to find the top-k frequent itemsets.
 the ETAUIM algorithm (2023) for mining the top-k high average utility itemsets using a breadth-first search (obtained from Github @liuxuan605 based on the use of code derived from SPMF under the GPL license)
 the ECHUM algorithm (2022) for mining correlated high utility itemsets using the Kulczynski correlation measure (obtained from Github @aman955 under the GPL license)
 the EMSFUID and EMSFUIB algorithms (2022) for mining the skyline frequent-utility itemsets (obtained from Github @liuxuan605 based on the use of code derived from SPMF under the GPL license)
 the FUIMTFTree and FUIMTWUTree algorithms (2022) for mining frequent-utility itemsets (obtained from Github @liuxuan605 based on the use of code derived from SPMF under the GPL license)
 the PHMN and PHMN+ algorithms (2023) for mining periodic high utility itemsets with positive or negative utility (obtained from Github @Laughing1999 under the GPL license)
 New GUI tools to view different types of datasets:
 A simple tool to view the content of an ARFF file
 A new tool to view the content of an event sequence file
 A new tool to view a sequence database cost binary utility file
 A new tool to view a sequence database cost numeric utility file
 A new tool to view a sequence database file
 A new tool to view a time-extended sequence database
 A new tool to view a multidimensional sequence database
 A new tool to view a multidimensional time sequence database
 A new tool to view a sequence utility database file
 A new tool to view a cost utility transaction database file
 A new tool to view a transaction database file
 A new tool to view an uncertain transaction database file
 A new tool to view a utility transaction database file
 A new tool to view a utility time transaction database file
 A new tool to view a utility period transaction database file
 A new tool to view a product transaction database file
 A new tool to view a sequence database file with time intervals
 New tool(s) to generate synthetic datasets:
 A new tool to generate synthetic datasets for clustering
 New tool(s) for data transformation:
 A new tool to fix a sequence database file (such as repeated items in an itemset). The tool is called "Fix_a_sequence_database" in the GUI and command line interface of SPMF.
 A new tool to fix item ids in a transaction database with utility information
 New tool(s) to calculate statistics about datasets:
 New GUI tool(s):
 A new tool called the MemoryViewer to monitor the memory usage of the JVM when using SPMF.
 A new tool called the Workflow Editor to create and edit workflows (a set of algorithms to be executed one after the other).
 A new tool in the source code of SPMF called the "PreferencesViewer" in the package "ca.pfv.spmf.gui" that allows visualizing the data (your preferences) stored by SPMF in the registry of your computer. It includes a button to reset the preferences to their default values. This tool is not accessible from the main interface of SPMF.
 A new developers' tool window for developers of SPMF
 A new tool to download an offline copy of the SPMF documentation on your computer
 New data structure(s) for developers:
 A collection of data structures optimized for primitive data types in the package ca.pfv.spmf.datastructures.collections. It includes optimized implementations of List, Map, Set, LinkedList and Comparator for primitive types. Using these can reduce the memory usage and provide a speed up when implementing some new algorithms.
 Improvement(s) to existing features
 Cleaned the source code of the user interface of SPMF and improved several aspects for a better design. For instance, the parameters of each algorithm are now entered by the user using a table, and the main window of SPMF is now updated so that components are laid out using relative positions rather than fixed X,Y positions.
 Modified the Pattern Viewer tool of SPMF to allow visualizing the distributions of patterns found by an algorithm as a frequency histogram.
 Improved the Graph viewer.
 The cursor now changes when the mouse is over a node. Besides, the Fruchterman-Reingold automatic layout for graphs has been improved using a grid optimization (thanks to Zevin Shaul).
 Implemented the Rectangle graph layout
 Modified KMeans to use a random selection of instances as initializations to follow the more standard approach.
 An improved MemoryLogger class, that now has two new functions "startRecordingMode" and "stopRecordingMode" to record values collected by the MemoryLogger to a file. This is useful for evaluating the memory performance of algorithms.
 New datasets:
 Added time-interval sequence datasets that can be used with algorithms like FastTIRP and VertTIRP
 Bug fix(es)
 Fixed a bug in VMSP for the maxgap constraint (thanks to Alexandre Vernotte for reporting the bug) and a related issue in VGEN.
 Fixed a bug in the Database.java class of the TNR algorithm that caused issues on some input files (thanks to Huan Yang for finding and fixing the bug).
 Fixed a format problem in the OnlineRetail_II_best dataset file, which caused problems for some sequential pattern mining algorithms. Thanks to Suzuki Shota for reporting the problem.
 Fixed a bug in the NONEPI algorithm such that some patterns may be missed (thanks to Zefen Chen for reporting the bug).
 Fixed a bug in the NOSEP algorithm when using the minlen constraint (thanks to M. Zivanovic for reporting the bug).
 Fixed a bug in the class AlgoInstanceFileReader that could cause some errors in clustering algorithms and the display of clusters by the cluster viewer
 Fixed a bug in the SPMF text editor such that the interface was sometimes not updating properly when the font size or font family was changed by the user.
 Fixed the output of the AlgoSFUPMinerUemax algorithm so that #UTIL: is used instead of #UTILITY to be consistent with the output of other algorithms
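Because results are marked with #UTIL:, an output file can be post-processed with a few lines of code. The Python sketch below assumes that each result line has the form "items #UTIL: value", with items separated by spaces. The function name and the sample line are made up for illustration, and algorithms that write extra fields after the utility would need extra handling.

```python
def parse_utility_line(line):
    """Split one result line into (list of item tokens, utility as a float)."""
    pattern_part, marker, utility_part = line.partition("#UTIL:")
    if not marker:
        raise ValueError("no #UTIL: marker in line: " + line)
    items = pattern_part.split()
    # Only the first token after the marker is read as the utility value.
    utility = float(utility_part.split()[0])
    return items, utility


# Made-up result line for illustration.
print(parse_utility_line("2 4 #UTIL: 30"))
```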
 v 2.59 2022-12-25 (3 new algorithms + 2 new tools + bug fix)
 New algorithms:
 the PPFP algorithm to discover productive periodic frequent itemsets in a transaction database (a list of transactions) (thanks to Vincent Nofong for providing and integrating the original code)
 the NPFPM algorithm to discover non-redundant periodic frequent itemsets in a transaction database (a list of transactions) (thanks to Vincent Nofong for providing and integrating the original code)
 the SRPFPM algorithm to discover self-reliant periodic frequent patterns in a transaction database (thanks to Vincent Nofong for providing and integrating the original code)
 New tools in SPMF's GUI:
 A new tool called "Algorithm Explorer" to explore the collections of algorithms offered in SPMF. It is a window with a tree where algorithms are classified by category and where information can be obtained about an algorithm by clicking on it. Moreover, there is a function to search for similar algorithms with the same input/output file types and mandatory parameters.
 A new tool called "Graph Viewer" to view graph files and subgraphs found by algorithms such as gSpan, cgSpan and TKG. This tool is working but some improvements will likely be made in the next versions to add more features such as zooming, etc.
 Improvements:
 Improved the transaction and database classes of the TopKRules and TopKClassRules algorithms to reduce memory usage for very large databases (thanks to Zevin Shaul)
 Bug fix
 Fixed a bug in VGEN such that some patterns could be missed when using the maxgap constraint (thanks to Darrell Conklin for reporting the problem).
 v 2.58b 2022-11-30 (6 new algorithms + user interface improvements)
 New algorithms:
 the HUCIMiner algorithm to mine closed high utility itemsets and generators at the same time (thanks to Jayakrushna Sahoo et al. for the original code)
 the FHIM algorithm to mine all high utility itemsets (thanks to Jayakrushna Sahoo et al. for the original code)
 the HGB algorithm to mine non-redundant high utility association rules (thanks to Jayakrushna Sahoo et al. for the original code)
 the HGBall algorithm to derive all high utility association rules from the non-redundant high utility association rules (thanks to Jayakrushna Sahoo et al. for the original code)
 algorithms for mining sequential patterns with flexible constraints in a time-extended sequence database (e.g. MOOC data)
 the SPMFCL algorithm (thanks to Wei Song et al. for the original code)
 the SPMFCP algorithm (thanks to Wei Song et al. for the original code)
 An improved multithread implementation of the cGSPAN algorithm for mining the closed subgraphs in a graph database, and also a version to mine subgraphs in a single graph using the MNI support measure (thanks to Zevin Shaul for this great work!)
 New features for the GUI:
 An integrated text editor called "SPMF text editor" is added to SPMF. It can be used as an alternative to the default system text editor to open the output files generated by data mining algorithms. This text editor is lightweight and tailored for this task. It has a night mode, search bar, and other features.
 The GUI of SPMF now opens in the middle of your screen. This is useful if you are frequently connecting to screens with different resolutions.
 In the GUI of SPMF, categories of algorithms are now displayed with colors in the JComboBox of algorithms to make it easier to select an algorithm for the user.
 New dataset(s):
 The mooc.txt dataset, which contains over 80,000 sequences with timestamps from a Chinese e-learning platform. Each sequence from that dataset is a list of events with timestamps indicating the enrollment of a student in different online courses. This dataset was transformed to SPMF format by Wei Song et al., and is now available in the datasets page of this website.
 v 2.57 2022-10-21 (1 new algorithm)
 New algorithms:
 The HUIMACO algorithm for mining high utility itemsets using ant colony optimization (thanks to Wei Zong and Jiakai Nan for the original code).
 v 2.56 2022-10-5 (10 new algorithms)
 New algorithms:
 The NONEPI algorithm for mining episode rules in an event sequence using the nonoverlapping frequency (original implementation by Farid Nouioua, Oulaid Ouarem, et al).
 The LCIM algorithm for mining the low cost high utility itemsets from a transaction database with utility and cost information (original implementation by Saqib Nawaz et al.)
 The MaxFEM algorithm for mining the maximal frequent episodes in an event sequence (based on the head support).
 The AFEM algorithm for mining the frequent episodes in an event sequence (based on the head support).
 The AFEMRules algorithms for deriving episode rules from the output of AFEM.
 The IncCHUI algorithm for incrementally discovering the closed high utility itemsets (code obtained from Dam et al., and included based on the GPL license)
 The CLSMiner algorithm for mining closed high utility itemsets (code obtained from Github user "limuhangk" under the GPL license, as it contains GPL code from SPMF)
 The HMiner_Closed algorithm for mining closed high utility itemsets (code obtained from Github user "limuhangk" under the GPL license, as it contains GPL code from SPMF)
 The HUIMSU algorithm for mining high utility itemsets (code obtained from Github under the GPL license, as it contains GPL code from SPMF)
 The THUI algorithm for mining the top-k high utility itemsets (thanks to Srikumar Krishnamoorty for providing the original code)
 New dataset(s)
 Added real transaction datasets with both cost and utility values to the datasets page. These datasets can be used with the LCIM algorithm.
 v 2.55 2022-7-10
 Bug fix(es)
 Modified the definition of perfectly rare itemsets in AprioriInverse so that an itemset is rare if its support is < maxsup instead of <= maxsup. This small tweak avoids the case where a rare itemset is also a frequent itemset.
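The effect of this change can be seen with a small Python sketch. It assumes that a frequent itemset is one whose support is at least a threshold, and that the same value is used as maxsup for rare itemsets. The function names and numbers are illustrative and are not taken from the SPMF code.

```python
def is_rare_old(support, maxsup):
    """Definition before this change: support <= maxsup."""
    return support <= maxsup


def is_rare_new(support, maxsup):
    """Definition after this change: support < maxsup."""
    return support < maxsup


def is_frequent(support, minsup):
    """Usual definition of a frequent itemset (assumption): support >= minsup."""
    return support >= minsup


threshold = 0.6
support = 0.6  # an itemset whose support is exactly equal to the threshold

# With the old definition the itemset is rare and frequent at the same time.
print(is_rare_old(support, threshold), is_frequent(support, threshold))
# With the new definition it is frequent only.
print(is_rare_new(support, threshold), is_frequent(support, threshold))
```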
 v 2.54 2022-6-5 (1 new algorithm + features for experiments)
 New tool for running performance experiments:
 ExperimenterForParameterChange: A new tool to automatically run experiments where one or more algorithms are run on a dataset and a parameter is varied. This tool is useful for writing research papers as it can output results in a tab-separated format that can be easily imported into spreadsheets like Excel to draw charts, and it can also generate PGFPlots that can be used directly in LaTeX documents. This can save a lot of time when carrying out performance experiments. See the documentation for more details.
 Two new features in the GUI of SPMF:
 Run an algorithm as a separate process. If this option is activated, when the user clicks "Run algorithm" in the GUI, the algorithm will be run in a separate virtual machine instead of a thread in the same virtual machine. Running an algorithm in a separate virtual machine is very useful when doing performance experiments, as it ensures that the memory usage is reset every time an algorithm is run, which gives accurate results.
 Time limit of X seconds. If this option is activated, an algorithm will be automatically stopped if it runs for more than X seconds.
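The same two ideas, a separate process and a time limit, can be reproduced when scripting experiments outside the GUI. The Python sketch below uses the standard subprocess module and is only an analogy of these GUI options, not their implementation. The helper name is hypothetical, and a short Python child process stands in for an SPMF run so that the example is self-contained.

```python
import subprocess
import sys


def run_with_time_limit(command, limit_seconds):
    """Run command in its own process; return its output, or None on timeout."""
    try:
        completed = subprocess.run(command, capture_output=True, text=True,
                                   timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child process before raising
    return completed.stdout


# Stand-ins for an algorithm run: one finishes quickly, one runs too long.
quick = [sys.executable, "-c", "print('done')"]
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_time_limit(quick, 10))
print(run_with_time_limit(slow, 1))
```

Since each run starts in a fresh process, memory used by one run does not carry over to the next, which is the property the GUI option relies on.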
 New dataset(s):
 A new dataset called Chicago_Crimes_2001_to_2017, which can be used for high utility itemset mining and frequent itemset mining (thanks to Zhongjie Zhang for providing the conversion from this UCI dataset).
 A new dataset called YooChoose, which was obtained from RecSys2015 and transformed to SPMF format. It is suitable for frequent itemset mining and high utility itemset mining. But it is a very sparse dataset and does not contain item names, so it may not always be appropriate.
 Bug fix(es):
 A bug was fixed such that the output of the TKO algorithm was empty when using the graphical user interface of SPMF (thanks to Jose Maria Luna for reporting it).
 New algorithm(s)
 An implementation of the cGSPAN algorithm for mining the closed subgraphs in a graph database (thanks to Zevin Shaul)
 An implementation of the FEACP algorithm for cross-level high utility itemset mining (thanks to N.T Tung, Bay Vo et al.)
 An implementation of binary logistic regression using gradient descent for binary classification using vectors of continuous values as features (only available in the source code for now, documentation will be added later).
 v 2.52 2022-2-10 (3 new algorithms)
 New algorithm(s)
 The TKUCE algorithm for heuristically mining the top-k high-utility itemsets with cross-entropy (thanks to Wei Song, Lu Liu, Chuanlong Zheng et al., for the original code)
 The TKUCE+ algorithm for heuristically mining the top-k high-utility itemsets with cross-entropy with optimizations (thanks to Wei Song, Lu Liu, Chuanlong Zheng et al., for the original code)
 The TKQ algorithm for mining the top-k quantitative high utility itemsets (thanks to Nouioua, M. et al., for the original code)
 Other modifications:
 Some refactoring of the code of FHUQIMiner was done to share some code with TKQ.
 Bug fix:
 Fixed a bug in HUSRM (thanks to Lili Chen et al. for reporting the bug and a fix)
 Datasets
 4 new datasets for sequential pattern mining are added, called EShop, MicroblogPCU, OnlineRetail_II_all and OnlineRetail_II_best (thanks to Frederic Flouvat). More information about these datasets is available on the datasets page. These datasets are especially interesting because they contain sequences of itemsets instead of sequences of items.
 v 2.51 2022-1-25 (2 new algorithms)
 New algorithm(s)
 The SFUCE algorithm for mining skyline frequent high utility itemsets using the cross-entropy method (thanks to Wei Song, Chuanlong Zheng et al., for the original code)
 The POERMH algorithm for mining partially ordered episode rules in a sequence of events, using the head support (thanks to Yangming Chen et al. for the original code)
 Bug fixes:
 Fixed a bug in gSpan and TKG such that the result could be incorrect when the skip strategy optimization was activated (thanks to Zevin Shaul for reporting the problem)
 v 2.50 2021-12-11 (2 new algorithms, bug fixes)
 New algorithms
 The SFUI_UF algorithm for mining skyline utility itemsets using utility filtering (thanks to Wei Song, Chuanlong Zheng et al., for the original code)
 The HAUIMGMU algorithm for mining high average utility itemsets (thanks to Wei Song, Lu Liu, et al. for the original code)
 Bug fixes
 Fixed a small bug in LHUIMiner and PHUIMiner where > minutil was used instead of >= minutil (thanks to Acquah Hackman for reporting it)
 v 2.49 2021-8-15 (11 new algorithms)
 New algorithms:
 1 new algorithm (HUIMAF) for mining high utility itemsets using the artificial fish swarm algorithm (thanks to Wei Song, Junya Li et al. for providing the original code)
 9 new association rule-based algorithms have been added to perform classification using association rules. Those algorithms are ACAC, ACCF, ACN, ADT, CBA, CBA2, CMAR, L3 and MAC. Implementations of these algorithms were obtained from the LAC project of Padillo, F., Luna, J.M., Ventura, S. under the GNU GPL3 license (github.com/kdislab/lac). That code was already based on some code from SPMF such as Apriori, Eclat and Charm. Several improvements were made to the code of LAC before integrating it in SPMF. Some of the key improvements are: (1) added code to save a trained classifier to file using serialization and load it to memory, (2) added a function to save rules to a file as text for readability, (3) refactored the code to improve the design, (4) added several optimizations to avoid creating unnecessary objects and reduce the complexity of some operations, (5) removed some external dependencies not useful for SPMF, (6) added code to read the input format of SPMF in addition to the ARFF format, (7) changed the internal representations of items to make them consistent with SPMF (starting from 0 instead of 1), (8) reorganized all rule-based classifiers as subclasses of RuleClassifier, which is a subclass of Classifier, (9) added many comments, (10) changed several class and package names to make them consistent with the standards of SPMF, (11) replaced the usage of clone() by copy constructors, as using clone() is not recommended, and this has allowed further refactoring of the code. Besides, (12) I have written new code to run experiments to compare classifiers. The experiment code supports not only holdout but also k-fold cross-validation. The redesigned framework for experiments also creates tab-separated output so that it can be directly copied to Excel.
 1 new implementation of the KNN algorithm that can be compared with rule-based classifiers.
 Refactoring
 I have redesigned the code of ID3 so that it can be compared with the above rule-based classifiers
 v 2.48 2021-6-28 (2 new algorithms)
 New algorithms:
 the HUIMSPSO algorithm for mining high utility itemsets using Setbased Particle Swarm Optimization (thanks to Wei Song and Junya Li for providing the original code)
 the NEclatClosed algorithm for mining frequent closed itemsets (thanks to Nader Aryabarzan)
 v 2.47 2021-5-28 (8 new algorithms)
 New algorithms:
 the LPPGrowth, LPPMBreadth and LPPMdepth algorithms for discovering local periodic patterns in a transaction database or sequence of transactions (thanks to Peng Yang et al.)
 the HUIMHC algorithm for mining high utility itemsets using hill climbing (thanks to Saqib Nawaz et al.)
 the HUIMSA algorithm for mining high utility itemsets using simulated annealing (thanks to Saqib Nawaz et al.)
 3 post-processing algorithms to generate standard episode rules from frequent episodes found by the TKE, EMMA and MINEPI+ algorithms (thanks to Yangming Chen)
 v 2.46 2021-4-10 (1 new algorithm)
 New algorithms:
 The NOSEP algorithm for mining non-overlapping sequential patterns with gap constraints in one or more sequences (strings). Thanks to Youxi Wu et al. for providing a Java conversion of their original C++ implementation.
 New algorithms:
 CLHMiner for mining cross-level high utility itemsets (thanks to Bay Vo et al. for the efficient implementation)
 FHUQIMiner, a state-of-the-art algorithm for mining high utility quantitative itemsets (by Mourad Nouioua et al.)
 POERM and POERMALL algorithms for mining partially ordered episode rules in a sequence of events (by Yangming Chen et al.)
 New dataset:
 COVID-19 genome sequence dataset in SPMF format (prepared by Saqib Nawaz; see paper). This dataset can be used for sequential pattern mining and sequential rule mining.
 Datasets for high utility quantitative itemset mining (to be used with FHUQIMiner and VHUQI)
 Bug fix/Improvements:
 Fixed a bug that caused the confidence of CMRules to be incorrectly displayed in the output file (the confidence was saved instead of the sequential confidence) (thanks to Ludwig Zellner for fixing the bug)
 Improved the implementation of VHUQI and fixed a bug in VHUQI such that the output was wrong in some cases. Also, the input format was adapted to make it simpler and the same as that of FHUQIMiner (thanks to Mourad Nouioua)
 v 2.44 2021-2-12 (4 new algorithms, bug fixes, new datasets, tools and features)
 New algorithms
 LTHUIMiner for mining the locally trending high utility itemsets (by Yanjun Yang et al.)
 MLHUIMiner for discovering the multi-level high utility itemsets using a taxonomy (implemented by Ying Wang et al.)
 AERMiner for mining attribute evolution rules in a dynamic attributed graph (by Ganghuan He et al.)
 TSPIN for mining the top-k stable periodic patterns in a transaction database (a sequence of events with or without timestamps) (by Ying Wang, Peng Yang et al.)
 Bug fixes
 Fixed a small bug in DFIN (thanks to Nader Aryabarzan)
 Fixed a bug in the user interface for TKG (thanks to..)
 Fixed a small error in the DB_LHUI.txt example file
 Datasets
 Fruithut : a shopping dataset with utility and taxonomy information
 Liquor: a shopping dataset with utility and taxonomy information
 Fixed a problem in the Ecommerce dataset where some items appeared twice in the same transaction (which should not have happened)
 Tools
 Added a new tool to fix problems in transaction databases with time and utility information (it was used to fix the Ecommerce dataset). This tool is the algorithm called "Fix_a_transaction_database_with_utility_time" in the user interface. It makes sure that no item appears twice in the same record (transaction) and that the total utility of each transaction is correctly calculated.
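The two checks performed by this tool can be sketched in Python as follows. The sketch works on an in-memory transaction given as (item, utility) pairs rather than on an SPMF file, and it assumes that the utilities of a repeated item are summed. How the actual tool resolves duplicates is not stated here, and the function name is hypothetical.

```python
def fix_transaction(pairs):
    """Merge repeated items and recompute the total utility of one transaction."""
    merged = {}
    for item, utility in pairs:
        # Assumption: when an item is repeated, its utilities are added up.
        merged[item] = merged.get(item, 0) + utility
    items = sorted(merged)
    utilities = [merged[item] for item in items]
    # Return the distinct items, the recomputed total, and per-item utilities.
    return items, sum(utilities), utilities


# Made-up transaction in which item 3 appears twice.
print(fix_transaction([(3, 5), (1, 2), (3, 4)]))
```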
 Features
 Added a feature with example for saving or loading a trained sequence prediction model from file (for the AKOM, DG, TDAG, LZ78, PPM, CPT, and CPT+ algorithms)
 v 2.43 2020-3-20 (add features to algorithms)
 New algorithm
 The DFIList algorithm for recovering all frequent itemsets from frequent closed itemsets
 Bug fix
 Removed a double space before #SUP: in the output of TKS (thanks to Tom for reporting this issue)
 v 2.42c 2020-3-19 (add features to algorithms)
 Added the possibility of specifying a minimum pattern length for the FPGrowth and RPGrowth algorithms.
 Added the possibility of specifying a minimum time duration to the TKE algorithm
 Added the possibility of using the occupancy measure to the CEPN, CEPB and CorCEPB algorithms (thanks to Jiaxuan Li), an extension described in the DaWaK 2019 paper.
 v 2.42 2020-3-5 (4 new algorithms)
 New algorithm(s):
 the TKE algorithm for mining the top-k frequent episodes in a sequence of events (by Fournier-Viger et al.).
 the CEPB algorithm for mining low-cost high utility patterns (also known as cost-effective patterns) in a sequence database with cost and binary utility values (thanks to Jiaxuan Li)
 the CorCEPB algorithm for mining low-cost high utility patterns (also known as cost-effective patterns) in a sequence database with cost and binary utility values (thanks to Jiaxuan Li)
 the CEPN algorithm for mining low-cost high utility patterns (also known as cost-effective patterns) in a sequence database with cost and numeric utility values (thanks to Jiaxuan Li)
 Datasets
 Added some datasets with cost/utility sequences for discovering low-cost high utility patterns (a.k.a. cost-efficient patterns).
 v 2.41 2020-2-26 (6 new algorithms)
 New algorithm(s):
 the QCSP algorithm for mining the top-k quantile-based cohesive sequential patterns in a single sequence or in multiple sequences (thanks to Lens Fereman et al.)
 the MRCPPS algorithm for mining rare correlated periodic patterns common to multiple sequences (thanks to Peng Yang et al.)
 the HUESPAN algorithm for efficiently mining high-utility episodes in a sequence of events with utility information (thanks to Peng Yang et al.)
 the EMMA, MINEPI and MINEPI+ algorithms for mining frequent episodes in a sequence of events (thanks to Peng Yang et al.)
 Bug fix and documentation errors:
 Fixed a bug in ERMiner, which was using == instead of equals() (thanks to Minh Pham for reporting the problem)
 Fixed various small problems in the online documentation and code of SPMF (thanks to Jiaxing Mai for reporting them)
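 The ERMiner fix above concerns a common Java pitfall. These notes do not show the affected ERMiner code, so the following is only a generic sketch (the class and method names are hypothetical) of how == and equals() can give different answers for two objects with the same content:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; this is not the ERMiner source code.
final class EqualsVersusIdentity {

    /** Returns true only if both variables refer to the very same object. */
    static boolean sameObject(List<Integer> a, List<Integer> b) {
        return a == b; // compares references
    }

    /** Returns true if both lists contain the same items in the same order. */
    static boolean sameContent(List<Integer> a, List<Integer> b) {
        return a.equals(b); // compares content
    }

    public static void main(String[] args) {
        List<Integer> x = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> y = new ArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(sameObject(x, y));  // false: two distinct objects
        System.out.println(sameContent(x, y)); // true: identical content
    }
}
```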
 Datasets
 Reorganized the dataset page
 v 2.40 20191023 (9 new algorithms)
 New algorithm(s):
 the HUIMABC algorithm for mining high utility itemsets using Artificial Bee Colony Optimization (thanks to Wei Song and Chaomin Huang)
 the TKG algorithm for mining the top-k frequent subgraphs in a graph database (thanks to Fournier-Viger, P. and Chao Cheng)
 the gSpan algorithm for mining the frequent subgraphs in a graph database (thanks to Chao Cheng)
 the SPPGrowth algorithm for mining stable periodic itemsets in a transaction database (by Peng Yang)
 the MPFPSBFS algorithm for mining periodic patterns common to multiple sequences (by Zhitian Li).
 the MPFPSDFS algorithm for mining periodic patterns common to multiple sequences (by Zhitian Li).
 the NAFCP algorithm for mining frequent closed itemsets (thanks to Nader Aryabarzan et al.)
 the OPUSMiner algorithm for mining self-sufficient itemsets (thanks to Xiang Li for converting the original C++ code to Java)
 Improvements to algorithm(s):
 Replaced the NegFin code with an improved version (thanks to Nader Aryabarzan et al.)
 Added an alternative and faster version of the MISApriori algorithm, named MISApriori(Srinivas) (thanks to Srinivas Paturu)
 New dataset(s):
 Added a new sequence database called ProofSequences to the dataset page of the SPMF website. It contains sequences of mathematical proof steps. Thanks to Nawas et al. for providing this dataset.
 Bug fix(es):
 Fixed a bug in the CHUIMiner(Max) algorithm (thanks to Bao Vu for the bug fix)
 v 2.38  20190202 (1 new algorithm)
 New algorithm(s):
 the PHM_irregular algorithm for mining irregular high utility itemsets. This algorithm is simply a variation of the PHM algorithm for the special case of finding irregular itemsets, which is equivalent to finding non-periodic itemsets.
 New algorithm(s):
 the RPGrowth algorithm for mining rare patterns (thanks to Blake Johns and Ryan Benton for implementing the algorithm)
 v 2.37  20190127 (4 new algorithm(s))
 New algorithm(s):
 the LHUIMiner algorithm for discovering the local high utility itemsets from a transaction database with timestamps and utility information. Those are itemsets that have a high utility during some non-predefined time intervals (thanks to Yimin Zhang et al.).
 the PHUIMiner algorithm for discovering the peak high utility itemsets from a transaction database with timestamps and utility information. Those are itemsets that have a utility that is much higher than usual (a peak) during some non-predefined time intervals (thanks to Yimin Zhang et al.).
 the VHUQI algorithm for discovering quantitative high utility itemsets in a transaction database with utility information (modified and integrated from code under the GPL license from UPMiner)
 the Occur algorithm for finding all occurrences of some sequential pattern(s) in sequences (by postprocessing). This algorithm must be applied to the sequential patterns found by another sequential pattern mining algorithm such as CMSPAM or PrefixSpan.
 Datasets:
 Added transactions datasets with timestamps to the dataset page, used for the LHUIMiner and PHUIMiner papers (prepared by Yimin Zhang)
 Bug fix:
 Fixed a bug in GoKrimp such that the algorithm would not work with input files containing empty lines, and a bug related to the user interface when GoKrimp was run without a label file (thanks to Víctor Rodríguez-Fernández).
 Updated the CPT model so that it can now predict the next element of a sequence containing a single item.
 v 2.36  20190108 (10 new algorithms)
 New algorithms:
 the CHUIMiner(Max) algorithm for discovering maximal high utility itemsets in a transaction database with utility information
 the NegFIN and dFIN algorithms for frequent itemset mining (by Nader Aryabarzan et al.)
 the HUIFPSO, HUIFGA and HUIFBA algorithms for mining high utility itemsets using particle swarm optimization, a genetic algorithm and a bat algorithm, respectively (by Wei Song, Chaomin Huang et al.)
 the PHUSPM and UHUSPM algorithms for discovering high utility probability sequential patterns in uncertain data (by Jerry Chun-Wei Lin, Ting Li et al.)
 the MEMU algorithm for mining high average utility itemsets with multiple minimum average utility thresholds (by Jerry Chun-Wei Lin, Shifeng Ren et al.)
 the ProSecCo algorithm for progressive sequential pattern mining with convergence guarantees (thanks to Sacha Servan-Schreiber). This algorithm was runner-up for the best student paper award at ICDM 2018.
 Improvements:
 Improved the FHSAR implementation (thanks to Hoang Thi Dieu)
 v 2.35  20181118 (1 new algorithm)
 New algorithms: the UFH algorithm for mining high utility itemsets (by Siddharth Dawar, Vikram Goyal et al.)
 v 2.34  20181003 (5 new algorithms)
 New algorithms:
 Several algorithm implementations by Siddharth Dawar,
Vikram Goyal et al.:
 FHMDS algorithms for mining the top-k high utility itemsets in a data stream
 HMiner for high utility itemset mining
 UPHist for high utility itemset mining
 the DFIGrowth and LevelWise algorithms for recovering all frequent itemsets from frequent closed itemsets (thanks to _______)
 the Skopus algorithm for mining the top-k sequential patterns with leverage (obtained under GPL license)
 Bug fix
 Fixed a bug in the MinFHM algorithm (thanks to Hung Nguyen for finding and fixing the bug)
 v. 2.33  20180610 (new features)

New features
 I have added the possibility of displaying the sequence IDs for patterns output by the FournierViger08, SeqDim, TopSeqRules, TopSeqClassRules, and TNS algorithms. In the GUI of SPMF, this feature is used by setting the optional "Show sequence ids?" parameter to true.
 I have added the possibility of displaying the transaction IDs for patterns output by the TopKRules, TopKClassRules and TNR algorithms. In the GUI of SPMF, this feature is used by setting the optional "Show transaction ids?" parameter to true.
 I have added a new algorithm called "Closed_class_association_rules(using_fpclose)" to mine class association rules with a single item in the consequent. I have not updated the documentation for this algorithm yet.
 v. 2.31 / 2.32  20180331 / 20180402 (bug fix)

New algorithms
 The HUIMGA, HUIMBPSO, HUIMGAtree and HUIMBPSOtree algorithms have been reintroduced in SPMF. They had previously been removed due to bugs in the Java conversion of the original C++ code. The code was translated from C++, but the memory management model is different in Java, which mainly led to deep copy problems. The bugs in the Java implementation have been fixed by Chaomin Huang.
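 These notes do not describe the exact bugs, but the general kind of deep copy problem that can appear when translating C++ to Java can be sketched as follows. In C++, assigning one std::vector to another copies its elements, whereas in Java, copying a collection of objects copies only the references unless each element is copied explicitly. The class and method names below are hypothetical and the example is not taken from the SPMF code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a shallow copy versus a deep copy.
final class CopyDemo {

    /** Copies only the outer list: the inner lists stay shared with the source. */
    static List<List<Integer>> shallowCopy(List<List<Integer>> population) {
        return new ArrayList<>(population);
    }

    /** Copies the outer list and every inner list. */
    static List<List<Integer>> deepCopy(List<List<Integer>> population) {
        List<List<Integer>> copy = new ArrayList<>(population.size());
        for (List<Integer> individual : population) {
            copy.add(new ArrayList<>(individual));
        }
        return copy;
    }

    public static void main(String[] args) {
        List<List<Integer>> population = new ArrayList<>();
        population.add(new ArrayList<>(Arrays.asList(1, 0, 1)));

        List<List<Integer>> shallow = shallowCopy(population);
        List<List<Integer>> deep = deepCopy(population);

        population.get(0).set(0, 0); // mutate the original individual

        System.out.println(shallow.get(0)); // [0, 0, 1]: the change is visible
        System.out.println(deep.get(0));    // [1, 0, 1]: unaffected
    }
}
```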
 v. 2.30c  20180310 (new feature)

New algorithms
 Fixed a bug in FPGrowth in the "saveAllCombinationsOfPrefixPath" function such that the support of some itemsets was incorrectly calculated (thanks to Konstantin B ttcher for reporting the bug)
 v. 2.30  20180310 (new feature)

New algorithms
 Added a feature to show the names of items in results when using some sequential pattern mining algorithms with the user interface or command line of SPMF (the documentation will soon be updated to explain this feature in more detail).
 v. 2.29  20180216 (1 new algorithm)

New algorithms
 Replaced the FCHM algorithm by a newer implementation called FCHM_bond for mining correlated high utility itemsets using the bond measure, and added the new FCHM_allconfidence algorithm for discovering correlated high utility itemsets using the all-confidence measure (implemented by Yimin Zhang).
 Removed features :
 I temporarily removed the HUIMGA and HUIMBPSO algorithms from the website because the Java implementations of these algorithms have been reported to have a bug. The original implementations were written in C++. It seems that there was some error in the conversion process from C++ to Java. When the bugs are fixed, these algorithms will be added again to SPMF.
 v. 2.29  20180216 (2 new algorithms)

New algorithms
 Added the original implementations of the FAST and CloFAST algorithms for sequential pattern mining (thanks to Fabio Fumarola, Pasqua Fabiana Lanotte, Michelangelo Ceci, Donato Malerba, Eliana Salvemini, Jiawei Han for contributing the original source code).
 v. 2.28  20180210 (2 new algorithms)

New algorithms
 Added an algorithm named TopKClassRules, which is a variation of TopKRules for discovering the top-k class association rules, that is, the k most frequent association rules that appear in a dataset, where the consequent of rules is an item chosen from a list of allowed items specified by the user.
 Added an algorithm named TopSeqClassRules, which is a variation of TopSeqRules for discovering the top-k class sequential rules, that is, the k most frequent sequential rules that appear in a sequence database, where the consequent of rules is an item chosen from a list of allowed items specified by the user.
 v. 2.27  20180205

New features
 Added an optional maximum pattern length parameter to several algorithms: HMine, defMe, AlgoAprioriTID_Bitset, AprioriTID, CORI, Eclat, Eclat_bitset, dEclat, dEclat_bitset, MSApriori, Pascal, UApriori, VME, LCMFreq. The documentation for these new parameters has not been updated yet but they can be used in the user interface and source code.
 Added optional maximum antecedent and maximum consequent parameters for several algorithms: CMDeo, CMRules, ERMiner, TopSeqRules, TNS, TopKRules, TNR. The documentation for these new parameters has not been updated yet but they can be used in the user interface and source code.
 v. 2.26  20180202 (4 new algorithm(s))

New algorithm(s)
 Added a tool to calculate statistics about a transaction database with utility information.
 Added the original implementation of the CHUD algorithm for closed high utility itemset mining and the TKU algorithm for top-k high utility itemset mining (from UPMiner under the GPL license)
 Added a simple implementation of the TKOBasic algorithm for top-k high utility itemset mining. Note that this implementation does not include all the optimizations of TKO described in the journal paper, but it can still be quite fast.
 v. 2.25  20180129 (1 new algorithm)

New algorithm
 Algorithm to calculate the median smoothing of a time series.
 v. 2.24  20180125 (bug fixes)
 Bug fixes
 Fixed some bugs in the TDAG and LZ78 sequence prediction models to improve their performance, and fixed other related issues (thanks to Luis Angerstein and Jan Wolter for providing these improvements).
 Remove some unused variable and condition in FPGrowth (thanks to Konstantin B ttcher for reporting the problem)
 Modified TSHOUN so that a clearer error message is shown to the user when the parameter "period count" is incorrectly set (thanks to C. Sivamathi for reporting the problem)
 v. 2.23  20180121 (new algorithms)
 New algorithm(s):
 The TUP algorithm to discover the top-k high utility episodes in a complex event sequence (thanks to Sonam Rathore, Siddharth Dawar et al. for providing their original implementation)
 The UPSPAN algorithm to mine high utility episodes in a complex event sequence (from UPMiner under the GPL license)
 v. 2.22  20180107 (bug fix + 1 new algorithm)
 Added some new algorithm(s):
 The EHAUPM algorithm to discover high average utility itemsets (thanks to Jerry Chun-Wei Lin, Shifeng Ren et al.)
 Bug fix(es):
 Fixed a bug in the output of HAUIMiner such that the average utility was always rounded to an integer value.
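 These notes do not say what caused the rounding in HAUIMiner. One common way for such a symptom to arise in Java is integer division, which is shown in the following sketch. The sketch assumes that the average utility of an itemset is its utility divided by its number of items; the class and method names are hypothetical and this is not the HAUIMiner code:

```java
// Hypothetical illustration; this is not the HAUIMiner source code.
final class AverageUtilityDemo {

    /** Integer division: the fractional part of the result is discarded. */
    static int truncatedAverage(int totalUtility, int itemCount) {
        return totalUtility / itemCount;
    }

    /** Floating-point division: the fractional part is preserved. */
    static double exactAverage(int totalUtility, int itemCount) {
        return (double) totalUtility / itemCount;
    }

    public static void main(String[] args) {
        System.out.println(truncatedAverage(7, 2)); // 3
        System.out.println(exactAverage(7, 2));     // 3.5
    }
}
```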
 v. 2.21  20180101 (bug fixes + 9 new algorithms)
 Added some new algorithm(s) related to time series analysis:
 Algorithm to calculate the min max normalization of a time series.
 Algorithm to calculate the standardization of a time series.
 Algorithm to calculate the first order differencing of a time series
 Algorithm to calculate the second order differencing of a time series
 Algorithm to calculate the exponential smoothing of a time series
 Algorithm to calculate the autocorrelation function of a time series
 Algorithms to calculate the prior average, central average and cumulative average of a time series (previously, only the prior average was available in SPMF and was called "moving average". Now three types of moving average are offered and the prior moving average has been renamed)
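 A few of the transformations listed above can be sketched as follows. The sketch uses the usual textbook definitions, which may differ in details (for example, in how the first data points are handled) from the SPMF implementations; the class and method names are hypothetical:

```java
import java.util.Arrays;

// A sketch based on textbook definitions; this is not the SPMF source code.
final class TimeSeriesTransforms {

    /** Min-max normalization: rescales the values to [0, 1]. Assumes max > min. */
    static double[] minMaxNormalize(double[] s) {
        double min = s[0], max = s[0];
        for (double v : s) { min = Math.min(min, v); max = Math.max(max, v); }
        double[] out = new double[s.length];
        for (int i = 0; i < s.length; i++) out[i] = (s[i] - min) / (max - min);
        return out;
    }

    /** First-order differencing: out[i] = s[i + 1] - s[i]. Apply twice for second order. */
    static double[] difference(double[] s) {
        double[] out = new double[s.length - 1];
        for (int i = 0; i < out.length; i++) out[i] = s[i + 1] - s[i];
        return out;
    }

    /** Exponential smoothing: out[0] = s[0], out[i] = alpha * s[i] + (1 - alpha) * out[i - 1]. */
    static double[] exponentialSmoothing(double[] s, double alpha) {
        double[] out = new double[s.length];
        out[0] = s[0];
        for (int i = 1; i < s.length; i++) out[i] = alpha * s[i] + (1 - alpha) * out[i - 1];
        return out;
    }

    /** Cumulative average: out[i] is the mean of s[0..i]. */
    static double[] cumulativeAverage(double[] s) {
        double[] out = new double[s.length];
        double sum = 0;
        for (int i = 0; i < s.length; i++) { sum += s[i]; out[i] = sum / (i + 1); }
        return out;
    }

    public static void main(String[] args) {
        double[] series = {2, 4, 6, 10};
        System.out.println(Arrays.toString(minMaxNormalize(series)));
        System.out.println(Arrays.toString(difference(series)));
        System.out.println(Arrays.toString(difference(difference(series))));
        System.out.println(Arrays.toString(exponentialSmoothing(series, 0.5)));
        System.out.println(Arrays.toString(cumulativeAverage(series)));
    }
}
```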
 Bug fix(es):
 Fixed a bug in AlgoArrays.java that could cause incorrect results by the TNR algorithm (thanks to Rashmie Abeysinghe for reporting the bug).
 Fixed a bug such that the optional sequence identifiers in the output of some sequential pattern mining algorithms were incorrect. According to the documentation, sequence identifiers should start at 0, while for some algorithms, the sequence identifiers were starting from 1. Now the sequence identifiers start from 0 for all the algorithms. Thanks to Mathieu Gousseff for reporting the bug.
 Fixed a bug in the USpan algorithm such that the SWU upper bound was looser than it should be (thanks to Tin Truong Chi for finding and fixing the bug).
 Fixed a bug in the FEAT algorithm such that it was throwing an exception when using the optional parameter to show sequence identifiers.
 v. 2.19  20171022 (4 new algorithms)
 New algorithm(s):
 An algorithm to calculate the regression line of a time series using the least squares method. After applying the algorithm to train a linear regression model, the model can be used to make some simple predictions.
 The mHUIMiner algorithm for high utility itemset mining (by Peng et al. from GitHub, GPL license)
 The ULBMiner algorithm for high utility itemset mining (by Duong, H., Fournier-Viger et al.)
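 The regression line mentioned in the list above can be sketched with ordinary least squares. The sketch assumes that the x value of each data point is its position (0, 1, 2, ...) in the time series, which may not match the SPMF implementation; the class name and its members are hypothetical:

```java
// A sketch of ordinary least squares on a time series; this is not the SPMF source code.
final class RegressionLineSketch {
    final double slope;
    final double intercept;

    /** Fits y = intercept + slope * x, where x is the index 0, 1, ..., n - 1. Requires n >= 2. */
    RegressionLineSketch(double[] series) {
        int n = series.length;
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double y : series) meanY += y;
        meanY /= n;

        double covXY = 0, varX = 0;
        for (int x = 0; x < n; x++) {
            covXY += (x - meanX) * (series[x] - meanY);
            varX += (x - meanX) * (x - meanX);
        }
        slope = covXY / varX;
        intercept = meanY - slope * meanX;
    }

    /** Predicts the value at position x, which may lie beyond the training data. */
    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        RegressionLineSketch model = new RegressionLineSketch(new double[]{1, 3, 5, 7});
        System.out.println(model.slope);      // 2.0
        System.out.println(model.intercept);  // 1.0
        System.out.println(model.predict(4)); // 9.0
    }
}
```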
 New feature(s):
 Added an implementation of FHM called FHM(float) which can take utility values as float values instead of integers.
 Added the possibility of specifying a maximum pattern length to the following algorithms : Apriori, AprioriHT, FPGrowth, FPGrowth_association_rules, FPGrowth_association_rules_with_lift
 For sequence prediction, the Evaluator class was modified so that the SPMF format is used to compare sequence prediction models instead of another format.
 Bug fix(es):
 Fixed a bug in the new version of the Apriori implementation with length constraint. Thanks to Muhammad Yasir Chaudhry for reporting the bug.
 Fixed a bug in the HUIMBPSO and HUIMBPSOtree algorithms in terms of supported input file format. Thanks to Majdi Mafarja for reporting the problem.
 Fixed a bug in the output format of the PrefixSpan and BIDE+ algorithms (some -1 separators were missing in some cases). Thanks to Matthieu Gousseff for reporting the bug.
 Improved the documentation of SPMF by dividing the single documentation page into multiple webpages (for archival purposes, the old documentation page for SPMF 2.18 can be found here).
 v. 2.18  20170806 (new versions of two algorithms, bug fix, dataset fix)
 Added two new versions of the AprioriRare and AprioriInverse algorithms called "AprioriRare_TID" and "AprioriInverse_TID". These versions are based on AprioriTID instead of the regular Apriori. They thus keep the transaction identifiers of patterns in memory to avoid scanning the database repeatedly, and can write the transaction ids to the output file (by setting the parameter "Show transactions IDs?" to true in the user interface).
 Fixed and reuploaded the "retail" and "pumsb" datasets. They contained an item with the id "0". But some algorithms such as EFIM assume that item identifiers must be positive (thanks to Srikumar Krishnamoorthy for reporting this problem)
 Added a tool to add a value to all item identifiers in a transaction database. This was used to fix the above dataset problem.
 Fixed a bug in the generation of closed association rule mining using the FPClose algorithm (thanks to Benjamin Andow for reporting the bug)
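 The tool for adding a value to all item identifiers can be illustrated by the following sketch. It assumes the usual SPMF transaction format, where each line is a transaction and items are integers separated by single spaces, and it assumes that empty lines and lines starting with #, % or @ carry no transaction and should be left untouched. The class and method names are hypothetical and the SPMF tool itself may be implemented differently:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; this is not the SPMF source code.
final class ItemIdShifter {

    /** Adds 'offset' to every item id of one transaction line such as "0 3 5". */
    static String shiftLine(String line, int offset) {
        StringBuilder out = new StringBuilder();
        for (String token : line.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(Integer.parseInt(token) + offset);
        }
        return out.toString();
    }

    /** Shifts all transactions; empty lines and lines starting with #, % or @ are kept as they are. */
    static List<String> shiftDatabase(List<String> lines, int offset) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            boolean skip = trimmed.isEmpty() || "#%@".indexOf(trimmed.charAt(0)) >= 0;
            result.add(skip ? line : shiftLine(line, offset));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> database = Arrays.asList("# a comment", "0 3 5", "1 2");
        // Adding 1 removes the item id 0, which some algorithms do not accept.
        System.out.println(shiftDatabase(database, 1)); // [# a comment, 1 4 6, 2 3]
    }
}
```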
 v. 2.17  20170703
 Added optional parameters for the PFPM and PHM algorithms to specify the minimum and maximum number of items that patterns should contain.
 Modified the user interface so that algorithms can have up to seven parameters.
 v. 2.16
 Added a new feature to the CPT and CPT+ sequence prediction models. The user can now obtain information about how a prediction was made. By using the method getCountTable, one can obtain all the symbols and their scores calculated by the model. This shows the basis for a given prediction.
 v. 2.15 (4 new algorithms)
 Added implementations of four algorithms (by Jerry Chun-Wei Lin, Ting Li, et al.):
 the FFIMiner algorithm for mining fuzzy frequent itemsets in a quantitative transaction database (similar to a database with utility values)
 the MMFIMiner algorithm for mining fuzzy frequent itemsets in a quantitative transaction database
 the HAUIMiner algorithm for mining high average utility itemsets in a transaction database with utility values.
 the HAUIMMAU algorithm for mining high average utility itemsets in a transaction database with utility values using multiple minimum average-utility thresholds.
 v. 2.14 (bug fix)
 Fixed a bug in the USpan implementation such that some patterns could be missed (thanks to Tai Dinh and Tin Truong Chi for reporting the bug).
 v. 2.13 (bug fix)
 Fixed a bug in closed association rule mining with FPClose. An exception was thrown in some rare cases (thanks to Tarannum Zaman).
 v. 2.12  20170205
 Added a new optional parameter to several itemset mining algorithms to let the user decide whether transactions identifiers should be shown in the output file, for each pattern found. The algorithms that support this feature are: AprioriTID, AprioriTID_bitset, Apriori_TIDClose, Charm_bitset, Charm_MFI, Eclat, Eclat_bitset, DCI_closed, CORI. In the user interface of SPMF, the new optional parameter is displayed as "Show transactions IDs? (optional)".
 v. 2.11  20170127 (5 new algorithms)
 Five new algorithms for high-utility itemset mining have been added (by Jerry Chun-Wei Lin, Lu Yang, Philippe Fournier-Viger):
 SFUPMinerUgmax for mining skyline frequent-utility patterns
 HUIMGA and HUIMGAtree algorithms for mining high-utility itemsets using genetic algorithms
 HUIMBPSO and HUIMBPSOtree algorithms for mining high-utility itemsets using particle swarm optimization
 v. 2.10  20170117
 The SAX algorithm now has a new optional parameter "deactivatePAA". It deactivates the transformation to the piecewise aggregate approximation (PAA) when applying SAX. This makes it possible to convert a file containing several time series of different lengths to their SAX representations while preserving their original lengths (rather than converting all of them to time series of the same length).
 Fixed a bug in the TopKRules algorithm that was introduced in a previous version of SPMF. The output was correct but the algorithm was not using the set "candidates" in the most efficient way. (Thanks to Bima Haryanto Putra for reporting the bug)
 Fixed a bug in the MaxSP algorithm (thanks to Natalia Mord for proposing the bug fix).
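 To see why the PAA step mentioned above changes the length of a time series, here is a textbook sketch of the piecewise aggregate approximation. It replaces a series of n points by a chosen number of segment means, so all converted series end up with the same length. The way segment boundaries are rounded when n is not a multiple of the number of segments is an assumption of this sketch and may differ from SPMF; the class and method names are hypothetical:

```java
import java.util.Arrays;

// A textbook sketch of PAA; this is not the SPMF source code.
final class PaaSketch {

    /**
     * Reduces 'series' to 'segments' values, each being the mean of one segment.
     * Assumes 1 <= segments <= series.length.
     */
    static double[] paa(double[] series, int segments) {
        int n = series.length;
        double[] out = new double[segments];
        for (int k = 0; k < segments; k++) {
            int start = k * n / segments;     // first index of segment k (inclusive)
            int end = (k + 1) * n / segments; // end of segment k (exclusive)
            double sum = 0;
            for (int i = start; i < end; i++) sum += series[i];
            out[k] = sum / (end - start);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] series = {1, 3, 5, 7, 9, 11};
        System.out.println(Arrays.toString(paa(series, 3))); // [2.0, 6.0, 10.0]
    }
}
```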
 v. 2.09  20161228
 Added a visualization tool called the Instance viewer for visualizing the input files of clustering algorithms such as KMeans and DBScan
 Improved the documentation of the clustering algorithms with some more interesting examples and pictures. Also made some minor improvements to the code of the clustering algorithms. In particular, the input file format for clustering algorithms now lets the user specify the names of the attributes used to describe the instances.
 I have also improved the Cluster Viewer to let the user select which attributes should be visualized when displaying clusters. Thus the Cluster Viewer can now be used to visualize instances having more than 2 attributes.
 Fixed a bug in the user interface and command line interface of SPMF for the parameter "required items" of the TKS algorithm.
 v. 2.08  20161225 (cluster visualization)
 Added a visualization tool called the Cluster Viewer for visualizing clusters of 2D points found by clustering algorithms such as KMeans and DBScan
 Moved the TimeSeries viewer to another package and added a few additional features to its user interface.
 v. 2.07  20161221
 Modified the clustering algorithms (KMeans, Bisecting KMeans, Hierarchical clustering, DBScan and OPTICS) such that:
 a label (a name) can be assigned to each instance in the input file. The names of instances are now displayed in the output of these algorithms. This provides more meaningful results.
 a separator such as " " can be provided as parameter to these algorithms. The separator indicates which character is used in the input file to separate values. As a result, most clustering algorithms are now compatible with the time series file format and can be applied to time series (when using the ',' separator).
 Fixed a bug when running the OPTICS algorithm in the user interface or command line interface of SPMF.
 Minor improvements to the Time Series Viewer. When the user moves the mouse over a time series, the name of the time series is shown. Also other minor changes.
 v. 2.06  20161218 (time series mining)
 Added support for time series data mining
 an implementation of the SAX algorithm is provided for converting time series to sequence(s) of symbols. This is useful to then apply traditional sequential pattern mining algorithms or sequential rule mining algorithms to time series.
 an algorithm to calculate the moving average of a time series (this is useful for making a time series appear more "smooth" by removing noise)
 an algorithm to calculate the piecewise aggregate approximation of a time series, which is used to reduce the dimensionality of a time series
 an algorithm to split a time series into a given number of time series, or by number of data points.
 a visualization tool called TimeSeriesViewer for visualizing time series.
 Fixed an encoding bug in the conversion of Chinese texts to sequences such that Chinese characters were not appearing.
 Fixed a bug related to the command line interface of SPMF
 Updated the developer's guide on the website with some minor modifications.
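 Two of the time series operations listed above, the moving average and the splitting of a time series by number of data points, can be sketched as follows. The sketch assumes a moving average computed over the current point and the points before it, with a shorter window at the start of the series; the SPMF implementation may handle these details differently. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A sketch under the assumptions stated above; this is not the SPMF source code.
final class MovingAverageSketch {

    /** out[i] is the mean of the last 'window' points ending at i (fewer at the start). */
    static double[] priorMovingAverage(double[] s, int window) {
        double[] out = new double[s.length];
        double sum = 0;
        for (int i = 0; i < s.length; i++) {
            sum += s[i];
            if (i >= window) sum -= s[i - window]; // drop the point leaving the window
            out[i] = sum / Math.min(i + 1, window);
        }
        return out;
    }

    /** Splits a series into consecutive parts of at most 'size' data points. */
    static List<double[]> splitByLength(double[] s, int size) {
        List<double[]> parts = new ArrayList<>();
        for (int start = 0; start < s.length; start += size) {
            parts.add(Arrays.copyOfRange(s, start, Math.min(s.length, start + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(priorMovingAverage(series, 3))); // [1.0, 1.5, 2.0, 3.0, 4.0]
        for (double[] part : splitByLength(series, 2)) {
            System.out.println(Arrays.toString(part)); // [1.0, 2.0] then [3.0, 4.0] then [5.0]
        }
    }
}
```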
 v. 2.05  20161116
 Fixed a bug in the command line interface (thanks to Andrey Shestakov for reporting the bug)
 v.2.04 20161014
 Improved the graphical user interface and command line interface of SPMF so that more informative messages are shown to the user when an algorithm parameter is missing or when the value is of an incorrect type. This will make the user interface more user-friendly (thanks to Slimane Oulad Naoui for this suggestion).
 v.2.03 20161013
 Fixed a bug in the VMSP algorithm (AlgoVMSP.java) such that some patterns were missing in some cases when the maxgap constraint was used (thanks to Antoine Pigeau for reporting the problem)
 v.2.02 20161012
 Added support for mining TEXT files with Chinese text (by supporting the Chinese punctuation).
 Fixed a bug in the FOSHU and TSHOUN algorithms, and updated the documentation and sample input file for these algorithms (thanks to Yimin Zhang for reporting the problem)
 v.2.01  20160916 (several improvements)
 Added support for TEXT files. Using the graphical interface or command line, it is now possible to apply most sequential pattern mining and sequential rule mining algorithms directly to a text file. There are two ways of applying an algorithm to a text file. The first way is to apply the algorithm "Convert_TEXT_file_to_sequence_database" to transform a text file into a sequence database. This file can then be used with most algorithms for sequential pattern or rule mining using the user interface or command line. The second way is to rename the text file with the extension ".text". Then, when using the graphical interface or command line, SPMF will automatically convert the file to the SPMF format, run the selected algorithm, and show the results in terms of words in the text file rather than integers. This is a feature that has been requested by several users. It is useful for performing data mining on text files without having to write code for converting text to sequences, as was previously required. For now, SPMF only supports the default text file encoding supported by Java. In the future, some options will be added to let the user choose other encodings as well. There is a new example in the documentation that also provides some explanations about how to use text files when running algorithms using the source code version of SPMF. Moreover, a tutorial on the blog explains some of the possibilities for analyzing text documents using this new version of SPMF.
 A new system has been designed for adding new algorithms to SPMF. To add an algorithm, an instance of the class DescriptionOfAlgorithm must be created for the new algorithm in the package "ca.pfv.spmf.algorithmmanager.descriptions". It indicates the type of input, the type of output, the parameters, etc. of the algorithm. This is then used to automatically generate the list of algorithms in the user interface of SPMF, unlike in previous versions of SPMF where this list was hardcoded. In the future, the descriptions of algorithms could be used to build a more complex user interface where users could visually combine various algorithms as a workflow. Another interesting possibility is to provide a user interface to run multiple algorithms one after the other, or to launch experiments where the parameters are varied automatically. This will be considered for future releases of SPMF. Moreover, another idea is to use the algorithm descriptions for adding a plugin system to SPMF for importing algorithms from other jar files. In the next few days, I will also update the developer's guide to add more documentation.
 Added the lift measure to the CMDeo algorithm (thanks to Ryan Panos).
 Updated the code of the GCD algorithm for association rule mining with an improved version (thanks to Ahmed ElSerafy, Hazem ElRaffiee).
 Fixed some minor errors in the documentation.
 Fixed a bug in the CMDeo algorithm that could trigger an ArrayIndexOutOfBoundsException.
 Fixed a bug in the Pascal implementation (the support of single items was incorrect in some cases).
 Fixed a bug in the user interface for the FPClose algorithm.
 Fixed a bug in the VMSP and VGEN algorithms that occurred when maxLength was set to 1.
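 The conversion of a text file to a sequence database, described in the first item of the list above, can be sketched as follows. The sketch assumes that each sentence becomes one sequence and each word one item, and it writes the SPMF sequence format in which every itemset is followed by -1 and every sequence ends with -2. How the actual converter splits sentences and handles punctuation is not stated in these notes, so those choices are assumptions; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; this is not the SPMF converter.
final class TextToSequences {

    /** Maps each distinct word to an integer id, in order of first appearance (ids start at 1). */
    final Map<String, Integer> wordIds = new LinkedHashMap<>();

    /** Converts a text to lines in the SPMF sequence format, one line per sentence. */
    List<String> convert(String text) {
        List<String> sequences = new ArrayList<>();
        for (String sentence : text.split("[.!?]+")) {
            StringBuilder line = new StringBuilder();
            for (String word : sentence.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
                if (word.isEmpty()) continue;
                Integer id = wordIds.get(word);
                if (id == null) {
                    id = wordIds.size() + 1;
                    wordIds.put(word, id);
                }
                line.append(id).append(" -1 "); // one word per itemset
            }
            if (line.length() > 0) sequences.add(line.append("-2").toString());
        }
        return sequences;
    }

    public static void main(String[] args) {
        TextToSequences converter = new TextToSequences();
        for (String sequence : converter.convert("The cat sat. The dog sat!")) {
            System.out.println(sequence);
        }
        // Prints:
        // 1 -1 2 -1 3 -1 -2
        // 1 -1 4 -1 3 -1 -2
        System.out.println(converter.wordIds); // {the=1, cat=2, sat=3, dog=4}
    }
}
```

 Keeping the word-to-id map makes it possible to translate the integers of the discovered patterns back to words afterwards.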
 v.0.99j  20160616
 Fixed a bug in the VMSP implementation (thanks to Himel Dev for reporting the bug)
 Two additional large itemset mining datasets have been added to the datasets page of the website: PowerC and Susy (thanks to Zhang Zhongjie)
 v.0.99i  20160609 (1 new algorithm)
 Added an implementation of the GCD algorithm for mining association rules (thanks to Ahmed ElSerafy, Hazem ElRaffiee for providing the implementation)
 v.0.99h  20160609
 Added a tool to resize databases in SPMF format using a percentage of lines from an original database (useful for performing scalability experiments)
 Fixed a bug in the FHSAR implementation (thanks to Gehad Ahmed Soltan AbdElaleem for reporting the bug)
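 The database resizing tool mentioned above can be illustrated with a small sketch. It assumes that the tool keeps the first lines of the original database, which these notes do not state; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; this is not the SPMF source code.
final class DatabaseResizer {

    /** Keeps the first 'percent' percent (0 to 100) of the lines, rounding down. */
    static List<String> resize(List<String> lines, int percent) {
        int keep = (int) ((long) lines.size() * percent / 100);
        return new ArrayList<>(lines.subList(0, keep));
    }

    public static void main(String[] args) {
        List<String> database = Arrays.asList("1 2", "2 3", "1 3", "4", "1 2 3");
        // For a scalability experiment, run the same algorithm on growing portions.
        for (int percent = 20; percent <= 100; percent += 40) {
            System.out.println(percent + "% -> " + resize(database, percent).size() + " lines");
        }
    }
}
```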
 v.0.99g  20160602 (4 new algorithms)
 Added an implementation of the MinFHM algorithm for mining minimal high-utility itemsets.
 Added an implementation of the PFPM algorithm for mining frequent periodic patterns in a transaction database (a sequence of transactions)
 Added an implementation of the PHM algorithm for mining periodic patterns that have a high utility (e.g. yield a high profit) in a sequence of transactions (a transaction database)
 Added an implementation of the SkyMine algorithm for mining skyline high-utility itemsets (thanks to V. Goyal et al.)
 Added a tool to remove utility information from transactions databases containing utility information.
 v.0.99f  20160530
 Fixed a bug that may generate incorrect support count in VMSP and other SPAM based algorithms in some specific cases. The bug was introduced in a previous version when adding the maxgap constraint to SPAM based algorithms (thanks to Preethy Varma for reporting the bug)
 Seven large datasets for itemset mining have been added to the datasets page of the website: kddcup99, Skin, Pamp, USCensus, OnlineRetail, and RecordLink (thanks to Zhang Zhongjie)
 v0.99e  20160329 (1 new algorithm)
 Added the original implementation of the EFIMClosed algorithm for mining closed high-utility itemsets.
 v0.99d  20160323 (1 new algorithm)
 Added the original implementation of the FHM+ algorithm for efficiently mining high-utility itemsets with length constraints.
 v0.99c  20160313
 Fixed bugs in the new BIDE+ and PrefixSpan implementation that occurred for sequences containing multiple items per itemset.
 v0.99b  20160228
 I have further optimized the new PrefixSpan implementation, in the package ca.pfv.spmf.algorithms.sequentialpatterns.prefixspan.
 I have replaced the old implementation of BIDE+ with a new implementation. The new implementation is in the package ca.pfv.spmf.algorithms.sequentialpatterns.prefixspan. This new implementation is faster and more memory efficient (up to 10 times faster on some datasets, and uses less memory). I have tested this implementation quite well, but if you find some issues, please let me know. Note that some algorithms may still rely on the old implementation (e.g. the Fournier08 algorithm). I will further clean the code in upcoming versions of SPMF to avoid keeping two versions of BIDE+.
 Fixed a bug in the FOSHU and TSHOUN algorithms. The absolute value of to(X) is now used to calculate the relative utility of an itemset X.
 v0.99  20160221
 In this new version, I have replaced the PrefixSpan implementation with a new implementation, in the package ca.pfv.spmf.algorithms.sequentialpatterns.prefixspan. This is something that I have wanted to do for a while since the previous version had been implemented a long time ago. The new version is based on different design decisions and includes some additional optimizations. It can thus be more than 10 times faster than the previous implementation on some datasets and use three times less memory. This also makes the RuleGen algorithm faster since it relies on PrefixSpan. Note that some algorithms may still rely on the old implementation.
 v0.98e  20160205
 Added the possibility to mine closed association rules using FPClose. The version using FPClose can be 10 times faster than the version using Charm for the step of rule generation because FPClose stores closed itemsets in a CFI-tree.
 v0.98d  20160202
 Fixed a bug in DBScan, Optics, and the KDTree implementation.
 v0.98c  20160129 (1 new algorithm):
 Added an implementation of the USPAN algorithm for mining high-utility sequential patterns.
 v0.98b  20160128 (1 new algorithm):
 Added an implementation of the FCHM algorithm for mining correlated high-utility itemsets using the bond measure.
 Modified the output of TKS to remove the -2 at the end of each pattern found, so that the output is similar to other sequential pattern mining algorithms.
 v0.98  20160114 (added a new window for result visualization)
 This new version offers a new user interface for visualizing results. It is a window specifically designed for visualizing patterns found by pattern mining algorithms, but it also works with clustering algorithms and most other algorithms. This window can be accessed when using the graphical interface of SPMF by selecting the checkbox "using SPMF viewer". The new window shows the patterns found by an algorithm in a table, and it lets the user apply filters to select patterns, or sort the patterns in ascending or descending order by various measures such as support and confidence (depending on the algorithm) by clicking on the column headers. This window for visualizing patterns should work with most algorithms offered in SPMF. If you find some bugs related to this new window for visualizing results, or if you have ideas to improve the user interface of SPMF, you may let me know.
 I also fixed a few bugs.
 v0.97d / 0.97e  20151206 (2 new algorithms)
 Added an implementation of the GHUIMiner algorithm for mining generators of high-utility itemsets in a transaction database having utility information.
 Added an implementation of the CHUIMiner algorithm for mining closed high-utility itemsets in a transaction database having utility information.
 Fixed a bug in MaxSP. No results were generated for minsup = 0 sequence. Now, if the user sets minsup = 0 sequence, MaxSP changes minsup to 1 sequence (because it does not make sense to generate patterns that do not exist in the database).
 Fixed a bug in FPClose (thanks to Jamshi Nazeer for reporting the bug)
 Fixed a bug in the GoKrimp algorithm when reading a file without optional labels (thanks to Jaroslav Fowkes and Thomas Christie for reporting the bug)
 Fixed a bug in the ClaSP / CMClaSP algorithms when handling databases with multiple items per itemset (thanks to Tin Truong Chi for the bug fix)
 v0.97c  20151028 (1 new algorithm)
 Added a variation of the FHM algorithm named FHMFreq for mining frequent high-utility itemsets. It is a simple modification of FHM that adds the minsup threshold.
 v0.97b  20151006 (1 new algorithm)
 Added an implementation of a Naive Bayes Document Classifier (implemented by Sabarish Raghu)
 Fixed a bug in the text clusterer (bug reported and fixed by Dharmen Punjani)
 Fixed a bug in FPClose (thanks to Insil Yun)
 v0.97a  20150919
 Fixed a bug in the graphical interface for the SPAM algorithm (thanks to Martin B ckle for reporting the bug).
 Added the minimum pattern length constraint for the SPAM algorithm.
 v0.97  20150912 (major revision  16 new algorithms)
 Added several sequence prediction algorithms to SPMF. The algorithms are CPT+, CPT, PPM, DG, AKOM, TDAG and LZ78. Those algorithms are designed to predict the next symbol of a given sequence based on a set of training sequences. The algorithms are implemented by Ted Gueniche, as part of his Ipredict project.
 Added implementations of FOSHU and TSHOUN for on-shelf high-utility itemset mining.
 Added implementations of EIHI and HUILISTINS for incremental high-utility itemset mining.
 Added an implementation of HUSRM for high-utility sequential rule mining.
 Added implementations of the EFIM, d2HUP and HUPMiner algorithms for high-utility itemset mining.
 Added an implementation of HUGMiner for mining high-utility generator patterns.
 Added more performance comparisons on the "Performance" page of the website.
 Added datasets for on-shelf utility mining and high-utility sequential rule mining on the "Datasets" page of the website.
 Fixed a bug in the "maxgap" constraint implementation for the TKS, CMSPAM and other SPAM-based algorithms, which sometimes occurred when an item appeared multiple times in the same sequence.
 Updated the map of data mining algorithms.
 This version requires Java 1.8 to be installed on your machine. It may be necessary to update the Java SDK on your machine, and perhaps also your development environment such as Eclipse.
 v0.96r20  20150825
 Added an optional parameter to SPAM, VMSP, VGEN and TKS to show the identifiers of sequences containing each pattern found. If this parameter is set to true, the identifiers of sequences will be shown in the output by these algorithms.
 v0.96r19  20150818
 Fixed a bug in the FPGrowth algorithm that was introduced in v0.96r14 when some optimizations were made to the FPGrowth code (thanks to Masanori Akiyoshi for finding the bug). The support of itemsets was in some cases incorrectly calculated.
 Fixed an integer overflow problem occurring only for very large datasets for FHM, FHN and HUIMiner.
 Fixed a bug in the CORI algorithm (thanks to Pierre-Emmanuel Leroy)
 v0.96r17/r18  20150526
 Further optimization of memory usage for the Eclat, dEclat, Cori and DefMe algorithms.
 Fixed a bug in the correlation distance function for clustering.
 Fixed a bug that occurred when using the "maxgap" constraint in the VMSP, VGEN, CMSPAM, SPAM and TKS algorithms (thanks to Choong Shin Siang and Wong Li Pei for reporting the bug).
 Optimized the HMine algorithm implementation.
 Fixed a bug in FHN.
 Fixed a bug in the UPGrowth+ implementation (thanks to Prashant Barhate for contributing this implementation)
 v0.96r16  20150428 (1 new algorithm)
 Added an implementation of the CORI algorithm for mining rare correlated itemsets from a transaction database.
 Added an implementation of the FPClose algorithm for mining closed frequent itemsets from a transaction database.
 Added an implementation of the PrePost+ algorithm (a variation of PrePost) for frequent itemset mining
 Added an implementation of the FHN and HUINIVMine algorithms for high-utility itemset mining with negative or positive unit profit values.
 v0.96r14  20150405 (1 new algorithm)
 Added an implementation of the FPMax algorithm for mining maximal frequent itemsets from a transaction database.
 Fixed a bug in the dCharm_bitset implementation. The result was sometimes incorrect.
 The runAlgorithm() method of CommandProcessor is now public, as requested by some users.
 v0.96r13  20150322 (1 new algorithm)
 Added an implementation of the Optics algorithm. Optics generates a cluster-ordering from a set of double vectors. From this ordering, various things can be done. I have implemented the extractDBScan() method, which uses this ordering to generate DBScan-style clusters. However, I have not implemented the alternative extractCluster() method described in the paper.
 v0.96r12  20150321
 Fixed a bug in the hash function of CloSpan, ClaSP and CMClaSP, that provoked a StackOverflow exception for these algorithms in some rare cases (thanks to Wen Zhang for reporting the bug and Antonio Gomariz for fixing it).
 v0.96r11  20150316
 Fixed a bug in the Zart algorithm (reported by Asmaa) that was generating an ArrayOutOfBound exception when no single items were frequent. Furthermore, I have modified the output of Zart to make it clearer and updated the documentation.
 v0.96r10  20150312
 Modified the graphical user interface of SPMF (files in the package ca.pfv.spmf.gui) so that when the user launches an algorithm, it now runs in a separate thread, and a "Stop algorithm" button is available to stop the execution if it is taking too much time.
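 As a rough illustration of the idea described for v0.96r10, the sketch below runs a long task in a separate thread and stops it on request. The class and method names are hypothetical, and cooperative interruption is an assumption made for this example; the note above does not say which mechanism the SPMF GUI actually uses.

```java
class StoppableTaskSketch {

    /** Hypothetical stand-in for a long-running algorithm: loops until its thread is interrupted. */
    static Thread startTask() {
        Thread worker = new Thread(() -> {
            long iterations = 0;
            // Check the interrupt flag regularly so that a stop request is honored quickly.
            while (!Thread.currentThread().isInterrupted()) {
                iterations++;
            }
        }, "algorithm-thread");
        worker.setDaemon(true); // never keep the JVM alive just for this demo thread
        worker.start();
        return worker;
    }

    /** What a "Stop algorithm" button handler could do. Returns true if the worker has terminated. */
    static boolean stopTask(Thread worker) {
        worker.interrupt();
        try {
            worker.join(5000); // wait at most 5 seconds for the worker to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        Thread worker = startTask();
        System.out.println("Stopped: " + stopTask(worker));
    }
}
```

 Running the work off the user interface thread is what keeps the window responsive enough for a stop button to be clicked at all.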
 v0.96r9  20150307 (1 new algorithm)
 Added an implementation of the DBScan algorithm for density-based clustering.
 Added the feature of searching all points within a radius to the KDTree implementation.
 v0.96r8  20150305

 New feature for most sequential pattern mining algorithms: the user can now request to show the corresponding sequence ids for each pattern found. In other words, for each pattern found, SPMF can now show the ids of the sequences where the pattern appears. This feature was added in BIDE+, ClaSP, CMClaSP, CloSpan, CMSPADE, CMSPAM, SPADE, SPAMAGP, GSP, PrefixSpan, TSP, MaxSP, FEAT and FSGP. The documentation will be updated soon... Moreover, I fixed some minor issues in the code of FEAT and FSGP (the code for saving to file was not working as expected for these algorithms).
 v0.96r7  20150217 (1 new algorithm)
 Added an implementation of the Bisecting KMeans clustering algorithm.
 Added more features to the KMeans and Hierarchical Clustering algorithms. Previously, the Euclidean distance was the only distance function available. Now, the user can choose between Euclidean distance, correlation distance, cosine distance, Manhattan distance and Jaccard distance.
 Update: 20150219: Fixed a bug in the command line interface of SPMF (bug reported by Wen Zhang).
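 For readers unfamiliar with the distance functions listed for v0.96r7, here is a minimal sketch of three of them using their textbook definitions. The class and method names are hypothetical and this is not the SPMF code; in particular, taking the cosine distance as one minus the cosine similarity is an assumption of this example.

```java
class DistanceSketch {

    /** Euclidean distance: square root of the sum of squared coordinate differences. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan distance: sum of absolute coordinate differences. */
    static double manhattan(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Cosine distance, taken here as 1 - cosine similarity. The result is NaN if a vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] p = {0, 0};
        double[] q = {3, 4};
        System.out.println(euclidean(p, q)); // 5.0
        System.out.println(manhattan(p, q)); // 7.0
    }
}
```

 All three methods assume that both vectors have the same length.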
 v0.96r6  20150216
 Fixed bugs in the dEclat, dCharm algorithms and FIN/PrePost implementations.
 v0.96r5  20150213
 Added the "max gap" parameter for the VGEN, VMSP, TKS, SPAM and CMSPAM sequential pattern mining algorithms. It is an optional parameter that specifies whether gaps are allowed in sequential patterns. For example, if "max gap" is set to 1, no gap is allowed (i.e. the consecutive itemsets of a pattern must appear consecutively in a sequence). If "max gap" is set to N, a gap of N-1 itemsets is allowed between two consecutive itemsets of a pattern. If the parameter is not used, "max gap" is set to +∞ by default.
 Fixed a bug in the ItemsetTree and Memory Efficient ItemsetTree implementations (thanks to Ryan G. Benton for reporting and fixing the bug). The support of itemsets was sometimes calculated incorrectly.
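 The "max gap" semantics described for v0.96r5 can be sketched as a containment test: two consecutive itemsets of a pattern may be matched at positions that differ by at most "max gap". The code below is a simple backtracking check written for illustration only. The class and method names are hypothetical, and the SPAM-based algorithms in SPMF enforce the constraint during the search rather than by testing patterns this way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MaxGapSketch {

    /** Builds a sequence (a list of itemsets) from arrays of items; helper for the example only. */
    static List<Set<Integer>> toSequence(int[][] itemsets) {
        List<Set<Integer>> sequence = new ArrayList<>();
        for (int[] itemset : itemsets) {
            Set<Integer> set = new HashSet<>();
            for (int item : itemset) {
                set.add(item);
            }
            sequence.add(set);
        }
        return sequence;
    }

    /** Returns true if the pattern occurs in the sequence while respecting maxGap. */
    static boolean occurs(List<Set<Integer>> sequence, List<Set<Integer>> pattern, int maxGap) {
        if (pattern.isEmpty()) {
            return true;
        }
        // The first itemset of the pattern may be matched at any position.
        for (int start = 0; start < sequence.size(); start++) {
            if (sequence.get(start).containsAll(pattern.get(0))
                    && matchFrom(sequence, pattern, 1, start, maxGap)) {
                return true;
            }
        }
        return false;
    }

    private static boolean matchFrom(List<Set<Integer>> sequence, List<Set<Integer>> pattern,
                                     int patternIndex, int previousPos, int maxGap) {
        if (patternIndex == pattern.size()) {
            return true;
        }
        // Long arithmetic avoids overflow when maxGap is Integer.MAX_VALUE (no constraint).
        long last = Math.min((long) previousPos + maxGap, sequence.size() - 1L);
        for (int pos = previousPos + 1; pos <= last; pos++) {
            if (sequence.get(pos).containsAll(pattern.get(patternIndex))
                    && matchFrom(sequence, pattern, patternIndex + 1, pos, maxGap)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<Integer>> sequence = toSequence(new int[][]{{1}, {2}, {3}});
        List<Set<Integer>> pattern = toSequence(new int[][]{{1}, {3}});
        System.out.println(occurs(sequence, pattern, 1)); // false: {1} and {3} are not adjacent
        System.out.println(occurs(sequence, pattern, 2)); // true: a gap of one itemset is allowed
    }
}
```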
 v0.96r3/r4  20150205 (3 new algorithms)
 Added an implementation of a Text Clusterer using the tf*idf measure (thanks to Sabarish Raghu)
 Added an implementation of the EstDec+ algorithm, for mining recent frequent itemsets from data streams (thanks to Azadeh Soltani).
 Memory optimizations of the EstDec algorithm. The algorithm can now use up to 4 times less memory (thanks to Azadeh Soltani).
 Memory optimization of the FPGrowth algorithm to reduce the number of objects created. Reduces memory usage by up to 2 times on some datasets. Also optimized how FPGrowth enumerates all itemsets when an FPTree contains a single branch.
 Refactoring of classes in the package ca.pfv.spmf.gui to separate the main class file and command line interface (Main) from the graphical interface (MainWindow) and from the code for launching algorithms (CommandProcessor) used by both the command line interface and GUI. From now on, the class "Main" will be the main class of the SPMF library.
 Refactoring of the CMClaSP and ClaSP code so that the debugging code for showing the trie is now located in a separate class (ShowTrie). This is to avoid HeadlessExceptions when running ClaSP/CMClaSP in a headless environment, that is, an environment where no graphical interface is available, e.g. on some Linux servers (bug reported by Wen Zhang).
 Fixed a bug in BIDE+ that was causing ArrayOutOfBound exception on some datasets (thanks to Mehran Memon for reporting this bug).
 Other minor modifications to remove some warnings.
 Fixed a bug in dEclat implementations. Due to an error in how methods were overloaded, some code of Eclat was executed in dEclat.
 Fixed a bug. MaxSP was not working when executed from the GUI.
 Fixed a bug about how an ID3 decision tree was printed to console (thanks to G. Gutierrez for reporting this bug)
 Optimization: replaced StringBuffer by StringBuilder in all classes (since it is more efficient).
 Added the BMS2 dataset to the "dataset" page of the website.
 Added more features to the CMSPAM algorithm
 allow the user to specify items that need to appear in patterns found.
 allow the user to specify the minimum/maximum length of patterns to be found.
 Added more features to the RuleGrowth/TRulegrowth algorithms. User can now specify the maximum size of rule antecedents/consequents to be found.
 v0.96r2  20141130
 Added more features to the TKS algorithm for top-k sequential pattern mining
 modified TKS to allow the user to specify items that need to appear in patterns found.
 modified TKS to allow the user to specify the minimum length of patterns to be found.
 Added the "fix transaction database" tool. It is a tool that fixes some common problems that may be found in transaction database files created by users. This tool (1) removes duplicate items in transactions of a transaction database and (2) sorts items in transactions (those requirements are assumed by most itemset mining algorithms).
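 The two repairs performed by the "fix transaction database" tool can be sketched for a single transaction as follows. This is an illustration only, not the tool's code: the class and method names are hypothetical, and it assumes that a transaction is a line of integer items separated by spaces.

```java
import java.util.TreeSet;

class FixTransactionSketch {

    /** Removes duplicate items from one transaction line and sorts the remaining items. */
    static String fixLine(String line) {
        // A TreeSet drops duplicates and keeps the items in ascending order.
        TreeSet<Integer> items = new TreeSet<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                items.add(Integer.parseInt(token));
            }
        }
        StringBuilder fixed = new StringBuilder();
        for (int item : items) {
            if (fixed.length() > 0) {
                fixed.append(' ');
            }
            fixed.append(item);
        }
        return fixed.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixLine("3 1 2 3 1")); // prints "1 2 3"
    }
}
```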
 v0.96r  20141124 (4 new algorithms)
 Added the FIN and PREPOST algorithms (thanks to Zhihong Deng for providing the original C++ source code, which I have converted to Java). These two algorithms are very recent frequent itemset mining algorithms and are very fast. According to some preliminary experiments, for example, PrePost can be a few times faster than FPGrowth on some datasets.
 Added implementations of the LCMFreq, LCM and LCMMax algorithms for respectively mining frequent itemsets, closed itemsets and maximal itemsets (thanks to Alan Souza for providing an implementation of LCM). I have modified the source code to add a few optimizations. The implementation of LCM is based on the paper describing LCM v.2 by Uno, although it does not perform transaction merging yet (some more optimizations could be added in the future). LCM is an interesting algorithm because it was the winner of the FIMI 2004 competition.
 Update (20141124): There was a bug in my modifications of LCM to implement LCMMax, so LCMMax is temporarily removed from the source code until I can fix the issue. It may take a few days or more before I can fix it.
 v0.96q  20140925
 Fixed a bug in the compare() method of the Rule class used by TNS and TopSeqRules (thanks to C. Albert Thompson for reporting the bug).
 New tools:
 Added a tool to add consecutive timestamps to a sequence database (this is useful for generating datasets with timestamps for testing algorithms that require timestamps).
 Added a tool for converting a transaction database to a sequence database (this can be useful for generating datasets for experiments, though in real life, it may not make sense to convert transactions without ordering to sequences with an ordering).
 Added a tool to add synthetic utility values to a transaction database (this is useful for generating datasets to be used in high utility itemset mining).
 v0.96p 20140914 (3 new algorithms)
 Added an implementation of the ERMINER algorithm for sequential rule mining.
 Added an implementation of the IHUP and UPGROWTH algorithms for high-utility itemset mining (thanks to Prashant Barhate for implementing these algorithms).
 v0.96o 20140912
 Fixed a bug in Eclat such that the frequent itemsets found were not correctly separated by their size, and another bug such that Eclat was not pruning some itemsets containing 2 items when the triangular matrix was deactivated (thanks to Abdalghani Abujabal).
 Fixed a bug that occurred when running the "Fournier08Closed+time" algorithm using the GUI (thanks to Nahumi)
 v0.96n 20140815
 Added a tool to convert a sequence database to a transaction database. This tool is useful for example to apply an algorithm designed for a transaction database to a sequence database (e.g. mining association rules in a sequence database).
 Added a tool to generate statistics about a transaction database.
 Fixed a bug in the association rule generation using CFPgrowth that was introduced in a previous version.
 v0.96j 20140623
 Fixed a bug in the LAPIN implementation because of overflow and cleaned the code, and added some comments.
 v0.96k 20140622 (1 new algorithm)
 Added the LAPIN (aka LAPINSPAM) algorithm for sequential pattern mining. This is a first implementation in SPMF based on the LAPINLCI variation of LAPIN described in the technical report of LAPIN.
 Added the dECLAT and dCHARM algorithms for mining frequent itemsets. Those algorithms are respectively variations of the Eclat and Charm algorithms. The difference is that they use the diffset data structure rather than tidsets. We provide an implementation of dEclat using sets of integers to represent diffsets ("dEclat") and one version using bitsets ("dEclat_bitset"). For dCharm, only a version using bitsets is provided ("dCharm_bitset").
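 To illustrate the difference between tidsets and diffsets mentioned for dEclat and dCharm, the sketch below computes the support of an itemset XY in both ways using java.util.BitSet. It follows the usual definitions, where the diffset of XY is the set of transactions containing X but not Y, so that support(XY) = support(X) - |diffset(XY)|. The class and method names are hypothetical and the code is not taken from SPMF.

```java
import java.util.BitSet;

class DiffsetSketch {

    /** Builds a tidset (the ids of the transactions containing an itemset) as a bit vector. */
    static BitSet tidset(int... tids) {
        BitSet bits = new BitSet();
        for (int tid : tids) {
            bits.set(tid);
        }
        return bits;
    }

    /** Tidset approach: support(XY) is the size of the intersection of the two tidsets. */
    static int supportWithTidsets(BitSet tidsX, BitSet tidsY) {
        BitSet both = (BitSet) tidsX.clone();
        both.and(tidsY);
        return both.cardinality();
    }

    /** Diffset approach: keep only the transactions of X that do not contain Y. */
    static int supportWithDiffset(BitSet tidsX, BitSet tidsY) {
        BitSet diffset = (BitSet) tidsX.clone();
        diffset.andNot(tidsY);
        return tidsX.cardinality() - diffset.cardinality();
    }

    public static void main(String[] args) {
        BitSet tidsX = tidset(0, 1, 2, 4);
        BitSet tidsY = tidset(1, 2, 3, 4);
        System.out.println(supportWithTidsets(tidsX, tidsY)); // 3
        System.out.println(supportWithDiffset(tidsX, tidsY)); // 3
    }
}
```

 On dense datasets, diffsets tend to be much smaller than tidsets, which is the motivation usually given for this representation.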
 v0.96h  20140615 (1 new algorithm)
 Added the DEFME algorithm for mining frequent generator itemsets using a depth-first search.
 v0.96g  20140614
 Fixed bugs in FEAT/ FSGP that occurred when multiple items per itemsets appeared in input sequences.
 v0.96f  20140611
 Optimized the code for association rule generation. Up to 10 times faster on some datasets.
 Improved the source code for closed association rule mining and merged some classes.
 Introduced a class ca.pfv.spmf.algorithms.ArraysAlgo to hold the main algorithms on sorted lists of integers that are shared by several algorithms (to remove some redundancy in the source code).
 v0.96e  20140610
 Optimization of the binary search in Apriori based algorithms (Apriori, AprioriClose, AprioriInverse...), as well as in FHM and HUIMiner.
 Major optimizations of the Eclat and Charm algorithms. I have reimplemented most of the code.
 Converted the encoding of Java source code files from ISO-8859-1 to UTF-8 to remove warnings when compiling the code in NetBeans (thanks to M. Witbrock for reporting this issue)
 Added a performance comparison with a closed source data mining library in the "performance" section of the website.
 three new algorithms:
 FEAT, FSGP and VGEN for mining frequent sequential generator patterns from a sequence database.
 fixed an array out of bound exception in the FPGrowth algorithm that occurred when all items are infrequent (thanks to Aman).
 v0.96c  20140430
 fixed a bug in association rule generation with CFPGrowth (AlgoCFPGrowth.java), thanks to Manperta Negara Situmorang.
 v0.96b  20140424
 Major optimization of all FPGrowth based algorithms (FPGrowth, FPGrowth_with_strings, CFPGrowth++), thanks to Dan Cappucio. The modification is to add mapItemLastNodes in the FPTree / MISTree classes (see the "performance" section of the website for an overview of the speed improvement).
 fixed a bug in the CMDeo algorithm that was causing an array out of bound exception
 v0.96  20140406 (3 new algorithms)
 new algorithms:
 added the GoKrimp and SeqKrimp algorithms (by Thanh Lam Hoang et al) to discover compressing sequential patterns directly or by post-processing.
 added the FHM algorithm for mining high-utility itemsets
 v0.95d  20140310
 CFPGrowth has been renamed CFPGrowth++ since it includes the optimizations proposed in CFPGrowth++.
 fixed a bug in the VMSP algorithm (no result was shown)
 v0.95b  20140306
 fixed a bug in the TSP implementation
 fixed some inconsistencies in the source code of some sequential pattern mining algorithms (thanks to C. Zhou)
 v0.95  20140228 (major revision  10 new algorithms)
 new algorithms for pattern mining:
 TKS for top-k sequential pattern mining
 TSP for top-k sequential pattern mining
 VMSP for maximal sequential pattern mining
 MaxSP for maximal sequential pattern mining
 ESTDEC for mining recent frequent itemsets from a stream (by Azadeh Soltani)
 MEIT (Memory Efficient ItemsetTree), a data structure for targeted association rule mining
 CMSPAM for sequential pattern mining
 CMSPADE for sequential pattern mining
 CMClaSP for closed sequential pattern mining
 PASCAL for mining frequent itemsets and identifying generators
 added a few minor optimizations to Charm and Eclat
 added the possibility to generate association rules from the output of the CFPGrowth algorithm.
 refactoring of SPADE, Clasp, SPAM_AGP, GSP and PrefixSpan_AGP
 Updated the map of data mining algorithms.
 improved the documentation web page of the website to add the description of the file formats of each algorithm.
 fixed a bug in the ID3 algorithm implementation
 fixed a bug to use the hierarchical clustering algorithm in the GUI (thanks to A. Rai)
 v0.94d  20140125
 fixed a bug in the ARFF ResultConverter.java file.
 old jar file for this version
 old source code for this version
 old documentation for this version
 v0.94c  20131126
 fixed rounding inconsistencies among sequential pattern mining algorithms (thanks to A. Pramudita).
 v0.94b  20131007
 Optimized the BIDE+ algorithm.
 fixed a bug in the trimBeginingAndEnd method of PseudoSequenceBIDE.java for the BIDE+ algorithm.
 Cleaned the code of FPGrowth for the case of a tree with a single path (thanks to R. Loomba).
 v0.94  20130812 (major revision  6 new algorithms)
 Several new sequential pattern mining algorithms by Antonio Gomariz Peñalver:
 GSP,
 SPADE (regular and parallelized versions)
 ClaSP, a very efficient vertical algorithm for closed sequential pattern mining
 CloSpan,
 PrefixSpan and SPAM (alternative implementations)
 Closed sequential pattern mining by post-processing with PrefixSpan and SPAM.
 Also, new test files: contextCloSpan.txt, contextClaSP.txt and contextSPADE.txt
 Bug fixes
 fixed an important bug in the class AbstractOrderedItemset of SPMF 0.93, which affected the result of several algorithms including the algorithm for mining MNR rules (thanks to Faizal Feroz).
 fixed a bug in the class ItemsetTree (thanks to Faizal Feroz).
 fixed a bug of integer overflow for large datasets (e.g. accidents) that occurred in the hashcode function of Charm (class HashTable) and other algorithms using the same class (thanks to K. Srinvas Rao)
 fixed a bug in Charm (bug also introduced in 0.93 due to refactoring).
 fixed a bug in TNS/TopSeqRules (thanks to Peter Toth)
 Updated the map of data mining algorithms and documentation.
 fixed a bug in dataset generation that was introduced in version 0.93d.
 old documentation for this version.
 old source code for this version.
 old jar file for this version.
 added a new datasets page on the website.
 added support for the ARFF file format (a popular file format that represents a relational database table as a text file). The ARFF format can be used as input in the command line interface and graphical interface of SPMF by algorithms that take a transaction database as input (most itemset mining and association rule mining algorithms). This version supports all features of ARFF except that (1) the character "=" is forbidden and (2) escape characters are not considered. Note that when the ARFF format is used, performance will be lower than with the native SPMF file format because a conversion has to be performed. However, this additional cost should be small. Note that SPMF also supports a few other formats besides ARFF (see the last examples in the documentation on file conversion for more information). However, only the ARFF format is converted on-the-fly (other formats have to be converted manually before applying an algorithm). 36 datasets in the ARFF format can be found in the datasets page of this website.
 added a tool to convert the CSV format with positive integers to a transaction database in SPMF format.
 improved the documentation
 fixed a bug in the hierarchical clustering algorithm
 fixed a bug in the sequence database generator and transaction database generator.
 added the HirateYamana algorithm to the GUI interface and command line interface
 v 0.93  20130509 (major revision  7 new algorithms)
 Several packages and files have been renamed for a better organization of the source code.
 Merged some files that were duplicated in various packages for less redundancy.
 Optimized several Apriori-based algorithms such as AprioriRare and AprioriInverse.
 Some algorithms have been optimized to use better data structures, such as arrays instead of lists of integers.
 Added 7 algorithms to the GUI: Apriori_TIDClose, AprioriTID, Apriori  Association rules, Sporadic association rules, Indirect association rules, IGB, MNR.
 Fixed some small bugs in the source code.
 Tried to standardize as much as possible the output files written by the algorithms so that algorithms performing the same task will output the same file format.
 The CHARMMFI algorithm was not generating the correct result. I have found that the problem is the algorithm itself, which is incorrect for some special cases as it is described in Szathmary (2006). I have adapted the algorithm so that it generates the correct result.
 I have removed four less popular algorithms that were not well-documented and not offered in the release version of SPMF: the algorithm for mining pseudo-closed itemsets, the Guigues-Duquenne basis, proper basis and the structural basis of association rules. If you want these algorithms, you can download version 0.92 of SPMF to get them.
 Added the maximum pattern length constraint to PrefixSpan and SPAM algorithms.
 v 092c  20130408
 fixed a bug that occurred when prefixspan_with_strings was called from the user interface or command line.
 old documentation for this version.
 old developer's guide for this version.
 old source code for this version.
 old jar file for this version.
 v 092b  20130314
 fixed a bug in the calculation of the lift measure for association rules.
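 For reference, the lift of an association rule X -> Y is commonly defined as P(X and Y) / (P(X) * P(Y)). The sketch below computes it from absolute support counts. The class, method and parameter names are hypothetical, and this is the standard definition rather than a copy of the corrected SPMF code.

```java
class LiftSketch {

    /**
     * Lift of the rule X -> Y from absolute support counts.
     * (supXY / n) / ((supX / n) * (supY / n)) simplifies to supXY * n / (supX * supY).
     */
    static double lift(int supportXY, int supportX, int supportY, int transactionCount) {
        return ((double) supportXY * transactionCount) / ((double) supportX * supportY);
    }

    public static void main(String[] args) {
        // X appears in 4 of 10 transactions, Y in 5, and both together in 2: X and Y are independent.
        System.out.println(lift(2, 4, 5, 10)); // 1.0
        // If X and Y appear together in 4 transactions, they are positively correlated.
        System.out.println(lift(4, 4, 5, 10)); // 2.0
    }
}
```

 A lift above 1 indicates a positive correlation between X and Y, a lift below 1 a negative one.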
 v 092  20130305 (1 new algorithm)
 added an implementation of HUIMiner, one of the best algorithms for high utility itemset mining.
 v 091  20121229
 added an implementation of Apriori that uses a hash-tree to store candidates, in order to calculate the support and generate candidates more efficiently (it is named "Apriori_with_hash_tree" or "AprioriHT"). It can be up to twice as fast as the previous version (see the performance comparison).
 added a version of FPGrowth that accepts strings instead of integers as input (FPGrowth_itemsets_with_strings)
 v 090  20121225:
 added a tool to generate transaction databases.
 added a tool to generate sequence databases.
 added a tool to convert sequence databases to the SPMF format.
 added a command line interface to run algorithms from the command line.
 added an implementation of TNR for top-k non-redundant association rule mining.
 added an implementation of TNS for top-k non-redundant sequential rule mining.
 cleaned the source code a little bit.
 fixed some small bugs in the Indirect, FHSAR and ZART algorithms.
 added an implementation of CFPGROWTH for mining itemsets with multiple support thresholds (implemented by Azadeh Soltani).
 v 0.88  20120822:
 added an implementation of MSAPRIORI for mining itemsets with multiple support thresholds (implemented by Azadeh Soltani).
 fixed a small bug in the redblack tree implementation used by TOPKRULES and TOPSEQRULES
 fixed a small bug in Cluster.java (thanks to F. Jafari)
 fixed a small bug in TRULEGROWTH.
 added implementations of TRULEGROWTH and BIDE+ that accept strings instead of integers as input.
 v 0.87  20120728:
 improved the user interface so that (1) example parameter values are shown for each parameter and (2) that percentage values can be entered either in decimal format (e.g. 0.5) or as a percentage (e.g. 50%).
 fixed a bug in the hierarchical clustering algorithm in the GUI version of SPMF
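 The two input styles accepted since v 0.87 can be handled with a small parsing routine such as the one below. This is only a sketch of the idea: the class and method names are hypothetical, and the actual parsing code of the SPMF user interface may differ.

```java
class PercentParseSketch {

    /** Converts "0.5" or "50%" to the fraction 0.5. */
    static double parseThreshold(String input) {
        String text = input.trim();
        if (text.endsWith("%")) {
            // Strip the percent sign and rescale to a fraction.
            String number = text.substring(0, text.length() - 1).trim();
            return Double.parseDouble(number) / 100.0;
        }
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        System.out.println(parseThreshold("0.5")); // 0.5
        System.out.println(parseThreshold("50%")); // 0.5
    }
}
```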
 v 0.86  20120726:
 modified the user interface so that algorithms are presented by their category in the combo box such as "sequential pattern mining", "sequential rule mining", "itemset mining", "clustering", etc.
 optimized the basic Apriori implementation with binary search for checking subsets of candidates, arrays of integers instead of lists, and more.
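 The optimization mentioned for v 0.86 can be pictured as follows. Apriori discards a candidate of size k if one of its subsets of size k-1 is not frequent, and each subset can be looked up by binary search when the frequent itemsets of size k-1 are stored as sorted integer arrays in lexicographic order. The code below is a sketch under those assumptions, with hypothetical names, and is not the SPMF implementation.

```java
import java.util.ArrayList;
import java.util.List;

class SubsetCheckSketch {

    /** Lexicographic comparison of two itemsets of the same size, each sorted in ascending order. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /** Binary search for an itemset in a lexicographically sorted list of itemsets of the same size. */
    static boolean contains(List<int[]> sortedLevel, int[] itemset) {
        int low = 0;
        int high = sortedLevel.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = compare(sortedLevel.get(mid), itemset);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }

    /** Checks that every subset obtained by removing one item from the candidate is frequent. */
    static boolean allSubsetsFrequent(int[] candidate, List<int[]> frequentLevel) {
        for (int skip = 0; skip < candidate.length; skip++) {
            int[] subset = new int[candidate.length - 1];
            for (int i = 0, j = 0; i < candidate.length; i++) {
                if (i != skip) {
                    subset[j++] = candidate[i];
                }
            }
            if (!contains(frequentLevel, subset)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[]> level2 = new ArrayList<>();
        level2.add(new int[]{1, 2});
        level2.add(new int[]{1, 3});
        level2.add(new int[]{2, 3});
        System.out.println(allSubsetsFrequent(new int[]{1, 2, 3}, level2)); // true
        System.out.println(allSubsetsFrequent(new int[]{1, 2, 4}, level2)); // false: {2, 4} and {1, 4} are missing
    }
}
```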
 v 0.85  20120717:
 added several algorithms to the GUI version of SPMF: KMEANS, TWOPHASE, VME, ZART, RELIM, RULEGEN, SEQDIM, etc.
 improved the implementations of KMeans and the hierarchical clustering algorithm so that they can work with vectors, and cleaned the code.
 added some small optimizations to the RELIM and ZART implementations.
 cleaned the implementation of the algorithm for mining pseudo-closed itemsets.
 cleaned the code of algorithms for mining multidimensional sequential patterns and modified them so that they save the results to a file.
 cleaned the source code of Apriori-based algorithms.
 v 0.84  20120715: added a few algorithms for building, updating and querying an ItemsetTree. An itemset tree is a special structure representing a database that allows efficiently generating targeted association rules, frequent itemsets and to get the support of any itemset. This structure can be updated incrementally (only available in the source code version of SPMF).
 v. 0.83  20120704: added the possibility of mining association rules with the lift measure and the minlift threshold.
 v. 0.82  20120630: fixed a bug in the SPAM implementation that occurred when minsup =0 (thanks to D. Bhatt).
 v. 0.81  20120413: improved the SPAM implementation. The number of bits by sequence is now variable. The algorithm is therefore more memory efficient and can run on larger datasets with longer sequences.
 v. 0.80  20120408: improved the user interface (thanks to Hanane Amirat), changed the license of the software to GPL v3, fixed a minor bug in the TRuleGrowth algorithm, cleaned the source code of several algorithms by removing some unused methods.
 v. 0.79  20120317: added five Apriori-based algorithms to the GUI version (Apriori, AprioriClose, AprioriRare, AprioriInverse, UApriori) and made some minor improvements.
 v. 0.78  20120305:
 Added the TRULEGROWTH algorithm for mining sequential rules with the window size constraint.
 Added the TOPKRULES algorithm for mining the top-k association rules in a transaction database.
 Added the TOPSEQRULES algorithm for mining the top-k sequential rules in a sequence database.
 Added an implementation of FPGROWTH that saves the result to a file instead of keeping the result in memory.
 Cleaned the implementation of PREFIXSPAN. I removed some unused variables in the pseudosequence implementation (thanks to shouwangji@___ for reporting this).
 Added an implementation of the KDTREE data structure.
 Added a simple graphical user interface (ca.pfv.spmf.gui.MainWindow) that allows running 17 main algorithms (other algorithms will be added to the user interface later).
 v.0.77  20111028: Added an implementation of FHSAR for association rule hiding.
 v.0.76  20111022: Added faster and more memory efficient implementations of AprioriTID, ECLAT and CHARM that use bit vectors for representing tids sets.
 v.0.75  20111018: Added an implementation of INDIRECT for mining "indirect association rules".
 v.0.74  20110811: Added an implementation of SPAM for sequential pattern mining
 v.0.73  201109: cleaned the implementation of PREFIXSPAN and BIDE+ and made some optimizations, added an implementation of RULEGEN for generating sequential rules from sequential patterns.
 v.0.72  201107: Added implementations of RULEGROWTH for mining sequential rules, ID3 for creating decision trees, and VME for mining erasable itemsets. Also, I cleaned up the implementations of KMeans, the hierarchical clustering algorithm, AprioriTID and Apriori_TIDClose, CMRules and CMDeo.
 v.0.71  20110201: fixed a bug in the CMRULES and CMDEO algorithms.
 v 0.70  20110106: added an implementation of the HMINE algorithm for mining frequent itemsets.
 v 0.69 20101127: added two implementations of the DCI_CLOSED algorithm for mining frequent closed itemsets (one straightforward implementation and one with optimizations).
 v 0.68  20101109: improved the performance of all Apriori-based algorithms. Also, I have added implementations of four algorithms:
 the TWOPHASE algorithm for mining high-utility itemsets, Apriori_TIDClose for frequent closed itemset mining, and the CMRULES and CMDEO algorithms for mining sequential rules.
 v 0.67  20101007: added an implementation of UApriori for mining frequent itemsets from uncertain data, fixed a bug in the CHARM and ECLAT algorithms
 v 0.66  20100825: added an implementation of AprioriTID and some code for generating association rules by using FPGROWTH.
 v 0.65  20100814: fixed a bug in the BIDE+ algorithm (thanks to Brock for reporting it)
 v 0.64  20100719: minor changes (added comments, fixed minor bugs, etc.)
 v 0.63  201004: minor changes (added comments, fixed minor bugs, etc.)
 v.062  20100411: fixed a bug in the triangular matrix used by the CHARM and ECLAT algorithms.
 v.061  20100320: implementation of FPGROWTH, fixed a bug in BIDE+ (thanks to G. Bruno for reporting it) and fixed a bug in RELIM
 v.060  20100315: fixed a bug in CHARM/ECLAT occurring if items are missing in the input file (thanks to A. Pardeshi for reporting it)
 v.059  20100205: fixed a bug in the BIDE+ implementation (thanks to G. Bruno for reporting it)
 v.0.58  20091215: implementations of the CHARM, CHARMMFI and ECLAT algorithms
 v.0.57  200911: implementation of the BIDE+ algorithm
 v.0.56 : 20091009: implementations of the RELIM and PREFIXSPAN algorithms
 v.0.55 : 20090801: cleaned the code and fixed some bugs
 ...
 v.050  20090531 implementation of APRIORIRARE and the APRIORI algorithm for mining association rules
 v.049  20081207: initial release.