Introduction
Algorithms
Download
Documentation
Datasets
FAQ
License
Contributors
Citations
Performance
Developers' guide
Videos
Forum
Blog
Other resources
814983
visitors
since 2010

* We are hiring a postdoctoral researcher (details here) and a Ph.D. student. Send your CV with cover letter to Prof. FournierViger.


Algorithms
SPMF offers implementations of the following data mining
algorithms.
Sequential Pattern Mining
These algorithms discover sequential patterns in a set of sequences. For a good overview of sequential pattern mining algorithms, please read this survey paper.
 algorithms for mining sequential patterns in a sequence database
 the CMSPADE algorithm (FournierViger et al, 2014, powerpoint)
 the CMSPAM algorithm (FournierViger et al, 2014, powerpoint)
 the FAST algorithm (Salvemini et al, 2011)
 the GSP algorithm (Srikant et al., 1996)
 the LAPIN (aka LAPINSPAM) algorithm (Yang et al., 2005)
 the PrefixSpan algorithm (Pei et al., 2004)
 the SPADE algorithm (Zaki et al., 2001)
 the SPAM algorithm (Ayres et al., 2002)
 algorithms for mining closed sequential patterns in a sequence database
 algorithms for mining maximal sequential patterns in a sequence database
 algorithms for mining the topk sequential patterns in a sequence database
 algorithms for mining sequential generator patterns in a sequence database
 algorithms for mining compressing sequential patterns
 algorithms for mining multidimensional sequential patterns in a multidimensional sequence database
 the SeqDIM algorithm for mining frequent
multidimensional sequential patterns in a multidimensional sequence database (Pinto
et al., 2001)
 the Songram et al. algorithm for mining frequent
closed multidimensional sequential patterns in a multidimensional
sequence database (Songram
et al. 2006)
 the FournierViger et al. algorithm, a
sequential pattern mining algorithm that combines several
features from wellknown sequential pattern mining algorithms and also
proposes some original features (FournierViger
et al., 2008):
 algorithm for mining highutility sequential patterns in a sequence database
 the USPAN algorithm (Yin et al. 2012)
 algorithm for mining highutility probability sequential patterns in a sequence database
 algorithm for progressive sequential pattern mining with convergence guarantees
 the Occur algorithm for finding all occurrences of some sequential patterns in sequences by postprocessing.
Sequential Rule Mining
These algorithms discover sequential rules in a set of sequences.
 algorithms for mining sequential rules in a sequence database
 the ERMiner algorithm (FournierViger et al., 2014)
 the RuleGrowth algorithm (FournierViger
et al., 2011, FournierViger et al., 2015, powerpoint, video)
 the CMRules algorithm (FournierViger
et al., 2010, powerpoint)
 the CMDeo algorithm (FournierViger
et al., 2010)
 the RuleGen algorithm (Zaki et al, 2001)
 algorithms for mining sequential rules in a sequence database with the window size
constraint
 algorithms for mining topk sequential rules in a sequence database
 algorithm for mining highutility sequential rules in a sequence database
Sequence Prediction
These algorithms predict the next symbol(s) of a sequence based on a set of training sequences
 algorithms for predicting the next symbol of a sequence based on a set of training sequences
Itemset Mining
These algorithms discover interesting itemsets (sets of values) that appear in a transaction database (database records containing symbolic data). For a good overview of itemset mining, please read this survey paper.
 algorithms for discovering frequent itemsets in a transaction database.
 the Apriori algorithm (Agrawal & Srikant, 1994, video
 the AprioriTID algorithm (Agrawal & Srikant, 1994)
 the FPGrowth algorithm (Han
et al., 2004)
 the Eclat algorithm (Zaki,
2000)
 the dEclat algorithm (Zaki
and Gouda, 2001, 2003)
 the Relim algorithm (Borgelt,
2005)
 the HMine algorithm (Pei
et al., 2007)
 the LCMFreq algorithm (Uno et al., 2004)
 the PrePost and PrePost+ algorithms (Deng et al., 2012, Deng et Lv, 2015)
 the FIN algorithm (Deng et al., 2014)
 the DFIN algorithm (Deng et al., 2016)
 the NegFIN algoritm (Aryabarzan et al., 2018)
 algorithms for discovering frequent closed itemsets in a transaction database.
 the FPClose algorithm (Grahne and Zhu, 2005)
 the Charm algorithm (Zaki
and Hsiao, 2002)
 the dCharm algorithm (Zaki
and Gouda, 2001)
 the DCI_Closed algorithm (Lucchese
et al, 2004)
 the LCM algorithm (Uno et al., 2004)
 the AprioriClose aka Close
algorithm (Pasquier et al., 1999)
 the AprioriTID Close algorithm (Pasquier
et al., 1999, Agrawal
& Srikant, 1994)
 the NAFCP algorithm (Le et al., 2015)
 algorithms for recovering all frequent itemsets from frequent closed itemsets:
 the LevelWise algorithm (Pasquier et al., 1999)
 the DFIGrowth algorithm (Huang et al., 2019)
 algorithms for discovering frequent maximal itemsets in a transaction database.
 the FPMax algorithm (Grahne and Zhu, 2003)
 the CharmMFI algorithm for discovering frequent
closed itemsets and maximal frequent itemsets by postprocessing in a transaction database (Szathmary et al. 2006)
 algorithms for mining frequent itemsets with multiple minimum supports
 algorithms for mining generator itemsets in a transaction database
 the DefMe algorithm for mining frequent generator itemsets in a transaction database (Soulet & Rioult, 2014)
 the Pascal algorithm for mining frequent itemsets, and identifying at the same time which one are generators (Bastide et al., 2002)
 the Zart algorithm for discovering frequent
closed itemsets and their generators in a transaction database (Szathmary et
al. 2007)
 algorithms for mining rare itemsets
and/or correlated itemsets in a transaction database
 the AprioriInverse algorithm for mining perfectly
rare itemsets (Koh
& Roundtree, 2005)
 the AprioriRare algorithm for mining minimal
rare itemsets and frequent itemsets (Szathmary et
al. 2007b)
 the CORI algorithm for mining minimal
rare correlated itemsets using the support and bond measures (Bouasker et
al. 2015)
 the RPGrowth algorithm for mining rare itemsets (Tsang et al., 2011)
 algorithms for performing targeted and dynamic queries about association
rules and frequent itemsets.
 the ItemsetTree, a data structure that can be updated incrementally, and algorithms for querying it. (Kubat
et al, 2003)
 the MemoryEfficient ItemsetTree, a data structure that can be updated incrementally, and algorithms for querying it. (FournierViger, 2013, powerpoint)
 algorithms to discover frequent itemsets in a stream
 the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)
 the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)
 the CloStream algorithm for mining frequent
closed itemsets in a data stream (Yen
et al, 2009)
 the UApriori algorithm for mining frequent
itemsets in uncertain data (Chui
et al, 2007)
 the VME algorithm for mining erasable
itemsets (Deng & Xu,
2010)
 algorithms to discover fuzzy frequent itemsets in a quantitative transaction database
 the OPUSMiner algorithm for mining selfsufficient itemsets (Webb et al., 2014)
Periodic Pattern Mining
These algorithms discover patterns that periodically appear in a sequence of complex events (also called a transaction database)
These algorithms discover patterns that periodically appear in multiple sequences of events
 the MPFPS_BFS algorithms (FournierViger, P., Li, Z., et al., 2019, powerpoint) for mining periodic patterns that are common to multiple sequences
 the MPFPS_DFS algorithms (FournierViger, P., Li, Z., et al., 2019, powerpoint) for mining periodic patterns that are common to multiple sequences
Episode Mining
These algorithms discover patterns (episodes) that appear in a single sequence of events.
 the TUP algorithm (Rathore et al., 2016) for mining the topk high utility episodes in a sequence of complex events (a transaction database) with utility information
 the USSPAN algorithm (Wu et al., 2013 ) for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
Graph Mining
 the TKG algorithm for mining the topk frequent subgraphs in a graph database (FournierViger, 2019)
 the gSpan algorithm for mining the frequent subgraphs in a graph database (Yan et al., 2002)
HighUtility Pattern Mining
These algorithms discover patterns having a high utility (importance) in different kinds of data. For a good overview of high utility itemset mining, you may read this survey paper, and the high utilitypattern mining book.
 algorithms for mining highutility itemsets in a transaction database having profit information
 the EFIM algorithm (Zida et al. 2016, Zida et al., 2015, powerpoint)
 the FHM algorithm (FournierViger et al., 2014, powerpoint)
 the HUIMiner algorithm (Liu & Qu, 2012)
 the HUPMiner algorithm (Krishnamoorthy, 2014)
 the mHUIMiner algorithm (Peng et al., 2017)
 the HMiner algorithm (Krishnamoorty, 2017)
 the ULBMiner algorithm (Duong et al, 2018)
 the UFH algorithm (Dawar et al, 2017)
 the IHUP algorithm (Ahmed et al., 2009)
 the TwoPhase algorithm (Liu
et al., 2005)
 the UPGrowth algorithm (Tseng et al., 2011)
 the UPGrowth+ algorithm (Tseng et al., 2013)
 the UPHist algorithm (Dawar et al., 2015)
 the d2HUP algorithm (Liu et al, 2012)
 algorithm for efficiently mining highutility itemsets with length constraints in a transaction database
 algorithm for mining correlated highutility itemsets in a transaction database
 the FCHM_bond algorithm, to use the bond measure (FournierViger et al, 2016, powerpoint, FournierViger 2018 et al., to appear, video )
 the FCHM_allconfidence algorithm, to use the allconfidence measure (FournierViger et al, 2016, powerpoint, FournierViger 2018 et al., to appear)
 algorithm for mining highutility itemsets in a transaction database containing negative unit profit values
 algorithm for mining frequent highutility itemsets in a transaction database
 algorithm for mining onshelf highutility itemsets in a transaction database containing information about time periods of items
 algorithm for incremental highutility itemset mining in a
transaction database
 algorithm for mining concise representations of highutility itemsets in a transaction database
 the HUGMiner algorithm (FournierViger et al., 2014, powerpoint) for mining highutility generators
 the GHUIMiner algorithm (FournierViger et al., 2014, powerpoint) for mining generators of highutility itemsets
 the MinFHM algorithm (FournierViger et al., 2016, powerpoint, video ) for mining minimal highutility itemsets
 the EFIMClosed algorithm (FournierViger et al., 2016, powerpoint) for mining closed highutility itemsets
 the CHUIMiner algorithm (Wu et al., 2015) for mining closed highutility itemsets
 the CHUD algorithm for mining closed highutility itemsets (Tseng et al., 2011/2015)
 the CHUIMiner(Max) algorithm for mining maximal high utility itemsets (Wu et al., 2019).
 algorithm for mining the skyline highutility itemsets
in a
transaction database
 algorithm for mining the topk highutility itemsets in a
transaction database
 algorithms for mining the topk high utility itemsets from a data stream with a window
 algorithm for mining frequent skyline utility patterns
in a
transaction database
 algorithm for mining quantitative high utility itemsets in a transaction database:
 algorithm for mining highutility sequential rules in a sequence database
 algorithm for mining highutility sequential patterns in a sequence database
 the USPAN algorithm (Yin et al. 2012)
 algorithm for mining highutility probability sequential patterns in a sequence database
 algorithm for mining highutility itemsets in a
transaction database using evolutionary algorithms
 algorithm for mining high averageutility itemsets in a
transaction database
 the HAUIMiner algorithm for mining high averageutility itemsets (Lin et al, 2016)
 the EHAUPM algorithm for mining high averageutility itemsets (Lin et al, 2017)
 the HAUIMMAU algorithm for mining high averageutility itemsets with multiple thresholds (Lin et al, 2016)
 the MEMU algorithm for mining high averageutility itemsets with multiple thresholds (Lin et al, 2018)
 algorithms for mining high utility episodes in a sequence of complex events (a transaction database)
 the TUP algorithm (Rathore et al., 2016) for mining frequent periodic patterns in a sequence of transactions (a transaction database))
 the UPSPAN algorithm (Wu et al., 2013 ) for mining periodic highutility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
 algorithms
for mining periodic highutility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
 algorithms for discovering irregular high utility itemsets (non periodic patterns) in a transaction database with utility information
 the PHM_irregular algorithm, which is a simple variation of the PHM algorithm
 algorithm for discovering local high utility itemsets in
a database with utility information and timestamps
 algorithm for discovering peak high utility itemsets in
a database with utility information and timestamps
Association Rule Mining
These algorithms discover interesting associations between symbols (values) in a transaction database (database records with binary attributes).
 an algorithm for mining all association rules in a transaction database (Agrawal &
Srikant, 1994)
 an algorithm for mining all association rules with
the lift measure in a transaction database (adapted from Agrawal & Srikant, 1994)
 an algorithm for mining the IGB informative and generic basis
of association rules in a transaction database (Gasmi et al., 2005)
 an algorithm for mining perfectly sporadic association rules
(Koh & Roundtree, 2005)
 an algorithm for mining closed association rules
(Szathmary et al. 2006).
 an algorithm for mining minimal non redundant association
rules (Kryszkiewicz,
1998)
 the Indirect algorithm for mining indirect
association rules (Tan et al. 2000; Tan et 2006)
 the FHSAR algorithm for hiding sensitive
association rules (Weng et
al. 2008)
 the TopKRules algorithm for mining the topk
association rules (FournierViger,
2012b, powerpoint)
 the TopKClassRules algorithm for mining the topk
class association rules (a variation of TopKRules. This latter is described in FournierViger,
2012b, powerpoint)
 the TNR algorithm for mining topk nonredundant
association rules (FournierViger
2012d, powerpoint)
Stream mining
These algorithms discovers various kinds of patterns in a stream (an infinite sequence of database records (transactions))
 the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)
 the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)
 the CloStream algorithm for mining frequent
closed itemsets in a data stream (Yen
et al, 2009)
 algorithms for mining the topk high utility itemsets from a data stream with a window
Clustering
These algorithms automatically find clusters in different kinds of data
 the original KMeans algorithm (MacQueen, 1967)
 the Bisecting KMeans algorithm (Steinbach et al, 2000)
 algorithms for densitybased clustering
 the DBScan algorithm (Ester et al., 1996)
 the Optics algorithm to extract a cluster ordering of points, which can then be use to generate DBScan style clusters and more (Ankerst et al, 1999)
 a hierarchical clustering algorithm
 a tool called Cluster Viewer for visualizing clusters
 a tool called Instance Viewer for visualizing the input of clustering algorithms
Time series mining
These algorithms perform various tasks to analyze time series data
 an algorithm for converting a time series to a sequence of symbols using the SAX representation of time series. Note that if one converts a set of time series with SAX, he will obtain a sequence database, which allows to then apply traditional algorihtms for sequential rule mining and sequential pattern mining on time series (SAX, 2007).
 algorithms for calculating the prior moving average of a time series (to remove noise)
 algorithms for calculating the cumulative moving average f a time series (to remove noise)
 algorithms for calculating the central moving average of a time series (to remove noise)
 an algorithm for calculating the median smoothing of a time series (to remove noise)
 an algorithm for calculating the exponential smoothing of a time series (to remove noise)
 an algorithm for calculating the min max normalization of a time series
 an algorithm for calculating the autocorrelation function of a time series
 an algorithm for calculating the standardization of a time series
 an algorithm for calculating the first and second order differencing of a time series
 an algorithm for calculating the piecewise aggregate approximation of a time series (to reduce the number of data points of a time series)
 an algorithm for calculating the linear regression of a time series (using the least squares method)
 an algorithm for splitting a time series into segments of a given length
 an algorithm for splitting a time series into a given number of segments
 algorithms to cluster time series (group timeseries according to their similarities). This can be done by applying the clustering algorithms offered in SPMF (KMeans, Bisecting KMeans, DBScan, OPTICS, Hierarchical clustering) on time series.
 a tool called Time Series Viewer for visualizing time series
Classification
 the ID3 algorithm for building decision trees
(Quinlan, 1986)
Text mining
 an algorithm for classifying text documents using a Naive Bayes classifier approach (S. Raghu, 2015)
 an algorithm for clustering texts using the tf*idf measure (S. Raghu, 2015)
Data structures
 redblack tree,
 itemsettree,
 binary tree,
 KDtree,
 triangular matrix.
Tools
 A tool for generating a synthetic transaction database
 A tool for generating a synthetic sequence database
 A tool for generating a synthetic sequence database with timestamps
 A tool for calculating statistics about a transaction database
 A tool for calculating statistics about a transaction database with utility information
 A tool for calculating statistics about a sequence database
 A tool for converting a sequence database to a transaction database
 A tool for converting a transaction database to a sequence database
 A tool for converting a text file to a sequence database (each sentences becomes a sequence)
 A
tool for converting a sequence database in various formats (CSV,
KOSARAK, BMS, IBM...) to a sequence database in SPMF format
 A tool for converting a transaction database in various formats (CSV...) to a transaction database in SPMF format
 A tool for converting timeseries to a sequence database
 A tool to generate utility values for a transaction database
 A tool to add timestamps to a sequence database
 A tool for removing utility information from a database having utility information
 A tool to resize a database in SPMF format (a text file) using a percentage of lines of data from an original database.
 A tool for visualizing timeseries
Visual map of algorithms
You can visualize the relationship between the various data mining algorithms offered in SPMF by clicking on this map (last updated : 2015/09/12  SPMF 0.97):
