Algorithms
SPMF offers implementations of the following data mining algorithms.
Sequential Pattern Mining
 algorithms for mining sequential patterns in a sequence database
 algorithms for mining closed sequential patterns in a sequence database
 algorithms for mining maximal sequential patterns in a sequence database
 algorithms for mining the top-k sequential patterns in a sequence database
 algorithms for mining sequential generator patterns in a sequence database
 algorithms for mining compressing sequential patterns
 algorithms for mining multidimensional sequential patterns in a multidimensional sequence database
 the SeqDIM algorithm for mining frequent multidimensional sequential patterns in a multidimensional sequence database (Pinto et al., 2001)
 the Songram et al. algorithm for mining frequent closed multidimensional sequential patterns in a multidimensional sequence database (Songram et al., 2006)
 the Fournier-Viger et al. algorithm, a sequential pattern mining algorithm that combines several features from well-known sequential pattern mining algorithms and also proposes some original features (Fournier-Viger et al., 2008)
 algorithm for mining high-utility sequential patterns in a sequence database
 the USPAN algorithm (Yin et al., 2012)
For a good overview of sequential pattern mining algorithms, you may read this survey paper.
Sequential Rule Mining
 algorithms for mining sequential rules in a sequence database
 algorithms for mining sequential rules in a sequence database with the window size constraint
 algorithms for mining top-k sequential rules in a sequence database
 algorithm for mining high-utility sequential rules in a sequence database
Sequence Prediction
 algorithms for predicting the next symbol of a sequence based on a set of training sequences
Frequent Itemset Mining
 algorithms for discovering frequent itemsets in a transaction database.
 the Apriori algorithm (Agrawal & Srikant, 1994)
 the AprioriTID algorithm (Agrawal & Srikant, 1994)
 the FP-Growth algorithm (Han et al., 2004)
 the Eclat algorithm (Zaki, 2000)
 the dEclat algorithm (Zaki and Gouda, 2001, 2003)
 the Relim algorithm (Borgelt, 2005)
 the H-Mine algorithm (Pei et al., 2007)
 the LCMFreq algorithm (Uno et al., 2004)
 the PrePost and PrePost+ algorithms (Deng et al., 2012; Deng & Lv, 2015)
 the FIN algorithm (Deng et al., 2014)
 algorithms for discovering frequent closed itemsets in a transaction database.
 algorithms for discovering frequent maximal itemsets in a transaction database.
 the FPMax algorithm (Grahne and Zhu, 2003)
 the Charm-MFI algorithm for discovering frequent closed itemsets and maximal frequent itemsets by post-processing in a transaction database (Szathmary et al., 2006)
 algorithms for mining frequent itemsets with multiple minimum supports
 algorithms for mining generator itemsets in a transaction database
 the DefMe algorithm for mining frequent generator itemsets in a transaction database (Soulet & Rioult, 2014)
 the Pascal algorithm for mining frequent itemsets and identifying at the same time which ones are generators (Bastide et al., 2002)
 the Zart algorithm for discovering frequent closed itemsets and their generators in a transaction database (Szathmary et al., 2007)
 algorithms for mining rare itemsets and/or correlated itemsets in a transaction database
 algorithms for performing targeted and dynamic queries about association rules and frequent itemsets.
 the Itemset-Tree, a data structure that can be updated incrementally, and algorithms for querying it (Kubat et al., 2003)
 the Memory-Efficient Itemset-Tree, a data structure that can be updated incrementally, and algorithms for querying it (Fournier-Viger, 2013)
 algorithms to discover frequent itemsets in a stream
 the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)
 the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)
 the CloStream algorithm for mining frequent closed itemsets in a data stream (Yen et al., 2009)
 the UApriori algorithm for mining frequent itemsets from uncertain data (Chui et al., 2007)
 the VME algorithm for mining erasable itemsets (Deng & Xu, 2010)
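To make the frequent itemset mining task listed above concrete, here is a minimal Python sketch of level-wise mining in the spirit of Apriori (Agrawal & Srikant, 1994). It is an illustration only, not SPMF's Java implementation: the function name `apriori`, the use of an absolute support count for `minsup`, and the in-memory list-of-lists input are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset (frozenset): support count} for every itemset whose
    support count is at least `minsup` (an absolute count, by assumption)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    level = {}
    for item in items:
        count = support(frozenset([item]))
        if count >= minsup:
            level[frozenset([item])] = count

    frequent = dict(level)
    k = 2
    while level:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in level for sub in combinations(cand, k - 1)):
                count = support(cand)
                if count >= minsup:
                    next_level[cand] = count
        frequent.update(next_level)
        level = next_level
        k += 1
    return frequent


if __name__ == "__main__":
    db = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
    for itemset, count in sorted(apriori(db, 2).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

The sketch rescans the whole database for each candidate, which keeps it short but slow; the algorithms listed above differ mainly in how they avoid that cost.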
Periodic Pattern Mining
 the PFPM algorithm (Fournier-Viger et al., 2016a) for mining frequent periodic patterns in a sequence of transactions (a transaction database)
 the PHM algorithm (Fournier-Viger et al., 2016b) for mining periodic high-utility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
High-Utility Pattern Mining
 algorithms for mining high-utility itemsets in a transaction database having profit information
 the EFIM algorithm (Zida et al., 2015; Zida et al., 2016)
 the FHM algorithm (Fournier-Viger et al., 2014)
 the HUI-Miner algorithm (Liu & Qu, 2012)
 the HUP-Miner algorithm (Krishnamoorthy, 2014)
 the IHUP algorithm (Ahmed et al., 2009)
 the Two-Phase algorithm (Liu et al., 2005)
 the UP-Growth algorithm (Tseng et al., 2011)
 the UP-Growth+ algorithm (Tseng et al., 2013)
 the d2HUP algorithm (Liu et al., 2012)
 algorithm for efficiently mining high-utility itemsets with length constraints in a transaction database
 algorithm for mining correlated high-utility itemsets in a transaction database
 algorithm for mining high-utility itemsets in a transaction database containing negative unit profit values
 algorithm for mining frequent high-utility itemsets in a transaction database
 algorithm for mining on-shelf high-utility itemsets in a transaction database containing information about time periods of items
 algorithm for incremental high-utility itemset mining
 algorithm for mining concise representations of high-utility itemsets in a transaction database
 algorithm for mining the skyline high-utility itemsets
 algorithm for mining frequent skyline utility patterns
 algorithm for mining high-utility sequential rules in a sequence database
 algorithm for mining high-utility sequential patterns in a sequence database
 the USPAN algorithm (Yin et al., 2012)
 algorithm for mining high-utility itemsets using genetic algorithms
 algorithm for mining high-utility itemsets using particle swarm optimization
Association Rule Mining
 an algorithm for mining all association rules in a transaction database (Agrawal & Srikant, 1994)
 an algorithm for mining all association rules with the lift measure in a transaction database (adapted from Agrawal & Srikant, 1994)
 an algorithm for mining the IGB informative and generic basis of association rules in a transaction database (Gasmi et al., 2005)
 an algorithm for mining perfectly sporadic association rules (Koh & Rountree, 2005)
 an algorithm for mining closed association rules (Szathmary et al., 2006)
 an algorithm for mining minimal non-redundant association rules (Kryszkiewicz, 1998)
 the Indirect algorithm for mining indirect association rules (Tan et al., 2000; Tan et al., 2006)
 the FHSAR algorithm for hiding sensitive association rules (Weng et al., 2008)
 the TopKRules algorithm for mining the top-k association rules (Fournier-Viger, 2012b)
 the TNR algorithm for mining top-k non-redundant association rules (Fournier-Viger, 2012d)
Clustering
 the original K-Means algorithm (MacQueen, 1967)
 the Bisecting K-Means algorithm (Steinbach et al., 2000)
 algorithms for density-based clustering
 the DBScan algorithm (Ester et al., 1996)
 the Optics algorithm to extract a cluster ordering of points, which can then be used to generate DBScan-style clusters and more (Ankerst et al., 1999)
 a hierarchical clustering algorithm
 a tool called Cluster Viewer for visualizing clusters
 a tool called Instance Viewer for visualizing the input of clustering algorithms
Time series mining
 an algorithm for calculating the moving average of a time series (to remove noise)
 an algorithm for calculating the piecewise aggregate approximation of a time series (to reduce the number of data points of a time series)
 an algorithm for splitting time series into segments of a given length
 an algorithm for splitting time series into a given number of segments
 the SAX algorithm for converting a time series to a sequence database, so that traditional algorithms for sequential rule mining and sequential pattern mining can be applied to time series (Lin et al., 2007)
 a tool called Time Series Viewer for visualizing time series
 In addition, most clustering algorithms offered in SPMF (K-Means, Bisecting K-Means, DBScan, OPTICS, hierarchical clustering) can also be applied to time series.
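Two of the time series operations listed above, the moving average and the piecewise aggregate approximation (PAA), can be sketched in a few lines of Python. This is a hedged illustration rather than SPMF's Java code: the function names, the trailing window that is shortened at the start of the series, and the integer segment boundaries used when the length is not divisible by the number of segments are assumptions made for this example.

```python
def moving_average(series, window):
    """Smooth a series to reduce noise: each output value is the mean of the
    current value and up to `window - 1` values before it (assumed convention)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            # Drop the value that just left the trailing window.
            running -= series[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


def paa(series, segments):
    """Reduce a series to `segments` data points by replacing each
    consecutive chunk with its mean."""
    n = len(series)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    reduced = []
    for k in range(segments):
        # Integer boundaries; chunks differ in size by at most one point.
        start = k * n // segments
        end = (k + 1) * n // segments
        chunk = series[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5], 3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
    print(paa([1, 2, 3, 4, 5, 6], 3))          # [1.5, 3.5, 5.5]
```

The moving average keeps the number of data points and removes noise, whereas PAA deliberately shrinks the series, which is why PAA is a common preprocessing step before a symbolic conversion such as SAX.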
Classification
 the ID3 algorithm for building decision trees (Quinlan, 1986)
Text mining
 an algorithm for classifying text documents using a Naive Bayes classifier approach (S. Raghu, 2015)
 an algorithm for clustering texts using the tf*idf measure (S. Raghu, 2015)
Data structures
 red-black tree,
 itemset-tree,
 binary tree,
 KD-tree,
 triangular matrix.
Tools
 A tool for generating a synthetic transaction database
 A tool for generating a synthetic sequence database
 A tool for generating a synthetic sequence database with timestamps
 A tool for calculating statistics about a transaction database
 A tool for calculating statistics about a sequence database
 A tool for converting a sequence database to a transaction database
 A tool for converting a transaction database to a sequence database
 A tool for converting a text file to a sequence database (each sentence becomes a sequence)
 A tool for converting a sequence database in various formats (CSV, KOSARAK, BMS, IBM...) to a sequence database in SPMF format
 A tool for converting a transaction database in various formats (CSV...) to a transaction database in SPMF format
 A tool for converting time series to a sequence database
 A tool to generate utility values for a transaction database
 A tool to add timestamps to a sequence database
 A tool for removing utility information from a database having utility information
 A tool to resize a database in SPMF format (a text file) using a percentage of lines of data from an original database.
 A tool for visualizing time series
Visual map of algorithms
You can visualize the relationship between the various data mining algorithms offered in SPMF by clicking on this map (last updated: 2015/09/12, SPMF 0.97).
