Algorithms
SPMF offers implementations of the following data mining algorithms.
Sequential Pattern Mining
These algorithms discover sequential patterns in a set of sequences. For a good overview of sequential pattern mining algorithms, please read this survey paper.
- algorithms for mining sequential patterns in a sequence database
- the CM-SPADE algorithm (Fournier-Viger et al, 2014, powerpoint)
- the CM-SPAM algorithm (Fournier-Viger et al, 2014, powerpoint)
- the FAST algorithm (Salvemini et al, 2011)
- the GSP algorithm (Srikant et al., 1996)
- the LAPIN (aka LAPIN-SPAM) algorithm (Yang et al., 2005)
- the PrefixSpan algorithm (Pei et al., 2004)
- the SPADE algorithm (Zaki et al., 2001)
- the SPAM algorithm (Ayres et al., 2002)
- algorithms for mining closed sequential patterns in a sequence database
- the ClaSP algorithm (Gomariz et al., 2013)
- the CM-ClaSP algorithm (Fournier-Viger et al, 2014, powerpoint)
- the CloFAST algorithm (Fumarola et al, 2016)
- the CloSpan algorithm (Yan et al., 2003)
- the BIDE+ algorithm(Wang et al., 2007)
- algorithms for mining maximal sequential patterns in a sequence database
- the VMSP algorithm (Fournier-Viger et al, 2014, powerpoint)
- the MaxSP algorithm (Fournier-Viger et al., 2013, powerpoint).
- algorithms for mining the top-k sequential patterns in a sequence database
- the TKS algorithm (Fournier-Viger et al., 2013, powerpoint).
- the TSP algorithm (Tzvetkoz et al., 2003).
- the Skopus algorithm for mining the top-k sequential patterns using leverage and significance (Petijean et al., 2016)
- algorithms for mining sequential generator patterns in a sequence database
- the VGEN algorithm (Fournier-Viger et al, 2014)
- the FEAT algorithm (Gao et al., 2008).
- the FSGP algorithm (Yi et al., 2011).
- algorithms for mining compressing sequential patterns
- the GoKrimp and SeqKrimp algorithms (Lam et al., 2012; Lam et al., 2014)
- algorithm for identifying the top-k quantile based cohesive sequential patterns
in a single sequence or in multiple sequences
- the QCSP algorithm (Feremans et al., 2019)
- algorithms for mining multidimensional sequential patterns in a multidimensional sequence database
- the SeqDIM algorithm for mining frequent multidimensional sequential patterns in a multi-dimensional sequence database (Pinto et al., 2001)
- the Songram et al. algorithm for mining frequent closed multidimensional sequential patterns in a multi-dimensional sequence database (Songram et al. 2006)
- the Fournier-Viger et al. algorithm, a
sequential pattern mining algorithm that combines several
features from well-known sequential pattern mining algorithms and also
proposes some original features (Fournier-Viger
et al., 2008):
- mining sequences with minimum support by database-projection (based on PrefixSpan, Pei et al., 2004)
- mining sequences with min/max time interval between events and min/max time length of a sequence (based on Hirate-Yamana, 2006)
- mining closed sequences (based on the BIDE+ algorithm by Wang et al. 2007)
- mining multi-dimensional sequences (based on Pinto et al. 2001)
- mining closed multi-dimensional sequences (based on Songram et al. 2006 and Pasquier et al., 1999)
- mining sequences with items having integer values and performing automatic clustering of these values (original extension described in Fournier-Viger et al., 2008)
- algorithm for mining high-utility sequential patterns in a sequence database
- the USPAN algorithm (Yin et al. 2012)
- algorithm for mining cost-efficient sequential patterns (a.k.a. low-cost high utility sequential patterns)
- the CorCEPB algorithm for mining cost-efficient patterns in sequences with binary utility information and cost values (Fournier-Viger et al., 2020, ppt , video )
- the CEPB algorithm for mining cost-efficient patterns in sequences with binary utility information and cost values - consider only sequence with positive utility(Fournier-Viger et al., 2020 , ppt, video )
- the CEPN algorithm for mining cost-efficient patterns in sequences with numeric utility information and cost values (Fournier-Viger et al., 2020, ppt, video )
- algorithm for mining high-utility probability sequential patterns in a sequence database
- the PHUSPM algorithm (Zhang et al. 2018)
- the UHUSPM algorithm (Zhang et al. 2018)
- algorithm for progressive sequential pattern mining with convergence guarantees
- the ProSecCo algorithm (Servan-Schreiber et al. 2018)
- the Occur algorithm for finding all occurrences of some sequential patterns in sequences by post-processing.
Sequential Rule Mining
These algorithms discover sequential rules in a set of sequences.
- algorithms for mining sequential rules in a sequence database
- the ERMiner algorithm (Fournier-Viger et al., 2014)
- the RuleGrowth algorithm (Fournier-Viger et al., 2011, Fournier-Viger et al., 2015, powerpoint, video)
- the CMRules algorithm (Fournier-Viger et al., 2010, powerpoint)
- the CMDeo algorithm (Fournier-Viger et al., 2010)
- the RuleGen algorithm (Zaki et al, 2001)
- algorithms for mining sequential rules in a sequence database with the window size
constraint
- the TRuleGrowth algorithm (Fournier-Viger, 2012a, Fournier-Viger et al., 2015)
- algorithms for mining top-k sequential rules in a sequence database
- the TopSeqRules algorithm for mining the top-k sequential rules (Fournier-Viger et al., 2011, powerpoint)
- the TopSeqClassRules algorithm for mining the top-k class sequential rules (a variation of Fournier-Viger et al., 2011)
- the TNS algorithm for mining the top-k non-redundant sequential rules (Fournier-Viger 2013)
- algorithm for mining high-utility sequential rules in a sequence database
- the HUSRM algorithm (Zida et al., 2015)
Sequence Prediction
These algorithms predict the next symbol(s) of a sequence based on a set of training sequences
- algorithms for predicting the next symbol of a sequence based on a set of training sequences
- the Compact Prediction Tree+ (CPT+) algorithm (Gueniche et al., 2015, powerpoint, )
- the Compact Prediction Tree (CPT) algorithm (Gueniche et al., 2013, )
- the First order Markov Chains (PPM - order 1) (Clearly et al, 1984)
- the Dependency Graph (DG) (Padmanabhan, 1996)
- the All-k-Order Markov Chains (AKOM) (Pitkow, 1999)
- the TDAG (Laird & Saul, 1994)
- the LZ78 (Ziv, 1978)
Itemset Mining
These algorithms discover interesting itemsets (sets of values) that appear in a transaction database (database records containing symbolic data). For a good overview of itemset mining, please read this survey paper.
- algorithms for discovering frequent itemsets in a transaction database.
- the Apriori algorithm (Agrawal & Srikant, 1994, video
- the AprioriTID algorithm (Agrawal & Srikant, 1994)
- the FP-Growth algorithm (Han et al., 2004)
- the Eclat algorithm (Zaki, 2000)
- the dEclat algorithm (Zaki and Gouda, 2001, 2003)
- the Relim algorithm (Borgelt, 2005)
- the H-Mine algorithm (Pei et al., 2007)
- the LCMFreq algorithm (Uno et al., 2004)
- the PrePost and PrePost+ algorithms (Deng et al., 2012, Deng et Lv, 2015)
- the FIN algorithm (Deng et al., 2014)
- the DFIN algorithm (Deng et al., 2016)
- the NegFIN algoritm (Aryabarzan et al., 2018)
- algorithms for discovering frequent closed itemsets in a transaction database.
- the FPClose algorithm (Grahne and Zhu, 2005)
- the Charm algorithm (Zaki and Hsiao, 2002)
- the dCharm algorithm (Zaki and Gouda, 2001)
- the DCI_Closed algorithm (Lucchese et al, 2004)
- the LCM algorithm (Uno et al., 2004)
- the AprioriClose aka Close algorithm (Pasquier et al., 1999)
- the AprioriTID Close algorithm (Pasquier et al., 1999, Agrawal & Srikant, 1994)
- the NAFCP algorithm (Le et al., 2015)
- algorithms for recovering all frequent itemsets from frequent closed itemsets:
- the LevelWise algorithm (Pasquier et al., 1999)
- the DFI-Growth algorithm (Huang et al., 2019)
- algorithms for discovering frequent maximal itemsets in a transaction database.
- the FPMax algorithm (Grahne and Zhu, 2003)
- the Charm-MFI algorithm for discovering frequent closed itemsets and maximal frequent itemsets by post-processing in a transaction database (Szathmary et al. 2006)
- algorithms for mining frequent itemsets with multiple minimum supports
- the MSApriori algorithm (Liu et al, 1999)
- the CFPGrowth++ algorithm (Uday & Reddy, 2011, Hu & Chen, 2006)
- algorithms for mining generator itemsets in a transaction database
- the DefMe algorithm for mining frequent generator itemsets in a transaction database (Soulet & Rioult, 2014)
- the Pascal algorithm for mining frequent itemsets, and identifying at the same time which one are generators (Bastide et al., 2002)
- the Zart algorithm for discovering frequent closed itemsets and their generators in a transaction database (Szathmary et al. 2007)
- algorithms for mining rare itemsets
and/or correlated itemsets in a transaction database
- the AprioriInverse algorithm for mining perfectly rare itemsets (Koh & Roundtree, 2005)
- the AprioriRare algorithm for mining minimal rare itemsets and frequent itemsets (Szathmary et al. 2007b)
- the CORI algorithm for mining minimal rare correlated itemsets using the support and bond measures (Bouasker et al. 2015)
- the RP-Growth algorithm for mining rare itemsets (Tsang et al., 2011)
- algorithms for performing targeted and dynamic queries about association
rules and frequent itemsets.
- the Itemset-Tree, a data structure that can be updated incrementally, and algorithms for querying it. (Kubat et al, 2003)
- the Memory-Efficient Itemset-Tree, a data structure that can be updated incrementally, and algorithms for querying it. (Fournier-Viger, 2013, powerpoint)
- algorithms to discover frequent itemsets in a stream
- the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)
- the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)
- the CloStream algorithm for mining frequent closed itemsets in a data stream (Yen et al, 2009)
- the U-Apriori algorithm for mining frequent itemsets in uncertain data (Chui et al, 2007)
- the VME algorithm for mining erasable itemsets (Deng & Xu, 2010)
- algorithms to discover fuzzy frequent itemsets in a quantitative transaction database
- the FFI-Miner algorithm for mining fuzzy itemsets (Lin et al., 2015)
- the MFFI-Miner algorithm for mining multiple fuzzy itemsets (Lin et al., 2016)
- the OPUS-Miner algorithm for mining self-sufficient itemsets (Webb et al., 2014)
Episode Mining
These algorithms discover patterns (episodes) that appear in a single sequence of events.
- algorithms for mining frequent episodes
- the TKE algorithm, which finds the top-k most frequent episodes based on the head frequency (Fournier-Viger et al., 2020)
- the EMMA algorithm, which counts the support based on the head frequency (Kuo-Yu et al., 2008)
- the MINEPI+ algorithm,which counts the support based on the head frequency (Kuo-Yu et al., 2008)
- the MINEPI algorithm, which counts the support based on minimal occurrences, and does not allow simultaneous events (Mannila & Toivonen, 1997)
- algorithms for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
- the HUE-SPAN algorithm (Fournier-Viger et al., 2019, powerpoint) for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
- the US-SPAN algorithm (Wu et al., 2013 ) for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
- the TUP algorithm (Rathore et al., 2016) for mining the top-k high utility episodes in a sequence of complex events (a transaction database) with utility information
Periodic Pattern Mining
These algorithms discover patterns that periodically appear in the data
- Algorithms for finding periodic patterns in a single sequence of complex events (also called a transaction database)
- the SPP-Growth algorithm (Fournier-Viger et al. 2019, powerpoint, video ) for mining stable periodic patterns in a transaction database with or without timestamps
- the PFPM algorithm (Fournier-Viger et al, 2016a, powerpoint, video ) for mining frequent periodic patterns in a sequence of transactions (a transaction database))
- the PHM algorithm (Fournier-Viger et al, 2016b, powerpoint) for mining periodic high-utility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
- Algorithms for finding periodic patterns in multiple sequences of events
- the MPFPS_BFS algorithms (Fournier-Viger, P., Li, Z., et al., 2019, powerpoint) for mining periodic patterns that are common to multiple sequences
- the MPFPS_DFS algorithms (Fournier-Viger, P., Li, Z., et al., 2019, powerpoint) for mining periodic patterns that are common to multiple sequences
- the MRCPPS algorithm for mining rare correlated periodic patterns common to multiple sequences (Fournier-Viger et al., 2020)
Graph Pattern Mining
These algorithms discover patterns in graphs
- Algorithms for mining patterns in a database of labelled graphs
- the TKG algorithm for mining the top-k frequent subgraphs in a graph database (Fournier-Viger, 2019, powerpoint)
- the gSpan algorithm for mining the frequent subgraphs in a graph database (Yan et al., 2002)
- Algorithms for mining patterns in a dynamic attributed graph
- the TSeqMiner algorithm (Fournier-Viger et al., 2019)
High-Utility Pattern Mining
These algorithms discover patterns having a high utility (importance) in different kinds of data. For a good overview of high utility itemset mining, you may read this survey paper, and the high utility-pattern mining book.
- algorithms for mining high-utility itemsets in a transaction database having profit information
- the EFIM algorithm (Zida et al. 2016, Zida et al., 2015, powerpoint)
- the FHM algorithm (Fournier-Viger et al., 2014, powerpoint)
- the HUI-Miner algorithm (Liu & Qu, 2012)
- the HUP-Miner algorithm (Krishnamoorthy, 2014)
- the mHUIMiner algorithm (Peng et al., 2017)
- the UFH algorithm (Dawar et al, 2017)
- the HMiner algorithm (Krishnamoorty, 2017)
- the ULB-Miner algorithm (Duong et al, 2018)
- the IHUP algorithm (Ahmed et al., 2009)
- the Two-Phase algorithm (Liu et al., 2005)
- the UP-Growth algorithm (Tseng et al., 2011)
- the UP-Growth+ algorithm (Tseng et al., 2013)
- the UP-Hist algorithm (Dawar et al., 2015)
- the d2HUP algorithm (Liu et al, 2012)
- algorithm for efficiently mining high-utility itemsets with length constraints in a transaction database
- the FHM+ algorithm (Fournier-Viger et al, 2016, powerpoint)
- algorithm for mining correlated high-utility itemsets in a transaction database
- the FCHM_bond algorithm, to use the bond measure (Fournier-Viger et al, 2016, powerpoint, Fournier-Viger 2018 et al., to appear, video )
- the FCHM_allconfidence algorithm, to use the all-confidence measure (Fournier-Viger et al, 2016, powerpoint, Fournier-Viger 2018 et al., to appear)
- algorithm for mining high-utility itemsets in a transaction database containing negative unit profit values
- the FHN algorithm (Fournier-Viger et al., 2014, powerpoint)
- the HUINIV-Mine algorithm (Chu et al., 2009)
- algorithm for mining frequent high-utility itemsets in a transaction database
- the FHMFreq algorithm, a variation of the FHM algorithm (Fournier-Viger et al., 2014)
- algorithm for mining on-shelf high-utility itemsets in a transaction database containing information about time periods of items
- the FOSHU algorithm (Fournier-Viger et al., 2015, powerpoint)
- the TS-HOUN algorithm (Lan et al., 2014)
- algorithm for incremental high-utility itemset mining in a
transaction database
- the EIHI algorithm (Fournier-Viger et al., 2015, powerpoint)
- the HUI-LIST-INS algorithm (Lin et al., 2014)
- algorithm for mining concise representations of high-utility itemsets in a transaction database
- the HUG-Miner algorithm (Fournier-Viger et al., 2014, powerpoint) for mining high-utility generators
- the GHUI-Miner algorithm (Fournier-Viger et al., 2014, powerpoint) for mining generators of high-utility itemsets
- the MinFHM algorithm (Fournier-Viger et al., 2016, powerpoint, video ) for mining minimal high-utility itemsets
- the EFIM-Closed algorithm (Fournier-Viger et al., 2016, powerpoint) for mining closed high-utility itemsets
- the CHUI-Miner algorithm (Wu et al., 2015) for mining closed high-utility itemsets
- the CHUD algorithm for mining closed high-utility itemsets (Tseng et al., 2011/2015)
- the CHUI-Miner(Max) algorithm for mining maximal high utility itemsets (Wu et al., 2019).
- algorithm for mining the skyline high-utility itemsets
in a
transaction database
- the SkyMine algorithm (Goyal et al., 2015)
- algorithm for mining the top-k high-utility itemsets in a
transaction database
- the TKU algorithm (Tseng et al., 2015), obtained from UP-Miner under GPL license
- the TKO-Basic algorithm (Tseng et al., 2015)
- algorithms for mining the top-k high utility itemsets from a data stream with a window
- the FHMDS and FHMDS-Naive algorithms (Dawar et al. 2017)
- algorithm for mining frequent skyline utility patterns
in a
transaction database
- the SFUPMinerUemax algorithms (Lin et al, 2016)
- algorithm for mining quantitative high utility itemsets in a transaction database:
- the VHUQI algorithm (Wu et al., 2014)
- algorithm for mining high-utility sequential rules in a sequence database
- the HUSRM algorithm (Zida et al., 2015)
- algorithm for mining high-utility sequential patterns in a sequence database
- the USPAN algorithm (Yin et al. 2012)
- algorithm for mining cost-efficient sequential patterns (a.k.a. low-cost high utility sequential patterns)
- the CorCEPB algorithm for mining cost-efficient patterns in sequences with binary utility information and cost values (Fournier-Viger et al., 2020 , ppt, video )
- the CEPB algorithm for mining cost-efficient patterns in sequences with binary utility information and cost values - consider only sequence with positive utility(Fournier-Viger et al., 2020 , ppt)
- the CEPN algorithm for mining cost-efficient patterns in sequences with numeric utility information and cost values (Fournier-Viger et al., 2020, ppt, video )
- algorithm for mining high-utility probability sequential patterns in a sequence database
- the PHUSPM algorithm (Zhang et al. 2018)
- the UHUSPM algorithm (Zhang et al. 2018)
- algorithm for mining high-utility itemsets in a
transaction database using evolutionary algorithms
- the HUIM-GA algorithm (Kannimuthu et al., 2014)
- the HUIM-BPSO algorithm (Lin et al, 2016)
- the HUIM-GA-tree algorithm (Lin et al, 2016)
- the HUIM-BPSO-tree algorithm (Lin et al, 2016)
- the HUIF-PSO algorithm (Song et al., 2018)
- the HUIF-GA algorithm (Song et al., 2018)
- the HUIF-BA algorithm (Song et al., 2018)
- the HUIM-ABC algorithm (Song et al., 2018)
- algorithm for mining high average-utility itemsets in a
transaction database
- the HAUI-Miner algorithm for mining high average-utility itemsets (Lin et al, 2016)
- the EHAUPM algorithm for mining high average-utility itemsets (Lin et al, 2017)
- the HAUI-MMAU algorithm for mining high average-utility itemsets with multiple thresholds (Lin et al, 2016)
- the MEMU algorithm for mining high average-utility itemsets with multiple thresholds (Lin et al, 2018)
- algorithms for mining high utility episodes in a sequence of complex events (a transaction database)
- the HUE-SPAN algorithm (Fournier-Viger et al., 2019, powerpoint) for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
- the TUP algorithm (Rathore et al., 2016) for mining frequent periodic patterns in a sequence of transactions (a transaction database))
- the UP-SPAN algorithm (Wu et al., 2013 ) for mining periodic high-utility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
- algorithms
for mining periodic high-utility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
- the PHM algorithm (Fournier-Viger et al, 2016b, powerpoint)
- algorithms for discovering irregular high utility itemsets (non periodic patterns) in a transaction database with utility information
- the PHM_irregular algorithm, which is a simple variation of the PHM algorithm
- algorithm for discovering local high utility itemsets in
a database with utility information and timestamps
- the LHUI-Miner algorithm (Fournier-Viger et al., 2019, powerpoint)
- algorithm for discovering peak high utility itemsets in
a database with utility information and timestamps
- the PHUI-Miner algorithm (Fournier-Viger et al., 2019, powerpoint)
Association Rule Mining
These algorithms discover interesting associations between symbols (values) in a transaction database (database records with binary attributes).
- an algorithm for mining all association rules in a transaction database (Agrawal & Srikant, 1994)
- an algorithm for mining all association rules with the lift measure in a transaction database (adapted from Agrawal & Srikant, 1994)
- an algorithm for mining the IGB informative and generic basis of association rules in a transaction database (Gasmi et al., 2005)
- an algorithm for mining perfectly sporadic association rules (Koh & Roundtree, 2005)
- an algorithm for mining closed association rules (Szathmary et al. 2006).
- an algorithm for mining minimal non redundant association rules (Kryszkiewicz, 1998)
- the Indirect algorithm for mining indirect association rules (Tan et al. 2000; Tan et 2006)
- the FHSAR algorithm for hiding sensitive association rules (Weng et al. 2008)
- the TopKRules algorithm for mining the top-k association rules (Fournier-Viger, 2012b, powerpoint)
- the TopKClassRules algorithm for mining the top-k class association rules (a variation of TopKRules. This latter is described in Fournier-Viger, 2012b, powerpoint)
- the TNR algorithm for mining top-k non-redundant
association rules (Fournier-Viger
2012d, powerpoint)
Stream mining
These algorithms discovers various kinds of patterns in a stream (an infinite sequence of database records (transactions))
- the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)
- the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)
- the CloStream algorithm for mining frequent closed itemsets in a data stream (Yen et al, 2009)
- algorithms for mining the top-k high utility itemsets from a data stream with a window
- the FHMDS and FHMDS-Naive algorithms (Dawar et al. 2017)
Clustering
These algorithms automatically find clusters in different kinds of data
- the original K-Means algorithm (MacQueen, 1967)
- the Bisecting K-Means algorithm (Steinbach et al, 2000)
- algorithms for density-based clustering
- the DBScan algorithm (Ester et al., 1996)
- the Optics algorithm to extract a cluster ordering of points, which can then be use to generate DBScan style clusters and more (Ankerst et al, 1999)
- a hierarchical clustering algorithm
- a tool called Cluster Viewer for visualizing clusters
- a tool called Instance Viewer for visualizing the input of clustering algorithms
Time series mining
These algorithms perform various tasks to analyze time series data
- an algorithm for converting a time series to a sequence of symbols using the SAX representation of time series. Note that if one converts a set of time series with SAX, he will obtain a sequence database, which allows to then apply traditional algorihtms for sequential rule mining and sequential pattern mining on time series (SAX, 2007).
- algorithms for calculating the prior moving average of a time series (to remove noise)
- algorithms for calculating the cumulative moving average f a time series (to remove noise)
- algorithms for calculating the central moving average of a time series (to remove noise)
- an algorithm for calculating the median smoothing of a time series (to remove noise)
- an algorithm for calculating the exponential smoothing of a time series (to remove noise)
- an algorithm for calculating the min max normalization of a time series
- an algorithm for calculating the autocorrelation function of a time series
- an algorithm for calculating the standardization of a time series
- an algorithm for calculating the first and second order differencing of a time series
- an algorithm for calculating the piecewise aggregate approximation of a time series (to reduce the number of data points of a time series)
- an algorithm for calculating the linear regression of a time series (using the least squares method)
- an algorithm for splitting a time series into segments of a given length
- an algorithm for splitting a time series into a given number of segments
- algorithms to cluster time series (group time-series according to their similarities). This can be done by applying the clustering algorithms offered in SPMF (K-Means, Bisecting K-Means, DBScan, OPTICS, Hierarchical clustering) on time series.
- a tool called Time Series Viewer for visualizing time series
Classification
- the ID3 algorithm for building decision trees (Quinlan, 1986)
Text mining
- an algorithm for classifying text documents using a Naive Bayes classifier approach (S. Raghu, 2015)
- an algorithm for clustering texts using the tf*idf measure (S. Raghu, 2015)
Data structures
- red-black tree,
- itemset-tree,
- binary tree,
- KD-tree,
- triangular matrix.
Tools
- A tool for generating a synthetic transaction database
- A tool for generating a synthetic sequence database
- A tool for generating a synthetic sequence database with timestamps
- A tool for calculating statistics about a transaction database
- A tool for calculating statistics about a transaction database with utility information
- A tool for calculating statistics about a sequence database
- A tool for converting a sequence database to a transaction database
- A tool for converting a transaction database to a sequence database
- A tool for converting a text file to a sequence database (each sentences becomes a sequence)
- A tool for converting a sequence database in various formats (CSV, KOSARAK, BMS, IBM...) to a sequence database in SPMF format
- A tool for converting a transaction database in various formats (CSV...) to a transaction database in SPMF format
- A tool for converting time-series to a sequence database
- A tool to generate utility values for a transaction database
- A tool to add timestamps to a sequence database
- A tool for removing utility information from a database having utility information
- A tool to resize a database in SPMF format (a text file) using a percentage of lines of data from an original database.
- A tool for visualizing time-series
Visual map of algorithms
You can visualize the relationship between the various data mining algorithms offered in SPMF by clicking on this map (last updated : 2015/09/12 - SPMF 0.97):