SPMF: A Java Open-Source Data Mining Library

Algorithms

SPMF offers implementations of the following data mining algorithms.

Sequential Pattern Mining

These algorithms discover sequential patterns in a set of sequences. For a good overview of sequential pattern mining algorithms, please read this survey paper.

algorithms for mining sequential patterns (subsequences that appear in many sequences) of a sequence database
- the CM-SPADE algorithm (Fournier-Viger et al, 2014, powerpoint)
- the CM-SPAM algorithm (Fournier-Viger et al, 2014, powerpoint)
- the FAST algorithm (Salvemini et al, 2011)
- the GSP algorithm (Srikant et al., 1996)
- the LAPIN (aka LAPIN-SPAM) algorithm (Yang et al., 2005)
- the PrefixSpan algorithm (Pei et al., 2004, powerpoint, video)
- the SPADE algorithm (Zaki et al., 2001)
- the SPAM algorithm (Ayres et al., 2002)
algorithms for mining closed sequential patterns in a sequence database
- the ClaSP algorithm (Gomariz et al., 2013)
- the CM-ClaSP algorithm (Fournier-Viger et al, 2014, powerpoint)
- the CloFAST algorithm (Fumarola et al, 2016)
- the CloSpan algorithm (Yan et al., 2003)
- the BIDE+ algorithm(Wang et al., 2007)
algorithms for mining maximal sequential patterns in a sequence database
- the VMSP algorithm (Fournier-Viger et al, 2014, powerpoint)
- the MaxSP algorithm (Fournier-Viger et al., 2013, powerpoint).
algorithms for mining the top-k sequential patterns in a sequence database
- the TKS algorithm (Fournier-Viger et al., 2013, powerpoint).
- the TSP algorithm (Tzvetkoz et al., 2003).
- the Skopus algorithm for mining the top-k sequential patterns using leverage and significance (Petijean et al., 2016)
algorithms for mining sequential generator patterns in a sequence database
- the VGEN algorithm (Fournier-Viger et al, 2014)
- the FEAT algorithm (Gao et al., 2008).
- the FSGP algorithm (Yi et al., 2011).
algorithms for mining nonoverlapping sequential patterns in one or many sequences of symbols/characters (can count multiple occurrences of a pattern in each sequence)
- the NOSEP algorithm (Wu et al., 2018)
algorithms for mining compressing sequential patterns
- the GoKrimp and SeqKrimp algorithms (Lam et al., 2012; Lam et al., 2014)
algorithm for identifying the top-k quantile based cohesive sequential patterns in a single sequence or in multiple sequences
- the QCSP algorithm (Feremans et al., 2019)
algorithms for mining multidimensional sequential patterns in a multidimensional sequence database
- the SeqDIM algorithm for mining frequent multidimensional sequential patterns in a multi-dimensional sequence database (Pinto et al., 2001)
- the Songram et al. algorithm for mining frequent closed multidimensional sequential patterns in a multi-dimensional sequence database (Songram et al. 2006)
the Fournier-Viger et al. algorithm, a sequential pattern mining algorithm that combines several features from well-known sequential pattern mining algorithms and also proposes some original features (Fournier-Viger et al., 2008):
- mining sequences with minimum support by database-projection (based on PrefixSpan, Pei et al., 2004)
- mining sequences with min/max time interval between events and min/max time length of a sequence (based on Hirate-Yamana, 2006)
- mining closed sequences (based on the BIDE+ algorithm by Wang et al. 2007)
- mining multi-dimensional sequences (based on Pinto et al. 2001)
- mining closed multi-dimensional sequences (based on Songram et al. 2006 and Pasquier et al., 1999)
- mining sequences with items having integer values and performing automatic clustering of these values (original extension described in Fournier-Viger et al., 2008)
algorithm for mining high-utility sequential patterns in a sequence database
- the USPAN algorithm (Yin et al. 2012)
algorithm for mining cost-efficient sequential patterns (a.k.a. low-cost high utility sequential patterns)
- the CorCEPB algorithm for mining cost-efficient patterns in sequences with binary utility information and cost values (Fournier-Viger et al., 2020, ppt , video )
- the CEPB algorithm for mining cost-efficient patterns in sequences with binary utility information and cost values - consider only sequence with positive utility(Fournier-Viger et al., 2020 , ppt, video )
- the CEPN algorithm for mining cost-efficient patterns in sequences with numeric utility information and cost values (Fournier-Viger et al., 2020, ppt, video )
algorithm for mining high-utility probability sequential patterns in a sequence database
- the PHUSPM algorithm (Zhang et al. 2018)
- the UHUSPM algorithm (Zhang et al. 2018)
algorithm for progressive sequential pattern mining with convergence guarantees
- the ProSecCo algorithm (Servan-Schreiber et al. 2018)
algorithms for mining sequential patterns with flexible constraints in a time-extended sequence database (eg. MOOC data)
- the SPM-FC-L algorithm (Song et al., 2022)
- the SPM-FC-P algorithm (Song et al., 2022)
the Occur algorithm for finding all occurrences of some sequential patterns in sequences by post-processing.
algorithms for mining patterns in sequences of events described by time intervals (i.e. Time Interval Related Pattern (TIRP) mining)
- the FastTIRP algorithm (Fournier-Viger et al., 2022)
- the VertTIRP algorithm (Mordvanyuk et al., 2021)

Sequential Rule Mining

These algorithms discover sequential rules in a set of sequences.

algorithms for mining sequential rules in a sequence database
- the ERMiner algorithm (Fournier-Viger et al., 2014)
- the RuleGrowth algorithm (Fournier-Viger et al., 2011, Fournier-Viger et al., 2015, powerpoint, video)
- the CMRules algorithm (Fournier-Viger et al., 2010, powerpoint)
- the CMDeo algorithm (Fournier-Viger et al., 2010)
- the RuleGen algorithm (Zaki et al, 2001)
algorithms for mining sequential rules in a sequence database with the window size constraint
- the TRuleGrowth algorithm (Fournier-Viger, 2012a, Fournier-Viger et al., 2015)
algorithms for mining top-k sequential rules in a sequence database
- the TopSeqRules algorithm for mining the top-k sequential rules (Fournier-Viger et al., 2011, powerpoint)
- the TopSeqClassRules algorithm for mining the top-k class sequential rules (a variation of Fournier-Viger et al., 2011)
- the TNS algorithm for mining the top-k non-redundant sequential rules (Fournier-Viger 2013)
algorithm for mining high-utility sequential rules in a sequence database
- the HUSRM algorithm (Zida et al., 2015)

Sequence Prediction

These algorithms for predicting the next symbol of a sequence based on a set of training sequences

the Compact Prediction Tree+ (CPT+) algorithm (Gueniche et al., 2015, powerpoint, )
the Compact Prediction Tree (CPT) algorithm (Gueniche et al., 2013, )
the First order Markov Chains (PPM - order 1) (Clearly et al, 1984)
the Dependency Graph (DG) (Padmanabhan, 1996)
the All-k-Order Markov Chains (AKOM) (Pitkow, 1999)
the TDAG (Laird & Saul, 1994)
the LZ78 (Ziv, 1978)

Itemset Mining

These algorithms discover interesting itemsets (sets of values) that appear in a transaction database (database records containing symbolic data). For a good overview of itemset mining, please read this survey paper.

algorithms for discovering frequent itemsets in a transaction database.
- the Apriori algorithm (Agrawal & Srikant, 1994, video
- the AprioriTID algorithm (Agrawal & Srikant, 1994)
- the FP-Growth algorithm (Han et al., 2004)
- the Eclat algorithm (Zaki, 2000)
- the dEclat algorithm (Zaki and Gouda, 2001, 2003)
- the Relim algorithm (Borgelt, 2005)
- the H-Mine algorithm (Pei et al., 2007)
- the LCMFreq algorithm (Uno et al., 2004)
- the PrePost and PrePost+ algorithms (Deng et al., 2012, Deng et Lv, 2015)
- the FIN algorithm (Deng et al., 2014)
- the DFIN algorithm (Deng et al., 2016)
- the NegFIN algoritm (Aryabarzan et al., 2018)
algorithms for discovering frequent closed itemsets in a transaction database.
- the FPClose algorithm (Grahne and Zhu, 2005)
- the Charm algorithm (Zaki and Hsiao, 2002)
- the dCharm algorithm (Zaki and Gouda, 2001)
- the DCI_Closed algorithm (Lucchese et al, 2004)
- the LCM algorithm (Uno et al., 2004)
- the AprioriClose aka Close algorithm (Pasquier et al., 1999)
- the AprioriTID Close algorithm (Pasquier et al., 1999, Agrawal & Srikant, 1994)
- the NAFCP algorithm (Le et al., 2015)
- the NEclatClosed algorithm (Aryabarzan et al., 2021)
algorithms for recovering all frequent itemsets from frequent closed itemsets:
- the LevelWise algorithm (Pasquier et al., 1999)
- the DFI-Growth algorithm (Huang et al., 2019)
- the DFI-List algorithm (Wu et al., 2020)
algorithms for discovering frequent maximal itemsets in a transaction database.
- the FPMax algorithm (Grahne and Zhu, 2003)
- the Charm-MFI algorithm for discovering frequent closed itemsets and maximal frequent itemsets by post-processing in a transaction database (Szathmary et al. 2006)
algorithms for mining frequent itemsets with multiple minimum supports
- the MSApriori algorithm (Liu et al, 1999)
- the CFPGrowth++ algorithm (Uday & Reddy, 2011, Hu & Chen, 2006)
algorithms for mining generator itemsets in a transaction database
- the DefMe algorithm for mining frequent generator itemsets in a transaction database (Soulet & Rioult, 2014)
- the Pascal algorithm for mining frequent itemsets, and identifying at the same time which one are generators (Bastide et al., 2002)
- the Zart algorithm for discovering frequent closed itemsets and their generators in a transaction database (Szathmary et al. 2007)
algorithms for mining rare itemsets and/or correlated itemsets in a transaction database
- the AprioriInverse algorithm for mining perfectly rare itemsets (Koh & Roundtree, 2005)
- the AprioriRare algorithm for mining minimal rare itemsets and frequent itemsets (Szathmary et al. 2007b)
- the CORI algorithm for mining minimal rare correlated itemsets using the support and bond measures (Bouasker et al. 2015)
- the RP-Growth algorithm for mining rare itemsets (Tsang et al., 2011)
algorithms for performing targeted and dynamic queries about association rules and frequent itemsets.
- the Itemset-Tree, a data structure that can be updated incrementally, and algorithms for querying it. (Kubat et al, 2003)
- the Memory-Efficient Itemset-Tree, a data structure that can be updated incrementally, and algorithms for querying it. (Fournier-Viger, 2013, powerpoint)
algorithms to discover frequent itemsets in a stream
- the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)
- the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)
- the CloStream algorithm for mining frequent closed itemsets in a data stream (Yen et al, 2009)
the U-Apriori algorithm for mining frequent itemsets in uncertain data (Chui et al, 2007)
the VME algorithm for mining erasable itemsets (Deng & Xu, 2010)
algorithms to discover fuzzy frequent itemsets in a quantitative transaction database
- the FFI-Miner algorithm for mining fuzzy itemsets (Lin et al., 2015)
- the MFFI-Miner algorithm for mining multiple fuzzy itemsets (Lin et al., 2016)
the OPUS-Miner algorithm for mining self-sufficient itemsets (Webb et al., 2014)
algorithms to discover compressing itemsets
- the KRIMP algorithm (Vreeken et al, 2011)
- the SLIM algorithm (Smets et al., 2012)
algorithms to discover the top-k most frequent itemsets
- the Apriori(top-k) algorithm, which is a modified version of Apriori
- the FPGrowth(top-k) algorithm, which is a modified version of FP-Growth

Episode Mining

These algorithms discover patterns (episodes) that appear in a single sequence of events. For a good overview of episode mining, please read this survey paper.

algorithms for mining frequent episodes
- the EMMA algorithm, which finds the frequent episodes, and counts the support based on the head frequency (Kuo-Yu et al., 2008)
- the AFEM algorithm, which finds the frequent episodes, and counts the support based on the head frequency (Fournier-Viger et al., 2022 )
- the MINEPI+ algorithm, which finds the frequent episodes, and counts the support based on the head frequency (Kuo-Yu et al., 2008)
- the MINEPI algorithm, which finds the frequent episodes, counts the support based on minimal occurrences, and does not allow simultaneous events (Mannila & Toivonen, 1997)
- the TKE algorithm, which finds the top-k most frequent episodes based on the head frequency (Fournier-Viger et al., 2020)
- the MaxFEM algorithm, which finds the maximal frequent episodes, and counts the support based on the head frequency (Fournier-Viger et al., 2022 , powerpoint)
algorithms for mining episode rules
- the POERM algorithm for discovering partially-ordered episode rules in a sequence of events, using non-overlapping support (Fournier-Viger et al., 2021, , powerpoint)
- the POERM-ALL algorithm for discovering partially-ordered episode rules in a sequence of events, using non-overlapping support (Fournier-Viger et al., 2021, , powerpoint)
- the POERMH algorithm for discovering partially-ordered episode rules in a sequence of events, using the head support (Fournier-Viger et al., 2021, , powerpoint)
- the NONEPI algorithm for discovering episodes rules using the non-overlapping frequency (Ouarem et al., 2021)
- algorithms to generate episodes rules (Mannila & Toivonen, 1997) using the output of TKE, AFEM, EMMA or MINEPI+
algorithms for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
- the HUE-SPAN algorithm (Fournier-Viger et al., 2019, powerpoint) for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
- the US-SPAN algorithm (Wu et al., 2013 ) for mining high utility episodes in a sequence of complex events (a transaction database) with utility information

the TUP algorithm (Rathore et al., 2016) for mining the top-k high utility episodes in a sequence of complex events (a transaction database) with utility information

algorithms for mining nonoverlapping sequential patterns in one or many sequences of symbols
- the NOSEP algorithm (Wu et al., 2018)
algorithms for mining frequent sequential patterns with periodic wilcard gaps in a sequence of characters
- the MAPD algorithm (Wu, Y. et al., 2014)
algorithms for mining self-adaptive one-off weak-gap strong sequential patterns in a sequence of characters

the OWSP-Miner algorithm (Wu, Y. et al., 2022)

Periodic Pattern Mining

These algorithms discover patterns that periodically appear in a sequence of records (e.g. transactions)

Algorithms for finding periodic patterns in a single sequence of events (also called a transaction database)
- the PFPM algorithm (Fournier-Viger et al, 2016a, powerpoint, video ) for mining frequent periodic patterns in a sequence of transactions (a transaction database))
Algorithms for mining stable periodic itemsets in a sequence of events (also called a transaction database) with or without timestamps
- the SPP-Growth algorithm for mining stable periodic patterns(Fournier-Viger et al. 2019, powerpoint, video )
- the TSPIN algorithm to discover the Top-k Stable Periodic frequent itemsets (Fournier-Viger et al.,2021 )
Algorithms for mining locally periodic patterns in a transaction database with or without timestamps.
- the LPP-Growth algorithm (Fournier-Viger. 2020, powerpoint)
- the LPPM_breadth algorithm (Fournier-Viger. 2020, powerpoint)
- the LPPM_depth algorithm (Fournier-Viger. 2020, powerpoint)
Algorithms for discovering periodic patterns that are significant or non-redundant
- the NPFPM algorithm to discover non-redundant periodic frequent itemsets in a transaction database (Afriyie et al., 2020, 2021)
- the PPFP algorithm to discover productive periodic frequent itemsets in a transaction database (Nofong, V. M., 2016)
- the SRPFPM algorithm to discover self-reliant periodic frequent patterns in a transaction database (Nofong et al, 2021)
Algorithms for mining periodic high utility itemsets in a sequenceof transactions (a transaction database) containing utility information
- the PHM algorithm (Fournier-Viger et al, 2016b, powerpoint )
- the PHMN algorithm (2023) for mining periodic high utility itemsets with positive or negative utility
- the PHMN+ algorithm (2023) for mining periodic high utility itemsets with positive or negative utility
- the PHM_irregular algorithm for mining irregular high utility itemsets (the opposite of periodic), which is a simple variation of the PHM algorithm
Algorithms for finding periodic patterns in multiple sequences of events
- the MPFPS_BFS algorithms (Fournier-Viger, P., Li, Z., et al., 2019, powerpoint) for mining periodic patterns that are common to multiple sequences
- the MPFPS_DFS algorithms (Fournier-Viger, P., Li, Z., et al., 2019, powerpoint) for mining periodic patterns that are common to multiple sequences
Algorithms for mining rare correlated periodic patterns common to multiple sequences

the MRCPPS algorithm (Fournier-Viger et al., 2020)

Graph Pattern Mining

These algorithms discover patterns in graphs

Algorithms for mining frequent subgraphs
- the TKG algorithm for mining the top-k frequent subgraphs in a graph database (Fournier-Viger, 2019, powerpoint)
- the gSpan algorithm for mining all the frequent subgraphs in a graph database (Yan et al., 2002)
- the cgSpan algorithm for mining all the frequent closed subgraphs in a graph database or single graph (using the traditional support or MNI support) (Shaul et al. 2021)
Algorithms for mining patterns in a dynamic attributed graph
- the TSeqMiner algorithm (Fournier-Viger et al., 2019)
- the AER-Miner algorithm (Fournier-Viger et al., 2020, PPT)

High-Utility Pattern Mining

These algorithms discover patterns having a high utility (importance) in different kinds of data. For a good overview of high utility itemset mining, you may read this survey paper, and the high utility-pattern mining book.

algorithms for mining high-utility itemsets in a transaction database having profit information
- the EFIM algorithm (Zida et al. 2016, Zida et al., 2015, powerpoint)
- the FHM algorithm (Fournier-Viger et al., 2014, powerpoint , video)
- the HUI-Miner algorithm (Liu & Qu, 2012, video)
- the HUP-Miner algorithm (Krishnamoorthy, 2014)
- the mHUIMiner algorithm (Peng et al., 2017)
- the UFH algorithm (Dawar et al, 2017)
- the HMiner algorithm (Krishnamoorty, 2017)
- the ULB-Miner algorithm (Duong et al, 2018)
- the IHUP algorithm (Ahmed et al., 2009)
- the Two-Phase algorithm (Liu et al., 2005)
- the UP-Growth algorithm (Tseng et al., 2011)
- the UP-Growth+ algorithm (Tseng et al., 2013)
- the UP-Hist algorithm (Dawar et al., 2015)
- the d2HUP algorithm (Liu et al, 2012)
- the FHIM algorithm (Sahoo et al., 2015)
algorithm for efficiently mining high-utility itemsets with length constraints in a transaction database
- the FHM+ algorithm (Fournier-Viger et al, 2016, powerpoint)
algorithm for mining correlated high-utility itemsets in a transaction database
- the FCHM_bond algorithm, to use the bond measure (Fournier-Viger et al, 2016, powerpoint, Fournier-Viger 2018 et al., to appear, video )
- the FCHM_allconfidence algorithm, to use the all-confidence measure (Fournier-Viger et al, 2016, powerpoint, Fournier-Viger 2018 et al., to appear)
- the ECHUM algorithm (Ramesth et al., 2022, @aman955 under the GPL license)
algorithm for mining high-utility itemsets in a transaction database containing negative unit profit values
- the FHN algorithm (Fournier-Viger et al., 2014, powerpoint)
- the HUINIV-Mine algorithm (Chu et al., 2009)
algorithms for mining multi-level or cross-level high utility itemsets in a transaction database with a taxonomy:
- CLH-Miner for discovering cross-level high-utility itemsets (Fournier-Viger et al., 2020, ppt)
- FEACP for discovering cross-level high-utility itemsets (Tung et al., 2022)
- TKC for discovering the top-k cross-level high-utility itemsets (Nouioua et al., 2020, video, ppt) -- will be in a next release of SPMF...
- MLHUI-Miner for discovering the multi-level high utility itemsets (Cagliero et al., 2017)
algorithm for mining low-cost high utility itemsets in a transaction database with cost and utility information
- the LCIM algorithm (Fournier-Viger et al, 2022)
algorithm for mining frequent high-utility itemsets in a transaction database
- the FHMFreq algorithm, a variation of the FHM algorithm (Fournier-Viger et al., 2014
algorithm for mining on-shelf high-utility itemsets in a transaction database containing information about time periods of items
- the FOSHU algorithm (Fournier-Viger et al., 2015, powerpoint)
- the TS-HOUN algorithm (Lan et al., 2014)
algorithm for incremental high-utility itemset mining in a transaction database
- the EIHI algorithm (Fournier-Viger et al., 2015, powerpoint)
- the HUI-LIST-INS algorithm (Lin et al., 2014)
algorithm for incremental closed high-utility itemset mining in a transaction database

the IncCHUI algorithm for incrementally discovering the closed high utility itemsets (Dam et al., 2018, code obtained from github based on GPL license)

algorithm for mining concise representations of high-utility itemsets in a transaction database
- the HUG-Miner algorithm (Fournier-Viger et al., 2014, powerpoint) for mining high-utility generators
- the GHUI-Miner algorithm (Fournier-Viger et al., 2014, powerpoint) for mining generators of high-utility itemsets
- the MinFHM algorithm (Fournier-Viger et al., 2016, powerpoint, video ) for mining minimal high-utility itemsets
- the EFIM-Closed algorithm (Fournier-Viger et al., 2016, powerpoint) for mining closed high-utility itemsets
- the CHUI-Miner algorithm (Wu et al., 2015) for mining closed high-utility itemsets
- the CLS-Miner algorithm (Dam et al., 2019 ) for mining closed high-utility itemsets
- the HMiner_Closed algorithm (Nguyen et al., 2019) for mining closed high-utility itemsets
- the CHUD algorithm for mining closed high-utility itemsets (Tseng et al., 2011/2015)
- the CHUI-Miner(Max) algorithm for mining maximal high utility itemsets (Wu et al., 2019).
- the HUCI_Miner algorithm for simultaneously mining closed high utility itemsets and high utility generators (Sahoo et al., 2015)
algorithm for mining the skyline high-utility itemsets in a transaction database
- the SkyMine algorithm (Goyal et al., 2015)
- the SFUI_UF algorithm for mining skyline frequent high utility itemsets (Song et al., 2021 )
- the SFU_CE algorithm for mining skyline frequent high utility itemsets (Song et al., 2021 , ppt)
- the SFUPMinerUemax algorithm for mining skyline frequent high utility itemsets (Lin et al, 2016)
- the EMSFUI_D algorithm for mining skyline frequent high utility itemsets (Liu et al., 2022)
- the EMSFUI_B algorithm for mining skyline frequent high utility itemsets (Liu et al., 2022)
algorithm for mining the top-k high-utility itemsets in a transaction database
- the TKU algorithm (Tseng et al., 2015), obtained from UP-Miner under GPL license
- the TKO-Basic algorithm (Tseng et al., 2015)
- the THUI algorithm (Krishnamoorty, 2019)
algorithms for mining the top-k high utility itemsets from a data stream with a window
- the FHMDS and FHMDS-Naive algorithms (Dawar et al. 2017)
algorithm for mining quantitative high utility itemsets in a transaction database:
- the FHUQI-Miner algorithm (Nouioua et al., 2021, powerpoint)
- the VHUQI algorithm (Wu et al., 2014)
- the TKQ algorithm for mining the quantitative high utility itemsets (Nouioua et al. 2021, powerpoint, video)
- the CHUQI-Miner algorithm for mining the correlated quantitative high utility itemsets (Nouioua et al. 2021, powerpoint, video)
algorithm for mining high-utility sequential rules in a sequence database
- the HUSRM algorithm (Zida et al., 2015)
algorithm for mining high-utility sequential patterns in a sequence database
- the USPAN algorithm (Yin et al. 2012)
algorithm for mining cost-efficient sequential patterns (a.k.a. low-cost high utility sequential patterns)
- the CorCEPB algorithm for mining cost-efficient patterns in sequences with binary utility information and cost values (Fournier-Viger et al., 2020 , ppt, video )
- the CEPB algorithm for mining cost-efficient patterns in sequences with binary utility information and cost values - consider only sequence with positive utility(Fournier-Viger et al., 2020 , ppt)
- the CEPN algorithm for mining cost-efficient patterns in sequences with numeric utility information and cost values (Fournier-Viger et al., 2020, ppt, video )
algorithm for mining high-utility probability sequential patterns in a sequence database
- the PHUSPM algorithm (Zhang et al. 2018)
- the UHUSPM algorithm (Zhang et al. 2018)
algorithm for heuristically mining the top-k high-utility itemsets in a transaction database

the TKU-CE algorithm (Song et al. 2021)
the TKU-CE+ algorithm (Song et al., 2021)

algorithm for mining high-utility itemsets in a transaction database using evolutionary algorithms, swarm intelligence techniques or other meta-heuristics
- the HUIM-AF algorithm (Song et al., 2021)
- the HUIM-HC algorithm (Fournier-Viger et al., 2021)
- the HUIM-SA algorithm (Fournier-Viger et al., 2021)
- the HUIM-ACO algorithm (Song et al., 2020)
- the HUIM-SPSO algorithm (Song et al., 2020)
- the HUIF-PSO algorithm (Song et al., 2018)
- the HUIF-GA algorithm (Song et al., 2018)
- the HUIF-BA algorithm (Song et al., 2018)
- the HUIM-ABC algorithm (Song et al., 2018)
- the HUIM-GA algorithm (Kannimuthu et al., 2014)
- the HUIM-BPSO algorithm (Lin et al, 2016)
- the HUIM-GA-tree algorithm (Lin et al, 2016)
- the HUIM-BPSO-tree algorithm (Lin et al, 2016)
algorithm for mining high average-utility itemsets in a transaction database
- the HAUI-Miner algorithm for mining high average-utility itemsets (Lin et al, 2016)
- the EHAUPM algorithm for mining high average-utility itemsets (Lin et al, 2017)
- the HAUIM-GMU algorithm for mining high average-utility itemsets (Song et al., 2021 )
- the HAUI-MMAU algorithm for mining high average-utility itemsets with multiple thresholds (Lin et al, 2016)
- the MEMU algorithm for mining high average-utility itemsets with multiple thresholds (Lin et al, 2018)
algorithms for mining the top-k high average utility itemsets
- the ETAUIM algorithm (2023) for mining the top-k high average utility itemset using a breadth-first search (obtained from github liuxuan615 under the GPL license due to containing GPL code)
algorithms for mining high utility episodes in a sequence of complex events (a transaction database)
- the HUE-SPAN algorithm (Fournier-Viger et al., 2019, powerpoint) for mining high utility episodes in a sequence of complex events (a transaction database) with utility information
- the TUP algorithm (Rathore et al., 2016) for mining top-k high utility episodes in a sequence of transactions (a transaction database))
- the UP-SPAN algorithm (Wu et al., 2013 ) for mining high-utility episodes(patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
algorithms for mining periodic high-utility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information
- the PHM algorithm (Fournier-Viger et al, 2016b, powerpoint)
- the PHMN and PHMN+ algorithms (2023) for mining periodic high utility itemsets with positive or negative utility ( obtained from Github @laughing1999 under the GPL license)
algorithms for discovering irregular high utility itemsets (non periodic patterns) in a transaction database with utility information
- the PHM_irregular algorithm, which is a simple variation of the PHM algorithm
algorithm for discovering local high utility itemsets in a database with utility information and timestamps
- the LHUI-Miner algorithm (Fournier-Viger et al., 2019, powerpoint)
algorithm for discovering peak high utility itemsets in a database with utility information and timestamps
- the PHUI-Miner algorithm (Fournier-Viger et al., 2019, powerpoint)
algorithm for discovering locally trending high utility itemsets in a database with utility information and timestamps
- the LTHUI-Miner algorithm (Fournier-Viger et al., 2020, video ppt
algorithm for discovering high utility association rules
- the HGB_all algorithm to derive all high utility association rules or just the non redundant high utility association rules (Sahoo et al., 2015)
- the HGB algorithm to derive all non redundant high utility association rules (Sahoo et al., 2015)

Association Rule Mining

These algorithms discover interesting associations between symbols (values) in a transaction database (database records with binary attributes).

an algorithm for mining all association rules with the confidence measure in a transaction database (Agrawal & Srikant, 1994)
an algorithm for mining all association rules with the lift measure in a transaction database (adapted from Agrawal & Srikant, 1994)
an algorithm for mining the IGB informative and generic basis of association rules in a transaction database (Gasmi et al., 2005)
an algorithm for mining perfectly sporadic association rules (Koh & Roundtree, 2005)
an algorithm for mining closed association rules (Szathmary et al. 2006).
an algorithm for mining minimal non redundant association rules (Kryszkiewicz, 1998)
the Indirect algorithm for mining indirect association rules (Tan et al. 2000; Tan et 2006)
the FHSAR algorithm for hiding sensitive association rules (Weng et al. 2008)
the TopKRules algorithm for mining the top-k association rules (Fournier-Viger, 2012b, powerpoint)
the TopKClassRules algorithm for mining the top-k class association rules (a variation of TopKRules. This latter is described in Fournier-Viger, 2012b, powerpoint)
the TNR algorithm for mining top-k non-redundant association rules (Fournier-Viger 2012d, powerpoint)
the HGB and HGB_All algorithm to find high utility association rules or non redundant high utility association rules (Sahoo et al., 2015)

Stream pattern mining

These algorithms discovers various kinds of patterns in a stream (an infinite sequence of database records (transactions))

the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)
the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)
the CloStream algorithm for mining frequent closed itemsets in a data stream (Yen et al, 2009)
algorithms for mining the top-k high utility itemsets from a data stream with a window
- the FHMDS and FHMDS-Naive algorithms (Dawar et al. 2017)

Clustering

These algorithms automatically find clusters in different kinds of data

the original K-Means algorithm (MacQueen, 1967)
the Bisecting K-Means algorithm (Steinbach et al, 2000)
the K-Means++ algorithm (Arthur et al., 2007)
algorithms for density-based clustering
- the DBScan algorithm (Ester et al., 1996)
- the Optics algorithm to extract a cluster ordering of points, which can then be use to generate DBScan style clusters and more (Ankerst et al, 1999)
a hierarchical clustering algorithm
a tool called Cluster Viewer for visualizing clusters
a tool called Instance Viewer for visualizing the input of clustering algorithms

Time series mining

These algorithms perform various tasks to analyze time series data

an algorithm for converting a time series to a sequence of symbols using the SAX representation of time series. Note that if one converts a set of time series with SAX, he will obtain a sequence database, which allows to then apply traditional algorihtms for sequential rule mining and sequential pattern mining on time series (SAX, 2007).
algorithms for calculating the prior moving average of a time series (to remove noise)
algorithms for calculating the cumulative moving average f a time series (to remove noise)
algorithms for calculating the central moving average of a time series (to remove noise)
an algorithm for calculating the median smoothing of a time series (to remove noise)
an algorithm for calculating the exponential smoothing of a time series (to remove noise)
an algorithm for calculating the min max normalization of a time series
an algorithm for calculating the autocorrelation function of a time series
an algorithm for calculating the standardization of a time series
an algorithm for calculating the first and second order differencing of a time series
an algorithm for calculating the piecewise aggregate approximation of a time series (to reduce the number of data points of a time series)
an algorithm for calculating the linear regression of a time series (using the least squares method)
an algorithm for splitting a time series into segments of a given length
an algorithm for splitting a time series into a given number of segments
algorithms to cluster time series (group time-series according to their similarities). This can be done by applying the clustering algorithms offered in SPMF (K-Means, Bisecting K-Means, DBScan, OPTICS, Hierarchical clustering) on time series.
a tool called Time Series Viewer for visualizing time series

Classification

the ID3 algorithm for building decision trees (Quinlan, 1986)
the KNN (K-Nearest Neighbor) algorithm
Classification based on association rule mining
- the ACAC algorithm (Huang et al, 2011)
- the ACCF algorithm (Li et al., 2008)
- the ACN algorithm (Kundu et al., 2008)
- the ADT algorithm (Wang et al., 2000)
- the CBA algorithm (Liu et al., 1998)
- the CBA2 algorithm (Liu et al., 2001)
- the CMAR algorithm (Li et al, 2001)
- the L3 algorithm (Baralis et al, 2002)
- the MAC algorithm (Abdelhamid et al., 2012)
A framework for comparing multiple classifiers using holdout and k-fold cross-validation

Text mining

an algorithm for classifying text documents using a Naive Bayes classifier approach (S. Raghu, 2015)
an algorithm for clustering texts using the tf*idf measure (S. Raghu, 2015)

Dataset generation tools

A tool for generating a synthetic transaction database
A tool for generating a synthetic sequence database
A tool for generating a synthetic sequence database with timestamps
A tool for generating datasets for clustering

Dataset transformation tools

A tool for converting a sequence database to a transaction database
A tool for converting a transaction database to a sequence database
A tool for converting a text file to a sequence database (each sentences becomes a sequence)
A tool for converting a sequence database in various formats (CSV, KOSARAK, BMS, IBM...) to a sequence database in SPMF format
A tool for converting a transaction database in various formats (CSV...) to a transaction database in SPMF format
A tool for converting time-series to a sequence database
A tool to generate utility values for a transaction database
A tool to add timestamps to a sequence database
A tool to fix a transaction database having some problems (with or without utility/time information)
A tool for removing utility information from a database having utility information
A tool to resize a database in SPMF format (a text file) using a percentage of lines of data from an original database.

Dataset statistics tools

A tool for calculating statistics about a transaction database
A tool for calculating statistics about a transaction database with utility information
A tool for calculating statistics about a sequence database
A tool for calculating statistics about a graph database
A tool for calculating statistics about a product transaction database.
A tool for calculating statistics about a sequence database with cost and binary utility.
A tool for calculating statistics about a sequence database with cost and numeric utility.
A tool for calculating statistics about a sequence database with utility.
A tool for calculating statistics about a time-extended sequence database.
A tool for calculating statistics about a transaction database with cost and utility.
A tool for calculating statistics about a transaction database with utility and period information.
A tool for calculating statistics about a transaction database with utility and timestamps.
A tool for calculating statistics about an event sequence.
A tool for calculating statistics about an interval sequence database.
A tool for calculating statistics about a multi-dimensional sequence database.
A tool for calculating statistics about a multi-dimensional sequence database with timestamps.
A tool for calculating statistics about an uncertain transaction database.
A tool for calculating statistics about a file with double vectors (instances) for clustering.
A tool for calculating statistics about time series.

Dataset viewer tools

A times series viewer to visualize time series.
A cluster viewer to visualize clusters produced by clustering algorithms
A graph viewer to view files containing graphs or subgraphs taken as input or produced as output by algorithms such as TKG, gSpan and cgSgpan.
A simple tool to view the content of an ARFF file
A tool to view the content of an an event sequence file
A tool to view a sequence database cost binary utility file
A tool to view a sequence database cost numeric utility file
A tool to view a sequence database file
A tool to view a time-extended sequence database
A tool to view a multi-dimensional sequence database
A tool to view a multi-dimensional time sequence database
A tool to view a sequence utility database file
A tool to view a cost utility transaction database file
A tool to view a transaction database file
A tool to view an uncertain transaction database file
A tool to view a utility transaction database file
A tool to view a utility time transaction database file
A tool to view a utility period transaction database file
A tool to view a product transaction database file
A tool to view a graph database file
A tool to view a sequence database file with time intervals

GUI tools

The Algorithm Explorer tool to explore the algorithms offered in SPMF
The Memory Viewer tool to observe the memory usage of algorithms in real-time
The Pattern Viewer tool to view patterns found by algorithms and their frequency distributions
The Workflow Editor tool to create a worfklow with several algorithms and run it.
A tool to run experiments where one or more algorithms are run and a parameter is varied.
The SPMF text editor
A tool to download an offline copy of the SPMF documentation

Data structures

red-black tree,
itemset-tree,
binary tree,
KD-tree,
triangular matrix.
a collection of optimized primitive type data structures to replace hashmaps, lists, sets, etc.

Visual map of algorithms

You can visualize the relationship between the various data mining algorithms offered in SPMF by clicking on this map (last updated : 2015/09/12 - SPMF 0.97):