Algorithms

SPMF offers implementations of the following data mining algorithms.


Sequential Pattern Mining

These algorithms discover sequential patterns in a set of sequences. For a good overview of sequential pattern mining algorithms, please read this survey paper.
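To make the notion of a sequential pattern concrete, here is a minimal Java sketch (not SPMF code) that computes the support of a pattern. The support is the number of sequences in which the pattern's itemsets appear in the same order. The class name SequentialSupport and the encoding of itemsets as sets of integers are assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not SPMF code): support of a sequential pattern.
// A sequence is a list of itemsets; an itemset is a set of integer items.
class SequentialSupport {

    // True if each itemset of the pattern is a subset of a distinct itemset of
    // the sequence, in the same order. Greedy leftmost matching is sufficient.
    static boolean contains(List<Set<Integer>> sequence, List<Set<Integer>> pattern) {
        int matched = 0;
        for (Set<Integer> itemset : sequence) {
            if (matched < pattern.size() && itemset.containsAll(pattern.get(matched))) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    // Support = number of sequences in the database that contain the pattern.
    static int support(List<List<Set<Integer>>> database, List<Set<Integer>> pattern) {
        int count = 0;
        for (List<Set<Integer>> sequence : database) {
            if (contains(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<Set<Integer>>> database = List.of(
                List.of(Set.of(1), Set.of(1, 2, 3), Set.of(1, 3)),
                List.of(Set.of(1, 4), Set.of(3), Set.of(2, 3)),
                List.of(Set.of(5, 6), Set.of(1, 2), Set.of(4, 6)));
        // Pattern <{1}, {3}>: item 1, later followed by item 3.
        List<Set<Integer>> pattern = List.of(Set.of(1), Set.of(3));
        System.out.println(support(database, pattern)); // prints 2
    }
}
```

The third sequence contains item 1 but no later itemset with item 3, so it does not count toward the support.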

Sequential Rule Mining

These algorithms discover sequential rules in a set of sequences.

Sequence Prediction

These algorithms predict the next symbol of a sequence based on a set of training sequences.

Itemset Mining

These algorithms discover interesting itemsets (sets of values) in a transaction database. For a good overview, please read this survey paper.

  • algorithms for discovering frequent itemsets in a transaction database
  • algorithms for discovering frequent closed itemsets in a transaction database
  • algorithms for recovering all frequent itemsets from frequent closed itemsets
    • the LevelWise algorithm (Pasquier et al., 1999)
    • the DFI-Growth algorithm (Huang et al., 2019)
    • the DFI-List algorithm (Wu et al., 2020)
  • algorithms for discovering frequent maximal itemsets in a transaction database
    • the FPMax algorithm (Grahne and Zhu, 2003)
    • the Charm-MFI algorithm (Szathmary et al., 2006)
    • the CARPENTER-MAX algorithm (Pan et al., 2003) new
    • the GENMAX algorithm (Gouda et al., 2005) new
  • algorithms for mining frequent itemsets with multiple minimum supports
  • algorithms for mining generator itemsets in a transaction database
  • algorithms for mining rare itemsets and/or correlated itemsets in a transaction database
    • the AprioriInverse algorithm – perfectly rare itemsets (Koh & Rountree, 2005, ▶ video) and AprioriTIDInverse (vertical structure variant)
    • the AprioriRare algorithm – minimal rare itemsets and frequent itemsets (Szathmary et al., 2007, ▶ video) and AprioriTIDRare (vertical structure variant)
    • the CORI algorithm – minimal rare correlated itemsets using support and bond measures (Bouasker et al., 2015, ▶ video)
    • the RP-Growth algorithm (Tsang et al., 2011) new
  • algorithms for performing targeted and dynamic queries about association rules and frequent itemsets
  • algorithms to discover frequent itemsets in a stream
  • the U-Apriori algorithm – frequent itemsets in uncertain data (Chui et al., 2007)
  • the VME algorithm – erasable itemsets (Deng & Xu, 2010)
  • algorithms to discover fuzzy frequent itemsets in a quantitative transaction database
  • the OPUS-Miner algorithm – self-sufficient itemsets (Webb et al., 2014)
  • algorithms to discover compressing itemsets based on the MDL principle
    • the KRIMP algorithm (Vreeken et al., 2011) new
    • the SLIM algorithm (Smets et al., 2012) new
    • the GRIMP algorithm (Nawaz et al., 2025) new
    • the HMP-SA algorithm (Chen et al., 2026) new
    • the HMP-HC algorithm (Chen et al., 2026) new
  • algorithms to discover the top-k most frequent itemsets
    • the Apriori(top-k) algorithm (modified Apriori) new
    • the FPGrowth(top-k) algorithm (modified FP-Growth) new
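As a reference point for the frequent itemset mining task, here is a minimal Java sketch of the classic level-wise Apriori idea. It generates candidates, counts their support, and keeps those meeting a minimum support. This is a hypothetical illustration, not SPMF's implementation. The class AprioriSketch and its absolute minsup parameter are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical textbook Apriori sketch (not SPMF code).
// minsup is an absolute number of transactions.
class AprioriSketch {

    // Number of transactions that contain every item of the itemset.
    static int support(List<Set<Integer>> database, Set<Integer> itemset) {
        int count = 0;
        for (Set<Integer> transaction : database) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // Level-wise search: count candidates of size k, keep the frequent ones,
    // and build candidates of size k + 1 from them.
    static Map<SortedSet<Integer>, Integer> mine(List<Set<Integer>> database, int minsup) {
        Map<SortedSet<Integer>, Integer> result = new LinkedHashMap<>();
        SortedSet<Integer> items = new TreeSet<>();
        database.forEach(items::addAll);
        List<SortedSet<Integer>> candidates = new ArrayList<>();
        for (int item : items) {
            candidates.add(new TreeSet<>(Set.of(item)));
        }
        while (!candidates.isEmpty()) {
            List<SortedSet<Integer>> frequent = new ArrayList<>();
            for (SortedSet<Integer> candidate : candidates) {
                int s = support(database, candidate);
                if (s >= minsup) {
                    frequent.add(candidate);
                    result.put(candidate, s);
                }
            }
            candidates = nextCandidates(frequent);
        }
        return result;
    }

    // Joins two frequent k-itemsets sharing their first k - 1 items, then prunes
    // any candidate that has an infrequent k-subset (Apriori property).
    static List<SortedSet<Integer>> nextCandidates(List<SortedSet<Integer>> frequent) {
        Set<SortedSet<Integer>> frequentSet = new HashSet<>(frequent);
        List<SortedSet<Integer>> next = new ArrayList<>();
        for (int i = 0; i < frequent.size(); i++) {
            for (int j = i + 1; j < frequent.size(); j++) {
                SortedSet<Integer> a = frequent.get(i);
                SortedSet<Integer> b = frequent.get(j);
                if (!a.headSet(a.last()).equals(b.headSet(b.last()))) {
                    continue; // prefixes differ: not joinable
                }
                SortedSet<Integer> candidate = new TreeSet<>(a);
                candidate.add(b.last());
                boolean allSubsetsFrequent = true;
                for (int item : candidate) {
                    SortedSet<Integer> subset = new TreeSet<>(candidate);
                    subset.remove(item);
                    if (!frequentSet.contains(subset)) {
                        allSubsetsFrequent = false;
                        break;
                    }
                }
                if (allSubsetsFrequent) {
                    next.add(candidate);
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<Set<Integer>> database = List.of(
                Set.of(1, 2, 3), Set.of(1, 2), Set.of(1, 3), Set.of(2, 3));
        System.out.println(mine(database, 2));
        // prints {[1]=3, [2]=3, [3]=3, [1, 2]=2, [1, 3]=2, [2, 3]=2}
    }
}
```

The candidate {1, 2, 3} is generated and counted. It appears in only one transaction, so it is discarded.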

Episode Mining

These algorithms discover patterns (episodes) in a single sequence of events. For a good overview, please read this survey paper.

  • algorithms for mining frequent episodes
  • algorithms for mining episode rules
  • algorithms for mining high-utility episodes in a sequence of complex events with utility information
  • algorithms for mining nonoverlapping sequential patterns in one or many sequences of symbols
  • algorithms for mining frequent sequential patterns with periodic wildcard gaps in a sequence of characters
    • the MAPD algorithm (Wu, Y. et al., 2014)
  • algorithms for mining self-adaptive one-off weak-gap strong sequential patterns in a sequence of characters
    • the OWSP-Miner algorithm (Wu, Y. et al., 2022) new

Periodic Pattern Mining

These algorithms discover patterns that periodically appear in a sequence of records (e.g. transactions).

  • algorithms for finding frequent periodic patterns in a single sequence of events
  • algorithms for mining stable periodic itemsets in a sequence of events with or without timestamps
  • algorithms for mining locally periodic patterns in a transaction database with or without timestamps
  • algorithms for discovering periodic patterns that are significant or non-redundant
    • the NPFPM algorithm – non-redundant periodic frequent itemsets (Afriyie et al., 2020, 2021) new
    • the PPFP algorithm – productive periodic frequent itemsets (Nofong, 2016) new
    • the SRPFPM algorithm – self-reliant periodic frequent patterns (Nofong et al., 2021) new
  • algorithms for mining periodic high-utility itemsets in a sequence of transactions with utility information
    • the PHM algorithm (Fournier-Viger et al., 2016, 📊 slides ▶ video)
    • the PHMN algorithm (2023) – periodic high-utility itemsets with positive or negative utility new
    • the PHMN+ algorithm (2023) – periodic high-utility itemsets with positive or negative utility new
    • the PHM_irregular algorithm – irregular (non-periodic) high-utility itemsets (variation of PHM)
  • algorithms for finding periodic patterns in multiple sequences of events
  • algorithms for mining rare correlated periodic patterns common to multiple sequences
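Many of the periodic pattern miners above rely on the periods of an itemset, that is, the gaps between consecutive transactions that contain it. The Java sketch below computes them under one commonly used convention, which includes the gap from the start of the database to the first occurrence and the gap from the last occurrence to the end. Individual algorithms may define periods and their thresholds differently. PeriodSketch is a hypothetical name, not an SPMF class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of period-based measures (not SPMF code).
// Transactions are numbered 1..n; position 0 marks the start of the database.
class PeriodSketch {

    // Periods of an itemset: gaps between consecutive transactions containing it,
    // plus the gap from the start (0) to the first occurrence and the gap from
    // the last occurrence to the end (n).
    static List<Integer> periods(List<Set<Integer>> database, Set<Integer> itemset) {
        List<Integer> periods = new ArrayList<>();
        int previous = 0;
        for (int i = 0; i < database.size(); i++) {
            if (database.get(i).containsAll(itemset)) {
                int position = i + 1;
                periods.add(position - previous);
                previous = position;
            }
        }
        periods.add(database.size() - previous);
        return periods;
    }

    // Maximum period: a small value means the itemset never disappears for long.
    static int maxPeriod(List<Set<Integer>> database, Set<Integer> itemset) {
        return Collections.max(periods(database, itemset));
    }

    public static void main(String[] args) {
        List<Set<Integer>> database = List.of(
                Set.of(1, 2), Set.of(3), Set.of(1), Set.of(2), Set.of(1, 3));
        System.out.println(periods(database, Set.of(1)));   // prints [1, 2, 2, 0]
        System.out.println(maxPeriod(database, Set.of(1))); // prints 2
    }
}
```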

Graph Pattern Mining new

These algorithms discover patterns in graphs.

High-Utility Pattern Mining

These algorithms discover patterns having a high utility (importance) in different kinds of data. For a good overview, read the survey paper or the high-utility pattern mining book.

Association Rule Mining

These algorithms discover interesting associations between symbols (values) in a transaction database (database records with binary attributes).

  • an algorithm for mining all association rules with the confidence measure (Agrawal & Srikant, 1994, ▶ video)
  • an algorithm for mining all association rules with the lift measure (adapted from Agrawal & Srikant, 1994)
  • an algorithm for mining the IGB informative and generic basis of association rules (Gasmi et al., 2005)
  • an algorithm for mining perfectly sporadic association rules (Koh & Rountree, 2005)
  • an algorithm for mining closed association rules (Szathmary et al., 2006)
  • an algorithm for mining minimal non-redundant association rules (Kryszkiewicz, 1998)
  • the Indirect algorithm – indirect association rules (Tan et al., 2000; Tan et al., 2006)
  • the FHSAR algorithm – hiding sensitive association rules (Weng et al., 2008)
  • the TopKRules algorithm – top-k association rules (Fournier-Viger, 2012, 📊 slides)
  • the ETARM algorithm – top-k association rules (Nguyen et al., 2017) new
  • the FTARM algorithm – top-k association rules (Liu et al., 2019) new
  • the TopKClassRules algorithm – top-k class association rules (Fournier-Viger, 2012, 📊 slides)
  • the TNR algorithm – top-k non-redundant association rules (Fournier-Viger, 2012, 📊 slides)
  • the HGB and HGB_All algorithms – high-utility association rules (Sahoo et al., 2015) new
  • algorithms for mining class association rules
    • the ACAC algorithm (Huang et al., 2011)
    • the ACCF algorithm (Li et al., 2008)
    • the ACN algorithm (Kundu et al., 2008)
    • the ADT algorithm (Wang et al., 2000)
    • the CBA algorithm (Liu et al., 1998)
    • the CBA2 algorithm (Liu et al., 2001)
    • the CMAR algorithm (Li et al., 2001)
    • the L3 algorithm (Baralis et al., 2002)
    • the MAC algorithm (Abdelhamid et al., 2012)
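The first two algorithms in this list use the confidence and lift measures. A minimal Java sketch of both measures for a rule X -> Y, using absolute support counts, follows. It illustrates the formulas only and is not SPMF's code. The class RuleMeasures is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch (not SPMF code) of two interestingness measures
// for an association rule X -> Y over a transaction database.
class RuleMeasures {

    // Number of transactions containing every item of the itemset.
    static int count(List<Set<Integer>> database, Set<Integer> itemset) {
        int n = 0;
        for (Set<Integer> transaction : database) {
            if (transaction.containsAll(itemset)) {
                n++;
            }
        }
        return n;
    }

    // confidence(X -> Y) = sup(X u Y) / sup(X)
    static double confidence(List<Set<Integer>> database, Set<Integer> x, Set<Integer> y) {
        Set<Integer> union = new HashSet<>(x);
        union.addAll(y);
        return count(database, union) / (double) count(database, x);
    }

    // lift(X -> Y) = confidence(X -> Y) / (sup(Y) / |D|); 1 means X and Y are independent
    static double lift(List<Set<Integer>> database, Set<Integer> x, Set<Integer> y) {
        double relativeSupportY = count(database, y) / (double) database.size();
        return confidence(database, x, y) / relativeSupportY;
    }

    public static void main(String[] args) {
        List<Set<Integer>> database = List.of(
                Set.of(1, 2, 3), Set.of(1, 2), Set.of(1, 3), Set.of(2, 3));
        System.out.printf(Locale.ROOT, "conf=%.3f lift=%.3f%n",
                confidence(database, Set.of(1), Set.of(2)),
                lift(database, Set.of(1), Set.of(2))); // prints conf=0.667 lift=0.889
    }
}
```

A lift below 1, as here, indicates that items 1 and 2 co-occur slightly less often than they would if they were independent.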

Stream Pattern Mining

These algorithms discover various kinds of patterns in a stream (an infinite sequence of database records).

  • the estDec algorithm – recent frequent itemsets in a data stream (Chang & Lee, 2003)
  • the estDec+ algorithm – recent frequent itemsets in a data stream (Shin et al., 2014)
  • the CloStream algorithm – frequent closed itemsets in a data stream (Yen et al., 2009)
  • algorithms for mining the top-k high-utility itemsets from a data stream

Clustering

These algorithms automatically find clusters in different kinds of data.

  • the original K-Means algorithm (MacQueen, 1967)
  • the Bisecting K-Means algorithm (Steinbach et al., 2000)
  • the K-Means++ algorithm (Arthur & Vassilvitskii, 2007) new
  • algorithms for density-based clustering
    • the DBScan algorithm (Ester et al., 1996)
    • the Optics algorithm – extracts a cluster ordering of points, which can then be used to generate DBScan-style clusters and more (Ankerst et al., 1999)
    • the Density Peak Clustering (DPC) algorithm (Rodriguez et al., 2014)
    • the AEDBScan algorithm (Mistry et al., 2021)
  • a hierarchical clustering algorithm
  • a tool called Cluster Viewer for visualizing clusters
  • a tool called Instance Viewer for visualizing the input of clustering algorithms
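For reference, the core K-Means loop alternates between two steps. It assigns each point to its nearest centroid, then recomputes each centroid as the mean of its points. The Java sketch below shows this batch variant. MacQueen's original formulation updates centroids incrementally instead. For reproducibility, the sketch seeds centroids with the first k points, whereas implementations usually seed randomly (K-Means++ seeds with a distance-weighted random choice). KMeansSketch is a hypothetical name, not an SPMF class.

```java
import java.util.Arrays;

// Hypothetical batch K-Means sketch (not SPMF code).
class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the cluster index of each point. The first k points are used as
    // initial centroids (a simplification; random seeding is more common).
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Assignment step: move each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centroids[c])
                            < squaredDistance(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point changed cluster
            }
            // Update step: each centroid becomes the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its old centroid
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {10, 10}, {1.5, 2}, {11, 9}, {0.5, 1}, {9, 11}};
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
        // prints [0, 1, 0, 1, 0, 1]
    }
}
```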

Time Series Mining

These algorithms perform various tasks to analyze time series data.

  • converting a time series to a sequence of symbols using the SAX representation of time series (SAX, 2007). Converting a set of time series with SAX yields a sequence database, to which traditional algorithms for sequential rule mining and sequential pattern mining can then be applied
  • calculating the prior moving average (noise removal) of a time series
  • calculating the cumulative moving average (noise removal) of a time series
  • calculating the central moving average (noise removal) of a time series
  • calculating the median smoothing (noise removal) of a time series
  • calculating the exponential smoothing (noise removal) of a time series
  • calculating the min-max normalization of a time series
  • calculating the autocorrelation function of a time series
  • calculating the standardization of a time series
  • calculating the first and second order differencing of a time series
  • calculating the piecewise aggregate approximation (data point reduction) of a time series
  • calculating the linear regression (least squares method) of a time series
  • splitting a time series into segments of a given length
  • splitting a time series into a given number of segments
  • clustering time series (grouping time series according to their similarity). This can be done by applying the clustering algorithms offered in SPMF (K-Means, Bisecting K-Means, DBScan, OPTICS, hierarchical clustering) to time series.
  • a tool called Time Series Viewer for visualizing time series
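The SAX conversion mentioned above can be sketched in three steps. First, standardize the series. Second, reduce it with the piecewise aggregate approximation (PAA). Third, map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The Java sketch below is a simplified, hypothetical illustration, not SPMF's tool. It fixes the alphabet at four symbols with breakpoints of about -0.67, 0 and 0.67, and it requires the series length to be a multiple of the number of segments.

```java
import java.util.Arrays;

// Hypothetical SAX sketch (not SPMF code). Alphabet size is fixed at 4, and
// the series length must be a multiple of the number of segments.
class SaxSketch {

    // Approximate quartiles of the standard normal distribution: they split it
    // into 4 equiprobable regions, mapped to the symbols a, b, c, d.
    static final double[] BREAKPOINTS = {-0.67, 0.0, 0.67};

    // Standardization: zero mean, unit (population) standard deviation.
    static double[] zNormalize(double[] series) {
        double mean = Arrays.stream(series).average().orElse(0.0);
        double variance = Arrays.stream(series)
                .map(v -> (v - mean) * (v - mean)).average().orElse(0.0);
        double std = Math.sqrt(variance);
        double[] result = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            result[i] = std == 0.0 ? 0.0 : (series[i] - mean) / std;
        }
        return result;
    }

    // Piecewise aggregate approximation: the mean of each equal-length segment.
    static double[] paa(double[] series, int segments) {
        if (series.length % segments != 0) {
            throw new IllegalArgumentException("length must be a multiple of segments");
        }
        int size = series.length / segments;
        double[] result = new double[segments];
        for (int s = 0; s < segments; s++) {
            double sum = 0.0;
            for (int i = s * size; i < (s + 1) * size; i++) {
                sum += series[i];
            }
            result[s] = sum / size;
        }
        return result;
    }

    // Each PAA value becomes the symbol of the region it falls into.
    static String toSax(double[] series, int segments) {
        StringBuilder word = new StringBuilder();
        for (double value : paa(zNormalize(series), segments)) {
            int symbol = 0;
            while (symbol < BREAKPOINTS.length && value >= BREAKPOINTS[symbol]) {
                symbol++;
            }
            word.append((char) ('a' + symbol));
        }
        return word.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSax(new double[]{1, 1, 2, 2, 3, 3, 4, 4}, 4)); // prints abcd
    }
}
```

Applying toSax to each time series of a set yields one symbol string per series. This is the sequence database on which sequential pattern or rule miners can then run.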

Classification

  • the ID3 algorithm for building decision trees (Quinlan, 1986)
  • the KNN (K-Nearest Neighbor) algorithm
  • classification based on class association rule mining
    • the ACAC algorithm (Huang et al., 2011)
    • the ACCF algorithm (Li et al., 2008)
    • the ACN algorithm (Kundu et al., 2008)
    • the ADT algorithm (Wang et al., 2000)
    • the CBA algorithm (Liu et al., 1998)
    • the CBA2 algorithm (Liu et al., 2001)
    • the CMAR algorithm (Li et al., 2001)
    • the L3 algorithm (Baralis et al., 2002)
    • the MAC algorithm (Abdelhamid et al., 2012)
  • a framework for comparing multiple classifiers using holdout and k-fold cross-validation
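The KNN algorithm listed above classifies an instance by a vote among its k closest training instances. A minimal Java sketch follows. It assumes Euclidean distance and an unweighted majority vote, choices that SPMF's implementation may not share. KnnSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical k-nearest-neighbor sketch (not SPMF code): Euclidean distance
// and a simple majority vote among the k closest training instances.
class KnnSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }

    static String classify(double[][] train, String[] labels, double[] query, int k) {
        // Sort training instance indices by distance to the query.
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> distance(train[i], query)));
        // Count the labels of the k nearest instances.
        Map<String, Integer> votes = new HashMap<>();
        for (int n = 0; n < Math.min(k, order.length); n++) {
            votes.merge(labels[order[n]], 1, Integer::sum);
        }
        // The most voted label wins; ties are broken arbitrarily in this sketch.
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}};
        String[] labels = {"A", "A", "A", "B", "B"};
        System.out.println(classify(train, labels, new double[]{5, 5.5}, 3)); // prints B
    }
}
```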

Text Mining

  • an algorithm for classifying text documents using a Naive Bayes classifier approach (S. Raghu, 2015)
  • an algorithm for clustering texts using the tf*idf measure (S. Raghu, 2015)
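The tf*idf measure gives a high weight to a term that is frequent in a document but rare in the corpus. The Java sketch below uses one common variant: tf = occurrences / document length, and idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term. Other variants exist, and the algorithm listed above may use a different one. TfIdfSketch is a hypothetical name.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical tf*idf sketch (not SPMF code); documents are lists of tokens.
class TfIdfSketch {

    // Term frequency: occurrences of the term divided by the document length.
    static double tf(List<String> document, String term) {
        return Collections.frequency(document, term) / (double) document.size();
    }

    // Inverse document frequency: ln(N / df); 0 for a term absent from the corpus.
    static double idf(List<List<String>> corpus, String term) {
        long df = corpus.stream().filter(doc -> doc.contains(term)).count();
        return df == 0 ? 0.0 : Math.log(corpus.size() / (double) df);
    }

    static double tfIdf(List<List<String>> corpus, List<String> document, String term) {
        return tf(document, term) * idf(corpus, term);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("data", "mining", "data"),
                List.of("text", "mining"));
        System.out.printf(Locale.ROOT, "%.4f %.4f%n",
                tfIdf(corpus, corpus.get(0), "data"),    // (2/3) * ln 2
                tfIdf(corpus, corpus.get(0), "mining")); // prints 0.4621 0.0000
    }
}
```

"mining" occurs in every document, so its idf, and hence its weight, is zero.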

Dataset Generation Tools

  • A tool for generating a synthetic transaction database
  • A tool for generating a synthetic sequence database
  • A tool for generating a synthetic sequence database with timestamps
  • A tool for generating datasets for clustering

Dataset Transformation Tools

  • A tool for converting a sequence database to a transaction database
  • A tool for converting a transaction database to a sequence database
  • A tool for converting a text file to a sequence database (each sentence becomes a sequence)
  • A tool for converting a sequence database in various formats (CSV, KOSARAK, BMS, IBM…) to SPMF format
  • A tool for converting a transaction database in various formats (CSV…) to SPMF format
  • A tool for converting time-series to a sequence database
  • A tool to generate utility values for a transaction database
  • A tool to add timestamps to a sequence database
  • A tool to fix a transaction database having problems (with or without utility/time information)
  • A tool for removing utility information from a database having utility information
  • A tool to resize a database in SPMF format using a percentage of lines
  • A tool to sample records from a dataset (reservoir, seed, etc.)
  • A tool to remove duplicated records from a dataset

Dataset Statistics Tools

  • A tool for calculating statistics about a transaction database
  • A tool for calculating statistics about a transaction database with utility information
  • A tool for calculating statistics about a sequence database
  • A tool for calculating statistics about a graph database
  • A tool for calculating statistics about a product transaction database new
  • A tool for calculating statistics about a sequence database with cost and binary utility new
  • A tool for calculating statistics about a sequence database with cost and numeric utility new
  • A tool for calculating statistics about a sequence database with utility new
  • A tool for calculating statistics about a time-extended sequence database new
  • A tool for calculating statistics about a transaction database with cost and utility new
  • A tool for calculating statistics about a transaction database with utility and period information new
  • A tool for calculating statistics about a transaction database with utility and timestamps new
  • A tool for calculating statistics about an event sequence new
  • A tool for calculating statistics about an interval sequence database new
  • A tool for calculating statistics about a multi-dimensional sequence database new
  • A tool for calculating statistics about a multi-dimensional sequence database with timestamps new
  • A tool for calculating statistics about an uncertain transaction database new
  • A tool for calculating statistics about a file with double vectors (instances) for clustering
  • A tool for calculating statistics about time series

Dataset Viewer Tools

  • A time series viewer to visualize time series
  • A cluster viewer to visualize clusters produced by clustering algorithms
  • A graph viewer to view files containing graphs or subgraphs (TKG, gSpan, cgSpan)
  • A simple tool to view the content of an ARFF file new
  • A tool to view the content of an event sequence file new
  • A tool to view a sequence database cost binary utility file new
  • A tool to view a sequence database cost numeric utility file new
  • A tool to view a sequence database file new
  • A tool to view a time-extended sequence database new
  • A tool to view a multi-dimensional sequence database new
  • A tool to view a multi-dimensional time sequence database new
  • A tool to view a sequence utility database file new
  • A tool to view a cost utility transaction database file new
  • A tool to view a transaction database file new
  • A tool to view an uncertain transaction database file new
  • A tool to view a utility transaction database file new
  • A tool to view a utility time transaction database file new
  • A tool to view a utility period transaction database file new
  • A tool to view a product transaction database file new
  • A tool to view a graph database file
  • A tool to view a sequence database file with time intervals new
  • A tool to view a taxonomy file new

GUI Tools

  • The Algorithm Explorer tool to explore the algorithms offered in SPMF
  • The Memory Viewer tool to observe the memory usage of algorithms in real-time new
  • The Pattern Viewer tool to view patterns found by algorithms and their frequency distributions
  • The Workflow Editor tool to create a workflow with several algorithms and run it new
  • A tool to run experiments where one or more algorithms are run and a parameter is varied
  • The SPMF text editor
  • A tool to download an offline copy of the SPMF documentation new
  • A tool called Pattern Diff Analyzer to compare two files of patterns to find contrast patterns new
  • A tool called Algorithm Graph Viewer to view the similarity between algorithms as a graph new

Other Tools

  • A tool to export the list of algorithms to a JSON file new

Data Structures

  • Red-black tree
  • Itemset-tree
  • Binary tree
  • KD-tree
  • Triangular matrix
  • A collection of optimized primitive-type data structures to replace hashmaps, lists, sets, etc.
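A common use of a triangular matrix in itemset mining is to count co-occurrences of item pairs (the support of 2-itemsets). It stores only one cell per unordered pair. The following Java sketch is a hypothetical illustration of that idea, not SPMF's class. Items are assumed to be numbered 0 to n-1.

```java
// Hypothetical sketch (not SPMF code) of a triangular matrix for counting item
// pairs. Only cells (i, j) with i < j are stored, so n items need
// n * (n - 1) / 2 counters instead of n * n.
class TriangularMatrixSketch {
    private final int[][] rows;

    TriangularMatrixSketch(int itemCount) {
        rows = new int[itemCount][];
        for (int i = 0; i < itemCount; i++) {
            rows[i] = new int[itemCount - i - 1]; // row i holds pairs (i, i+1 .. n-1)
        }
    }

    // The pair is unordered: (a, b) and (b, a) share the same cell.
    void increment(int a, int b) {
        int i = Math.min(a, b), j = Math.max(a, b);
        checkPair(i, j);
        rows[i][j - i - 1]++;
    }

    int get(int a, int b) {
        int i = Math.min(a, b), j = Math.max(a, b);
        checkPair(i, j);
        return rows[i][j - i - 1];
    }

    private static void checkPair(int i, int j) {
        if (i == j) {
            throw new IllegalArgumentException("diagonal (i, i) is not stored");
        }
    }

    public static void main(String[] args) {
        int[][] transactions = {{0, 1, 2}, {0, 2}, {1, 3}};
        TriangularMatrixSketch pairCounts = new TriangularMatrixSketch(4);
        for (int[] t : transactions) {
            for (int x = 0; x < t.length; x++) {
                for (int y = x + 1; y < t.length; y++) {
                    pairCounts.increment(t[x], t[y]);
                }
            }
        }
        System.out.println(pairCounts.get(0, 2) + " " + pairCounts.get(2, 0) + " "
                + pairCounts.get(2, 3)); // prints 2 2 0
    }
}
```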

Visual Map of Algorithms

You can visualize the relationship between the various data mining algorithms offered in SPMF by clicking on this map (last updated: 2015/09/12 – SPMF 0.97):

SPMF Algorithm Map