SPMFAn Open-Source Data Mining Library

Introduction

Algorithms

Download

Documentation

Datasets

FAQ

License

Contributors

Citations

Performance

Developers' guide

Forum

Blog

Other resources

462655 visitors
since 2010

-------------
* Upcoming edited Springer book on high utility pattern mining

* We are hiring a postdoctoral researcher (details here) and a Ph.D. student.  Send your CV with cover letter to Prof. Fournier-Viger.
-------------

 

Algorithms

SPMF offers implementations of the following data mining algorithms.

Sequential Pattern Mining

For a good overview of sequential pattern mining algorithms, you may read this survey paper.

Sequential Rule Mining

Sequence Prediction

Frequent Itemset Mining

For a good overview of frequent itemset mining algorithms, you may read this survey paper.

Periodic Pattern Mining

  • the PFPM algorithm (Fournier-Viger et al, 2016a) for mining frequent periodic patterns in a sequence of transactions (a transaction database))new
  • the PHM algorithm (Fournier-Viger et al, 2016b, powerpoint) for mining periodic high-utility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information new

High-Utility Pattern Mining

Association Rule Mining

  • an algorithm for mining all association rules in a transaction database (Agrawal & Srikant, 1994)
  • an algorithm for mining all association rules with the lift measure in a transaction database (adapted from Agrawal & Srikant, 1994)
  • an algorithm for mining the IGB informative and generic basis of association rules in a transaction database (Gasmi et al., 2005)
  • an algorithm for mining perfectly sporadic association rules (Koh & Roundtree, 2005)
  • an algorithm for mining closed association rules (Szathmary et al. 2006).
  • an algorithm for mining minimal non redundant association rules (Kryszkiewicz, 1998)
  • the Indirect algorithm for mining indirect association rules (Tan et al. 2000; Tan et 2006)
  • the FHSAR algorithm for hiding sensitive association rules (Weng et al. 2008)
  • the TopKRules algorithm for mining the top-k association rules (Fournier-Viger, 2012b, powerpoint)
  • the TNR algorithm for mining top-k non-redundant association rules (Fournier-Viger 2012d, powerpoint)

Clustering

  • the original K-Means algorithm (MacQueen, 1967)
  • the Bisecting K-Means algorithm (Steinbach et al, 2000)
  • algorithms for density-based clustering
    • the DBScan algorithm (Ester et al., 1996)
    • the Optics algorithm to extract a cluster ordering of points, which can then be use to generate DBScan style clusters and more (Ankerst et al, 1999)
  • a hierarchical clustering algorithm
  • a tool called Cluster Viewer for visualizing clusters new
  • a tool called Instance Viewer for visualizing the input of clustering algorithmsnew

Time series mining

  • an algorithm for calculating the moving average of a time series (to remove noise) new
  • an algorithm for calculating the piecewise aggregate approximation of a time series (to reduce the number of data points of a time series) new
  • an algorithm for calculating the linear regression line of a time series (using the least squares method) new
  • an algorithm for splitting a time series into segments of a given length new
  • an algorithm for splitting a time series into a given number of segmentsnew
  • the SAX algorithm for converting a time-series to a sequence database (to apply traditional algorihtms for sequential rule mining and sequential pattern mining on time series) (Lin et al., 2007). This allows to apply traditional pattern mining algorithms on time series. new
  • algorithms to cluster time series (group time-series according to their similarities). This can be done by applying the clustering algorithms offered in SPMF (K-Means, Bisecting K-Means, DBScan, OPTICS, Hierarchical clustering) on time series.
  • a tool called Time Series Viewer for visualizing time series new

Classification

  • the ID3 algorithm for building decision trees (Quinlan, 1986)

Text mining

  • an algorithm for classifying text documents using a Naive Bayes classifier approach (S. Raghu, 2015)
  • an algorithm for clustering texts using the tf*idf measure (S. Raghu, 2015)

Data structures

  • red-black tree,
  • itemset-tree,
  • binary tree,
  • KD-tree,
  • triangular matrix.

Tools

  • A tool for generating a synthetic transaction database
  • A tool for generating a synthetic sequence database
  • A tool for generating a synthetic sequence database with timestamps
  • A tool for calculating statistics about a transaction database
  • A tool for calculating statistics about a sequence database
  • A tool for converting a sequence database to a transaction database
  • A tool for converting a transaction database to a sequence database
  • A tool for converting a text file to a sequence database (each sentences becomes a sequence)
  • A tool for converting a sequence database in various formats (CSV, KOSARAK, BMS, IBM...) to a sequence database in SPMF format
  • A tool for converting a transaction database in various formats (CSV...) to a transaction database in SPMF format
  • A tool for converting time-series to a sequence database
  • A tool to generate utility values for a transaction database
  • A tool to add timestamps to a sequence database
  • A tool for removing utility information from a database  having utility information
  • A tool to resize a database in SPMF format (a text file) using a percentage of lines of data from an original database.
  • A tool for visualizing time-series

Visual map of algorithms

You can visualize the relationship between the various data mining algorithms offered in SPMF by clicking on this map (last updated : 2015/09/12 - SPMF 0.97):

map_algorithms_spmf_data_mining092_small

Copyright © 2008-2017 Philippe Fournier-Viger. All rights reserved.