SPMF: An Open-Source Data Mining Library

Algorithms

SPMF offers implementations of the following data mining algorithms.

Sequential Pattern Mining

Sequential Rule Mining

Sequence Prediction

Frequent Itemset Mining

Periodic Pattern Mining

  • the PFPM algorithm (Fournier-Viger et al., 2016a) for mining frequent periodic patterns in a sequence of transactions (a transaction database) (new)
  • the PHM algorithm (Fournier-Viger et al., 2016b) for mining periodic high-utility patterns (periodic patterns that yield a high profit) in a sequence of transactions (a transaction database) containing utility information (new)
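
To make the notion of a periodic pattern concrete, here is a minimal Python sketch that computes the periods of an itemset, that is, the gaps between consecutive transactions containing it. It is not the PFPM or PHM implementation from SPMF. The function name `periods` and the sample data are hypothetical, and the sketch assumes the common definition in which position 0 and the database size are added as boundaries.

```python
def periods(transactions, itemset):
    """Return the gaps between consecutive transactions that contain itemset.

    Assumption: transactions are numbered from 1, and position 0 and the
    database size are used as the first and last boundaries.
    """
    itemset = set(itemset)
    # 1-based positions of the transactions containing every item of the itemset
    positions = [i for i, t in enumerate(transactions, start=1)
                 if itemset <= set(t)]
    bounds = [0] + positions + [len(transactions)]
    # Differences between consecutive boundaries are the periods
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical transaction database with five transactions
database = [{1, 3}, {2}, {1, 3, 4}, {1}, {1, 3}]
gaps = periods(database, {1, 3})
print(gaps, "maximum period:", max(gaps))  # [1, 2, 2, 0] maximum period: 2
```

An itemset could then be called periodic when, for example, its maximum period does not exceed a user-chosen threshold. The exact measures and thresholds used by each algorithm are described in the cited papers.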

High-Utility Pattern Mining

  • algorithms for mining high-utility itemsets in a transaction database having profit information
  • algorithm for efficiently mining high-utility itemsets with length constraints in a transaction database
  • algorithm for mining correlated high-utility itemsets in a transaction database
  • algorithm for mining high-utility itemsets in a transaction database containing negative unit profit values
  • algorithm for mining frequent high-utility itemsets in a transaction database
  • algorithm for mining on-shelf high-utility itemsets in a transaction database containing information about time periods of items
  • algorithm for incremental high-utility itemset mining
  • algorithm for mining concise representations of high-utility itemsets in a transaction database
  • algorithm for mining the skyline high-utility itemsets
  • algorithm for mining high-utility sequential rules in a sequence database 
  • algorithm for mining high-utility sequential patterns in a sequence database 
    • the USPAN algorithm (Yin et al., 2012) (new)
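
As an illustration of what "utility" means in the list above, the following Python sketch computes the utility of an itemset as the sum, over the transactions that contain it, of purchased quantity times unit profit. It is a brute-force calculation under that assumed definition, not one of the SPMF algorithms. The function name `itemset_utility` and the sample data are hypothetical.

```python
def itemset_utility(transactions, unit_profit, itemset):
    """Sum quantity * unit profit of the itemset's items over the
    transactions that contain the whole itemset.

    Each transaction is a dict mapping an item to its purchased quantity.
    """
    itemset = set(itemset)
    total = 0
    for transaction in transactions:
        if itemset.issubset(transaction):  # all items of the itemset are present
            total += sum(transaction[item] * unit_profit[item] for item in itemset)
    return total


# Hypothetical data: unit profits and three transactions with quantities
profit = {"a": 5, "b": 2, "c": 1}
database = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"a": 1, "b": 1, "c": 1}]
print(itemset_utility(database, profit, {"a", "b"}))  # 16
```

An itemset would be a high-utility itemset if this value is no less than a user-defined minimum utility threshold. The same calculation works when some unit profits are negative.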

Association Rule Mining

  • an algorithm for mining all association rules from a transaction database (Agrawal & Srikant, 1994)
  • an algorithm for mining all association rules with the lift measure from a transaction database (adapted from Agrawal & Srikant, 1994)
  • an algorithm for mining the IGB informative and generic basis of association rules from a transaction database (Gasmi et al., 2005)
  • an algorithm for mining perfectly sporadic association rules (Koh & Rountree, 2005)
  • an algorithm for mining closed association rules (Szathmary et al., 2006)
  • an algorithm for mining minimal non-redundant association rules (Kryszkiewicz, 1998)
  • the Indirect algorithm for mining indirect association rules (Tan et al., 2000; Tan et al., 2006)
  • the FHSAR algorithm for hiding sensitive association rules (Weng et al., 2008)
  • the TopKRules algorithm for mining the top-k association rules (Fournier-Viger, 2012b, powerpoint)
  • the TNR algorithm for mining top-k non-redundant association rules (Fournier-Viger, 2012d, powerpoint)
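
The measures used to evaluate association rules can be sketched in a few lines of Python. The example below computes the support, confidence and lift of a single rule X -> Y by scanning the database. It only illustrates the measures and is not how the algorithms above enumerate rules. The function name `rule_measures` and the data are hypothetical, and the sketch assumes that both X and Y occur in at least one transaction.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    x, y = set(antecedent), set(consequent)
    n = len(transactions)
    count_x = sum(1 for t in transactions if x <= set(t))
    count_y = sum(1 for t in transactions if y <= set(t))
    count_xy = sum(1 for t in transactions if (x | y) <= set(t))
    support = count_xy / n               # fraction of transactions containing X and Y
    confidence = count_xy / count_x      # estimate of P(Y | X)
    lift = confidence / (count_y / n)    # confidence relative to the frequency of Y
    return support, confidence, lift


# Hypothetical transaction database
database = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}]
support, confidence, lift = rule_measures(database, {1}, {2})
print(support, round(confidence, 3), round(lift, 3))  # 0.5 0.667 0.889
```

A lift below 1, as in this example, suggests that the antecedent and the consequent appear together less often than would be expected if they were independent.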

Clustering

  • the original K-Means algorithm (MacQueen, 1967)
  • the Bisecting K-Means algorithm (Steinbach et al., 2000)
  • algorithms for density-based clustering
    • the DBScan algorithm (Ester et al., 1996)
    • the OPTICS algorithm to extract a cluster ordering of points, which can then be used to generate DBScan-style clusters and more (Ankerst et al., 1999)
  • a hierarchical clustering algorithm
  • a tool called Cluster Viewer for visualizing clusters (new)
  • a tool called Instance Viewer for visualizing the input of clustering algorithms (new)
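
For readers unfamiliar with K-Means, the following Python sketch shows the basic loop: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. It is a simplified illustration and not the SPMF implementation. The function name `k_means`, the fixed number of iterations and the explicit initial centroids are assumptions made to keep the example deterministic.

```python
def k_means(points, centroids, iterations=10):
    """Basic K-Means on tuples of numbers, starting from the given centroids.

    Returns the final centroids and the clusters assigned in the last iteration.
    """
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: put each point in the cluster of its nearest centroid
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups
points = [(1, 1), (1, 2), (9, 9), (10, 9)]
final_centroids, final_clusters = k_means(points, [(0, 0), (5, 5)])
print(final_centroids)  # [(1.0, 1.5), (9.5, 9.0)]
```

In practice the initial centroids are usually chosen at random, so different runs can produce different clusters.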

Time series mining

  • an algorithm for calculating the moving average of a time series (to remove noise) (new)
  • an algorithm for calculating the piecewise aggregate approximation of a time series (to reduce the number of data points of a time series) (new)
  • an algorithm for splitting time series into segments of a given length (new)
  • an algorithm for splitting time series into a given number of segments (new)
  • the SAX algorithm for converting a time series to a sequence database (to apply traditional algorithms for sequential rule mining and sequential pattern mining to time series) (Lin et al., 2007) (new)
  • a tool called Time Series Viewer for visualizing time series (new)
  • In addition, most clustering algorithms offered in SPMF (K-Means, Bisecting K-Means, DBScan, OPTICS, Hierarchical clustering) can be applied to time series.
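
The two simplest transformations in this list, the moving average and the piecewise aggregate approximation (PAA), can be sketched in Python as follows. This is an illustrative version only. The function names are hypothetical, the moving average uses a trailing window that is shorter for the first points, and the PAA assumes that the number of segments does not exceed the number of data points. The SPMF implementations may handle these edge cases differently.

```python
def moving_average(series, window):
    """Replace each point by the mean of itself and up to window - 1 previous points."""
    result = []
    for i in range(len(series)):
        start = max(0, i - window + 1)   # the window is shorter at the beginning
        chunk = series[start:i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


def paa(series, segments):
    """Reduce a series to `segments` values, each the mean of one contiguous slice."""
    n = len(series)
    result = []
    for k in range(segments):
        start = k * n // segments
        end = (k + 1) * n // segments
        chunk = series[start:end]
        result.append(sum(chunk) / len(chunk))
    return result


series = [1, 2, 3, 4, 5, 6]
print(moving_average(series, 3))  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
print(paa(series, 3))             # [1.5, 3.5, 5.5]
```

The moving average keeps the number of data points and smooths them, while PAA reduces the number of data points, which is why it is commonly used as a first step before a symbolic representation such as SAX.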

Classification

  • the ID3 algorithm for building decision trees (Quinlan, 1986)

Text mining

  • an algorithm for classifying text documents using a Naive Bayes classifier approach (S. Raghu, 2015) (new)
  • an algorithm for clustering texts using the tf*idf measure (S. Raghu, 2015) (new)
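
The tf*idf measure mentioned above weights a term by how often it occurs in a document and how rare it is across documents. The Python sketch below uses one common variant (relative term frequency multiplied by the natural logarithm of N / document frequency). Other variants exist, and this is not necessarily the one used in SPMF. The function name `tf_idf` and the documents are hypothetical.

```python
from math import log


def tf_idf(documents):
    """Return one dict per document mapping each term to its tf*idf weight.

    Each document is a non-empty list of already tokenized terms.
    """
    n = len(documents)
    # Document frequency: number of documents in which each term appears
    document_frequency = {}
    for doc in documents:
        for term in set(doc):
            document_frequency[term] = document_frequency.get(term, 0) + 1
    weights = []
    for doc in documents:
        weights.append({
            term: (doc.count(term) / len(doc)) * log(n / document_frequency[term])
            for term in set(doc)
        })
    return weights


documents = [["data", "mining"], ["data", "stream"]]
weights = tf_idf(documents)
print(weights[0]["data"])    # 0.0, because "data" appears in every document
print(weights[0]["mining"])  # about 0.347
```

Documents can then be compared or clustered using these weight vectors instead of raw word counts.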

Data structures

  • red-black tree,
  • itemset-tree,
  • binary tree,
  • KD-tree,
  • triangular matrix.
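
As an example of why such structures are useful, a triangular matrix can store one counter per unordered pair of items (for instance, to count how often two items appear together) using about half the memory of a full square matrix. The Python class below is a minimal sketch of that idea. Its name, methods and layout are hypothetical and do not reflect the SPMF class.

```python
class TriangularMatrix:
    """Counters for unordered pairs (i, j) with i != j and 0 <= i, j < size."""

    def __init__(self, size):
        # Row i only stores the columns j > i, so it has size - i - 1 cells
        self.rows = [[0] * (size - i - 1) for i in range(size)]

    def _cell(self, i, j):
        if i == j:
            raise ValueError("the diagonal is not stored")
        low, high = min(i, j), max(i, j)
        return low, high - low - 1

    def increment(self, i, j):
        row, col = self._cell(i, j)
        self.rows[row][col] += 1

    def get(self, i, j):
        row, col = self._cell(i, j)
        return self.rows[row][col]


matrix = TriangularMatrix(4)
matrix.increment(1, 3)
matrix.increment(3, 1)   # same pair, given in the other order
print(matrix.get(1, 3))  # 2
print(matrix.get(0, 2))  # 0
```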

Tools

  • A tool for generating a synthetic transaction database
  • A tool for generating a synthetic sequence database
  • A tool for generating a synthetic sequence database with timestamps
  • A tool for calculating statistics about a transaction database
  • A tool for calculating statistics about a sequence database
  • A tool for converting a sequence database to a transaction database
  • A tool for converting a transaction database to a sequence database
  • A tool for converting a text file to a sequence database (each sentence becomes a sequence)
  • A tool for converting a sequence database in various formats (CSV, KOSARAK, BMS, IBM...) to a sequence database in SPMF format
  • A tool for converting a transaction database in various formats (CSV...) to a transaction database in SPMF format
  • A tool for converting time-series to a sequence database
  • A tool to generate utility values for a transaction database
  • A tool to add timestamps to a sequence database
  • A tool for removing utility information from a database having utility information
  • A tool to resize a database in SPMF format (a text file) using a percentage of lines of data from an original database.
  • A tool for visualizing time-series
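
To illustrate the kind of conversion these tools perform, the Python sketch below turns a text into a sequence database in which each sentence becomes a sequence. It is a rough approximation rather than the SPMF tool. The function name `text_to_sequences` and the naive sentence splitting are assumptions, and the output assumes the SPMF sequence format in which items are positive integers, -1 ends an itemset and -2 ends a sequence.

```python
import re


def text_to_sequences(text):
    """Convert text to lines of a sequence database, one line per sentence.

    Returns the lines and the dictionary mapping each word to its integer id.
    """
    word_ids = {}
    lines = []
    # Naive sentence splitting on ., ! and ? (an assumption of this sketch)
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        if not words:
            continue
        # Give each new word the next unused positive integer
        ids = [word_ids.setdefault(w, len(word_ids) + 1) for w in words]
        # Each word is its own itemset (ended by -1); -2 ends the sequence
        lines.append(" -1 ".join(map(str, ids)) + " -1 -2")
    return lines, word_ids


lines, word_ids = text_to_sequences("The cat sat. The dog sat!")
for line in lines:
    print(line)
# 1 -1 2 -1 3 -1 -2
# 1 -1 4 -1 3 -1 -2
```

The returned dictionary makes it possible to translate the integers in discovered patterns back into words.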

Visual map of algorithms

You can visualize the relationship between the various data mining algorithms offered in SPMF by clicking on this map (last updated: 2015/09/12 - SPMF 0.97):

[Image: map of the data mining algorithms offered in SPMF]

Copyright © 2008-2017 Philippe Fournier-Viger. All rights reserved.