Sequence Analysis and Prediction - Philippe Fournier-Viger

Projects

• Algorithms for sequence analysis, sequence prediction and pattern mining (2010-...)

• Data Mining Algorithms with Applications in Tutoring Agents   (2008 - 2011)

• Evaluating Spatial Reasoning in Intelligent Tutoring Systems  (2006 - 2010)

• A Cognitive and Ontology-Based Model for Building Glass-Box Learning Objects  (2005-2008)

• A Cognitive Model for Building "Cognitive Tutors"(2005-2010)

 

Algorithms for sequence analysis and sequence prediction 

Algorithms to discover sequential patterns

A sequence is an ordered list of symbols. A sequence is a general concept that exists in many domains. For example, in the domain of bioinformatics, protein sequences, microarray data and DNA fragments are sequences. Examples from other domains are sequences of webpages visited by users, sequences of transactions made by customers in a store, sequences of weather observations, educational data or medical record data.

To analyse sequences, a popular technique is to apply algorithms to discover sequential patterns such as PrefixSpan, SPAM, Spade and AprioriALL. I have developped multiple algorithms for discovering sequential patterns in sequences such as:

  1. TKS: an efficient algorithm to discover top-k sequential patterns in a set of sequences (ADMA 2013)
  2. MaxSP: an efficient algorithm to discover maximal sequential patterns in a set of sequences (ADMA 2013)
  3. TNS: an algorithm for discovering top-k non redundant sequential rules (2013)
  4. RuleGrowth : a very efficient algorithm for mining sequential rules that uses a pattern-growth approach together with a new process for rule discovery called "rule expansion" (2011).
  5. TopSeqRules: an algorithm for discovering the top-k most frequent sequential rules in a sequence database, where k is set by the user. (2011)
  6. TRuleGrowth : an algorithm for mining sequential rules while considering a maximum time duration constraint (a.k.a window size constraint). (2012)
  7. CMRules and CMDeo: algorithms to discover sequential rules that respectively use an association rule based approach and a levelwise approach (2010).
  8. An algorithm to discover sequential pattern from a set of sequences with anotations and time constraints (best paper award at MICAI 2008)

These algorithms have been applied in various domains such as e-learning (in the CanadarmTutor project), for webclick stream analysis (Fournier-Viger et al., 2012), manufacturing simulation (Kamsu-Foguem et al., 2013), quality control (Bogon et al., 2012), analyzing visitor movements (Orellana, 2011), modelling trends on social web (Christiansen, 2013) and restaurant recommendation (Han et al., 2012). Source code of the algorithms have been released as part of the open-source data mining library SPMF.

Algorithms to discover association rules and itemsets

I have also developped or collaborated on several algorithms for discovering itemsets and associations in transaction databases such as:

  1. Memory-Efficient Itemset Tree: a new data structure for online targeted association rule mining (ADMA, 2013)
  2. TopKRules (2012): an efficient algorithm to discover top-k association rules.
  3. TNR (2012): an efficient algorithm to discover top-k non redundant association rules.
  4. CHUD (ICDM, 2011): an efficient algorithm to discover closed+ high utility itemsets

Algorithms for sequence prediction

I have also made contribution on the topic of sequence prediction.

  1. Compact Prediction Tree (Gueniche et al, 2013): a new data structure that provide more accurate sequence prediction than popular approaches such as All-K-Markov, Dependency Graph and PPM for data such as web click stream.
  2. A sequential rule-based sequence prediction algorithm (ADMA 2012).

Main publications

[1] Gueniche, T., Fournier-Viger, P., Tseng, V.-S. (2013). Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction. Proc. 9th International Conference on Advanced Data Mining and Applications (ADMA 2013), Springer, LNAI, 12 pages (to appear).  
[2] Fournier-Viger, P., Wu, C.-W., Tseng, V.-S. (2013). Mining Maximal Sequential Patterns without Candidate Maintenance. Proc. 9th International Conference on Advanced Data Mining and Applications (ADMA 2013), Springer, LNAI, 12 pages (to appear).  
[3] Fournier-Viger, P., Gomariz, A., Gueniche, T., Mwamikazi, E., Thomas, R. (2013). Efficient Mining of Top-K Sequential Patterns. Proc. 9th International Conference on Advanced Data Mining and Applications (ADMA 2013), Springer, LNAI, 12 pages (to appear).  
[4] Fournier-Viger, P., Mwamikazi, E., Gueniche, T., Faghihi, U. (2013). Memory Efficient Itemset Tree for Targeted Association Rule Mining. Proc. 9th International Conference on Advanced Data Mining and Applications (ADMA 2013), Springer, LNAI, 12 pages (to appear).  
[5] Fournier-Viger, P., Tseng, V. S. (2013). TNS: Mining Top-K Non-Redundant Sequential Rules. Proc. 28th Symposium on Applied Computing (ACM SAC 2013). ACM Press, pp. 164-166.  
[6] Fournier-Viger, P., Tseng, V.S. (2012). Mining Top-K Non-Redundant Association Rules. Proc. 20th International Symposium on Methodologies for Intelligent Systems (ISMIS 2012), Springer, LNCS 7661, pp. 31- 40.  
[7] Fournier-Viger, P. Gueniche, T., Tseng, V.S. (2012). Using Partially-Ordered Sequential Rules to Generate More Accurate Sequence Prediction. Proc. 8th International Conference on Advanced Data Mining and Applications (ADMA 2012), Springer LNAI 7713, pp.431-442.  
[8] Fournier-Viger, P., Faghihi, U., Nkambou, R., Mephu Nguifo, E. (2012). CMRules: Mining Sequential Rules Common to Several SequencesKnowledge-based Systems, Elsevier, 25(1): 63-76.  
[9] Fournier-Viger, P., Wu, C.-W., Tseng, V.S., Nkambou, R. (2012). Mining Sequential Rules Common to Several Sequences with the Window Size ConstraintProceedings of the 25th Canadian Conf. on Artificial Intelligence (AI 2012), Springer, LNAI 7310, pp.299-304.  
[10]Wu, C.-W., Fournier-Viger, P., Yu., P. S., Tseng, V. S. (2011). Efficient Mining of a Concise and Lossless Representation of High Utility ItemsetsProceedings of the 11th IEEE International Conference on Data Mining (ICDM 2011). IEEE CS Press, pp.824-833.  
[11] Fournier-Viger, P. & Tseng, V. S. (2011). Mining Top-K Sequential RulesProceedings of the 7th Intern. Conf. on Advanced Data Mining and Applications (ADMA 2011). LNAI 7121, Springer, pp.180-194.  
[12] Fournier-Viger, P., Nkambou, R. & Tseng, V. S. (2011). RuleGrowth: Mining Sequential Rules Common to Several Sequences by Pattern-Growth. Proceedings of the 26th Symposium on Applied Computing (ACM SAC 2011). ACM Press, pp. 954-959.  
(see the "publications" section of the webiste for more articles)