Mining Frequent Subgraphs using the TSeqMiner Algorithm (SPMF documentation)

This example explains how to run the TSeqMiner algorithm using the SPMF open-source data mining library.

How to run this example?

To run the example, you will need to first download the example datasets that can be used with AERMINER:: datasets (ZIP, 5.0 MB) and extract the files on your hardrive. Inside "AERMINER_datasets" you will find three subfolders containing three datasets:
- DBLP
- synthetic
- USFlight

Then,

What is this algorithm?

AERMiner is an algorithm for discovering strongly correlated rules indicating relationships between attribute values of a dynamic attributed graph, proposed by Fournier-Viger, P., He, G. et al. (2020)

What is the input of the algorithm?

The input of TSeqMiner is a dynamic attributed graph and 3 main parameters called minimum support, minimum confidence and minimum lift.

The dynamic attributed graph is described by 4 files, including:
- attributes_mapping.txt This file record all attributes names.
The format is:
A1
A2
A3
...

- vertices_mapping.txt This file describe all vertex entities and corresponding labels.
The format is:
0,v1
1,v2
2,v3
...

- attributes.txt This file record vertex and corresponding attribute values.
Line starting with 'T' represent a new graph. Within the range of the graph, in one line, the first item is vertex label and others are attribute values.
The format is:
T0
0 0.3 0.5 3.4
1 0.1 0.9 5.6
T1
0 0.2 0.3 3.3
1 0.1 0.5 1.3
...

- graph.txt This file describe the edge connections.
Line starting with 'T' represent a new graph. Within the range of the graph, in one line, the first item is a vertex label of currently considered, and the others are vertex labels of the connected vertices of the current vertex.
The format is:
T0
0 45 87 97 35
1 49 41 56 67
T1
0 78 32
1 10
...

The parameters that must be specified are as follows:
- minSup(double: 0-1)
The relative support of AER in dynamic attributed graph >= minSup
- minConf(double:0-1)
The confidence of AER >= minConf
- minLift(double:1-∞)
The lift of AER >= minLift

What is the output of TSeqMiner ?

Given a dynamic attributed graph and several parameters, the algorithm will output all attribute evolution rules with their size, sup, confidence.
For the above example, the output file looks like this:

size 2pattern,count: 6
CorePattern{ childAttr=57, parentAttr=[33]}sup: 445conf: 0.3571302037201063
CorePattern{ childAttr=42, parentAttr=[33]}sup: 507conf: 0.3789088054428889
CorePattern{ childAttr=87, parentAttr=[33]}sup: 774conf: 0.3491959304233673
CorePattern{ childAttr=54, parentAttr=[33]}sup: 625conf: 0.3096594857539958
CorePattern{ childAttr=31, parentAttr=[33]}sup: 644conf: 0.3103020739404869
CorePattern{ childAttr=81, parentAttr=[33]}sup: 460conf: 0.3405832921403856
size 3pattern,count: 11
CorePattern{ childAttr=33, parentAttr=[33, 33]}sup: 702conf: 0.34854877892852576
CorePattern{ childAttr=85, parentAttr=[31, 33]}sup: 456conf: 0.31965511735589974
CorePattern{ childAttr=52, parentAttr=[31, 33]}sup: 543conf: 0.3463641334062329
CorePattern{ childAttr=33, parentAttr=[13, 33]}sup: 491conf: 0.3333333333333333
CorePattern{ childAttr=85, parentAttr=[15, 33]}sup: 439conf: 0.31649616368286443
CorePattern{ childAttr=52, parentAttr=[15, 33]}sup: 524conf: 0.3307641196013289
CorePattern{ childAttr=33, parentAttr=[31, 33]}sup: 510conf: 0.35324791418355184
CorePattern{ childAttr=85, parentAttr=[33, 33]}sup: 519conf: 0.31474873624739813
CorePattern{ childAttr=52, parentAttr=[33, 33]}sup: 635conf: 0.33086967233308695
CorePattern{ childAttr=33, parentAttr=[15, 33]}sup: 568conf: 0.3441785426189836
CorePattern{ childAttr=52, parentAttr=[13, 33]}sup: 486conf: 0.33011531241619735
size 4pattern,count: 0

This file contains 17 attribute evolution rules.

Optional parameters available in the source code

Note that there is more parameters that can be used for AERMiner in the source code, in the class ParametersSettingAERMiner.java. These parameters are not offered in the command line as they are for very specific cases.

Performance

The AERMiner algorithm is the first algorithm for this problem. This is the original implementation.

Where can I get more information about the algorithm?

This is the article describing the algorithm:

Fournier-Viger, P., He, G., Lin, J. C.-W., Gomes, H. M. (2020). Mining Attribute Evolution Rules in Dynamic Attributed Graphs. Proc. 22nd Intern. Conf. on Data Warehousing and Knowledge Discovery (DAWAK 2020), Springer, pp. 167-182.