Frequently Asked Questions
- How can I install the software?
- The software ran out of memory. What should I do?
- Do you have the source code of the XXXXXXX algorithm?
- Could you implement the XXXXXXX algorithm for me?
- Could I participate in the development of your software?
- How is the source code organized?
- Can I use SPMF in commercial software? Can I include your source code in my software?
- Could you explain how the XXXXXX algorithm works?
- Could you give me some examples of how to use the XXXXXXX algorithm?
- Do you have the C++, C# or <insert another programming language here> version of the XXXXXX algorithm?
- Where can I find some large datasets?
- I have found a bug in your software!
- The software is very useful. How can I say thank you?
- How should I cite SPMF?
- Other questions
Please read the installation instructions on the download page.
For most data mining algorithms, memory usage depends on the parameters of the algorithm and on the kind of data. For example, for the Apriori algorithm, memory usage and runtime depend on (1) the number of transactions, (2) their length, (3) the number of items, (4) whether the dataset is dense or sparse, (5) the minsup parameter, etc.
Some old versions of the Java virtual machine use only 256 megabytes of RAM by default. This is very small, so it is easy to run out of memory. To avoid this, you can increase the amount of memory that the Java virtual machine is allowed to use. If you are using the release version of SPMF, you can launch the software from the command line and use the -Xmx option to increase the maximum memory that SPMF can use:
java -Xmx1024m -jar spmf.jar
This indicates that the software can now use up to 1 GB of RAM.
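To check that the memory option was actually taken into account, a small stand-alone program can print the maximum heap size reported by the Java virtual machine. The sketch below is only an illustration and is not part of SPMF; the class name HeapCheck is hypothetical. It relies on the standard Runtime.maxMemory() method.

```java
// Illustrative helper (hypothetical, not part of SPMF):
// reports the maximum heap size that the JVM will attempt to use.
final class HeapCheck {

    /** Returns the maximum heap size in megabytes. */
    static long maxHeapMegabytes() {
        // Runtime.maxMemory() returns a number of bytes
        // (Long.MAX_VALUE if there is no inherent limit).
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMegabytes() + " MB");
    }
}
```

If this program is started with the same -Xmx1024m option as above, the printed value should be close to 1024. Some Java virtual machines report a slightly smaller number, depending on the garbage collector in use.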
If you are using the source code version with Eclipse, you can increase the memory by doing this:
Go to the menu Run > Run Configurations > select the class that you usually run, such as "MainTestApriori" for Apriori > go to the "Arguments" tab > then enter the memory option in the "VM Arguments" field, for example the same option as in the command above: -Xmx1024m
Then press "Run".
If you have increased the memory and the algorithm still runs out of memory, you should consider changing the parameters of the algorithm, because parameters often have a huge influence on performance. In some cases, decreasing or increasing a parameter can increase the size of the search space exponentially, which results in long runtimes and high memory usage. This is for example the case for the Apriori algorithm: decreasing the minsup parameter can result in finding millions of itemsets and in very long runtimes, whereas with a higher minsup value few itemsets may be found and the algorithm may be very fast. Thus, for such algorithms, it is recommended to first set a strict constraint (e.g. a high minsup value) and look at the result. Then, if the result is satisfactory, you may relax the parameter (e.g. decrease the minsup parameter to find more itemsets).
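The effect of the minsup parameter can be seen on a toy example. The sketch below is not SPMF code and does not implement Apriori: it is a brute-force counter, written only for illustration, that enumerates every possible itemset over a tiny dataset invented for this example and counts how many are frequent for a given minsup (expressed as a number of transactions). The class and method names are hypothetical.

```java
// Illustrative brute-force counter (hypothetical, not SPMF code, not Apriori).
final class MinsupDemo {

    /** Encodes a transaction, given as item ids, into a bitmask. */
    static int transaction(int... items) {
        int mask = 0;
        for (int item : items) {
            mask |= 1 << item;
        }
        return mask;
    }

    /** A made-up database of 5 transactions over items 0..4. */
    static int[] sampleDatabase() {
        return new int[] {
            transaction(0, 2, 3),
            transaction(1, 2, 4),
            transaction(0, 1, 2, 4),
            transaction(1, 4),
            transaction(0, 1, 2, 4)
        };
    }

    /**
     * Counts the non-empty itemsets that appear in at least minsupCount
     * transactions. Every itemset is enumerated as a bitmask, so this is
     * only feasible for a very small number of items.
     */
    static int countFrequentItemsets(int[] transactions, int itemCount, int minsupCount) {
        int frequent = 0;
        for (int itemset = 1; itemset < (1 << itemCount); itemset++) {
            int support = 0;
            for (int t : transactions) {
                if ((t & itemset) == itemset) {
                    support++; // the transaction contains the whole itemset
                }
            }
            if (support >= minsupCount) {
                frequent++;
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        int[] db = sampleDatabase();
        for (int minsup = 5; minsup >= 1; minsup--) {
            System.out.println("minsup = " + minsup + " transactions -> "
                    + countFrequentItemsets(db, 5, minsup) + " frequent itemsets");
        }
    }
}
```

On this toy dataset, 4 itemsets are frequent with a minsup of 4 transactions, and 15 with a minsup of 2 transactions. On real datasets with many items, the same effect can make the difference between a few patterns and millions of patterns.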
Some algorithms like CM-SPAM also let you specify additional constraints, such as a maximum length for the patterns to be found. Generally, the stricter the constraints (e.g. a maximum size of 2 items), the smaller the search space, the fewer patterns are found, and the faster the algorithm runs. Thus, using constraints is another way of improving the performance of the algorithms.
Besides, if an algorithm is not efficient enough, another solution is to use a better algorithm. For example, for frequent itemset mining, several algorithms in SPMF have the same input and output, such as Apriori and FPGrowth. However, FPGrowth is generally much more efficient than Apriori. Thus, if you face performance issues with Apriori, a good idea is to try FPGrowth instead.
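To give an idea of how much a maximum size constraint can shrink the search space, the sketch below counts how many distinct itemsets can be formed from a given number of items, with and without a size limit. It is a simplified illustration for itemsets only (CM-SPAM mines sequential patterns, whose search space is different), it is not SPMF code, and the class and method names are hypothetical.

```java
// Illustrative search-space calculation (hypothetical, not SPMF code).
final class SearchSpaceDemo {

    /**
     * Number of non-empty itemsets containing at most maxSize items that can
     * be formed from itemCount distinct items, i.e. C(n,1) + ... + C(n,maxSize).
     * Intended for small values; throws ArithmeticException on long overflow.
     */
    static long itemsetsUpTo(int itemCount, int maxSize) {
        long total = 0;
        long binomial = 1; // C(itemCount, 0)
        int limit = Math.min(maxSize, itemCount);
        for (int k = 1; k <= limit; k++) {
            // C(n, k) = C(n, k-1) * (n - k + 1) / k; the division is always exact.
            binomial = Math.multiplyExact(binomial, (long) (itemCount - k + 1)) / k;
            total = Math.addExact(total, binomial);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("20 items, at most 2 items per itemset: " + itemsetsUpTo(20, 2));
        System.out.println("20 items, no size limit: " + itemsetsUpTo(20, 20));
    }
}
```

With 20 items, there are 210 possible itemsets of at most 2 items, against 1,048,575 possible itemsets without a size limit. An algorithm does not necessarily explore all of them, but the numbers show why a strict constraint helps.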
If the algorithm is not in the list of algorithms, then I don't have it.
I usually choose the algorithms that I implement according to my interests. If you would like me to implement a particular algorithm, you can send me a suggestion by e-mail with (1) the name of the algorithm and (2) the article describing the algorithm. I will read your suggestion and decide whether I am interested in implementing it. If I am interested, it can take days, weeks or months before I have time to implement it (it depends on my schedule). If I am not interested, I will not implement it.
Yes. If you are interested in participating, you can write me an e-mail. I am interested in source code for algorithms that I have not implemented. You can send me the source code, and I will evaluate its quality. If it is good enough and I think that the algorithm is useful, I will include it in the next version of the software. If I include your code, I will add your name to the list of contributors on the website.
If you want to understand how the source code is organized in SPMF, you can read the developers guide. It provides some useful information to understand/modify the source code.
SPMF is licensed under the GNU GPL v3 license.
The GPL license provides four freedoms:
- Obtain and run the program for any purpose
- Get a copy of the source code
- Modify the source code
- Re-distribute the modified source code
But if you want to redistribute the source code, you must:
- provide access to the source code,
- license derived work under the same GPL v3 license
For more details about what you can and cannot do, please read the GNU GPL license.
If you want to know how a particular algorithm works, you should read the original article describing the algorithm or contact the corresponding author of that article.
Each algorithm offered in SPMF comes with an example. You can also try an algorithm with its example to see what the input and output are.
If you are interested in understanding how the code of an algorithm works, it can also be useful to run the algorithm in a debugger to see step by step what happens.
Besides, if you cannot find the answer to your question in these articles, you can ask it in my data mining forum. I will try to answer if the question is simple and the answer is short. Someone else may also answer your question.
The examples are in the documentation section of the website. You may also consider reading the article describing the algorithm for more information about the algorithm.
No. If it is not on the website, then I don't have it.
You can check the "datasets" page of this website. It provides download links and information for obtaining several popular datasets used in the data mining literature that can be used with SPMF.
Please send me information about the bug by e-mail. I will try to fix it as soon as possible.
If you appreciate the software, the best way to say thank you is to cite the website in your thesis/papers/articles, to post a link to this website on the internet so that more people can find it, and to recommend it to your colleagues.
Please cite SPMF as follows:
Fournier-Viger, P., Lin, C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H. T. (2016). The SPMF Open-Source Data Mining Library Version 2. Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853, pp. 36-40.
If you have a general question, please ask it in the data mining forum so that I can answer you and the answer can be shared with everyone else. You can also ask me by e-mail if the question has to be private. I will try to answer as quickly as possible, but if your question is long and requires a long answer, or if I am currently busy, it may take a few days before I answer.