This example explains how to run the ID3 algorithm using the SPMF open-source data mining library.
How to run this example?
To run this example with the source code version of SPMF, launch the file "MainTestID3.java" in the package ca.pfv.SPMF.tests.
This example is not available in the release version of SPMF.
What is the ID3 algorithm?
The ID3 algorithm is a classic data mining algorithm for classifying instances (a classifier). It is well-known and described in many artificial intelligence and data mining books.
What is the input?
The input is a set of training data for building a decision tree.
For instance, in this example, we use the following database (from the book "Machine Learning by Tom Mitchell). This database is provided in the file "tennis.txt" of the SPMF distribution.
This database defines five attributes and contains fourteen instances.
outlook temp humid wind play? sunny hot high weak no sunny hot high strong no overcast hot high weak yes rain mild high weak yes rain cool normal weak yes rain cool normal strong no overcast cool normal strong yes sunny mild high weak no sunny cool normal weak yes rain mild normal weak yes sunny mild normal strong yes overcast mild high strong yes overcast hot normal weak yes rain mild high strong no
What is the output?
By applying the ID3 algorithm, a decision tree is created. To create the decision tree, we have to choose a target attribute. Here, we choose the attribute "play".
The following decision tree is created:
We can now use the tree to predict the value of the target attribute "play" for a new instance.
For example, consider this new instance, where the value for "play" is unknown.
sunny hot normal weak ?
By applying the decision tree, the value for the attribute "play" for this instance is "yes".
Input file format
The input file format is a text file. The first lines contains a list of attribute names separated by a single space. A attribute name is simply a string without spaces. The next lines represents instances, where each line contains a string value for each attribute, separated by single spaces. For example, consider the file "tennis.txt" of the previous example.
play outlook temp humid wind
no sunny hot high weak
no sunny hot high strong
yes overcast hot high weak
yes rain mild high weak
yes rain cool normal weak
no rain cool normal strong
yes overcast mild high strong
no sunny mild high weak
yes sunny cool normal weak
yes rain mild normal weak
yes sunny mild normal strong
yes overcast hot normal weak
yes overcast cool normal strong
no rain mild high strong
The first line defines the attribute names : "play", "outlook", "temp", "humid" and "wind". Then, consider the second line. It represents an instance having the value "no", "sunny", "hot", "high" and "weak" respectively for the five attributes. The next lines follow the same format.
Output file format
There is no output file for the ID3 algorithm. It is only available in the source code version of SPMF and it does not generate an output file.
Where can I get more information about the ID3 algorithm?
The ID3 algorithm was proposed by Quinlan (1986). It is one of the most popular algorithm for learning decision trees. By searching on the web, you can find plenty of information on this algorithm. It is also described in several data mining books and artificial intelligence books.
<< Return to table of contents of SPMF documentation
Copyright © 2008-2020 Philippe Fournier-Viger. All rights reserved.