Due: September 18, 2009 (midnight)
For this assignment you will learn about the decision-tree induction
classifier and compare it to the ConjunctiveRule classifier.
- Consider the simple loan.arff dataset that provides 18 training
examples of whether or not a loan is approved for an applicant based on their income, debt and
education. Using this data, compute the entropy for the entire dataset and the information gain
for each of the three attributes (Income, Debt, Education) as the top-level attribute in a
decision tree. Also indicate which of the three attributes is the best choice for the top-level
split attribute. Show all your work.
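As a way to check hand calculations for this kind of problem, here is a minimal Python sketch of entropy and information gain. It assumes the standard definitions (entropy in bits, gain as the parent entropy minus the weighted entropy of the subsets). The helper names `entropy` and `information_gain` and the toy records are hypothetical illustrations. They are not taken from loan.arff and are not part of WEKA.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data for illustration only (NOT the loan.arff examples).
toy = [
    {"Weather": "sunny", "Windy": "no", "Play": "yes"},
    {"Weather": "sunny", "Windy": "yes", "Play": "yes"},
    {"Weather": "rainy", "Windy": "no", "Play": "no"},
    {"Weather": "rainy", "Windy": "yes", "Play": "no"},
]

for attr in ("Weather", "Windy"):
    print(attr, information_gain(toy, attr, "Play"))
```

On this toy data the class is split evenly, so the dataset entropy is 1 bit. Splitting on Weather separates the classes perfectly (gain of 1 bit), while splitting on Windy leaves each subset evenly mixed (gain of 0). The attribute with the largest gain would be chosen as the top-level split.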
- Use WEKA to run the J48 decision-tree classifier on the loan.arff
dataset. Use the default parameter settings for J48, and use the training set as the test set.
- Include in your report the printed results (tree and statistics) from WEKA.
- Draw the decision tree learned by J48.
- What is the percent accuracy of this tree on the training set?
- WEKA's default parameter settings for J48 are -C 0.25 -M 2.
- Explain in your own words what these parameters mean.
- Find a setting for the -C and -M parameters so that
the learned tree achieves 100% accuracy on the training set. Describe the
difference between this tree and the one learned in problem 2.
- Run the ConjunctiveRule and J48 classifiers using default parameter
settings on the loan.arff dataset and the following
eight datasets supplied with WEKA (J48 does not work with the cpu datasets,
because their class value is a real number).
For each run, use the Percentage Split test option with 66%
training. Include in your report a table giving the percentage of correctly
classified instances in the test split for both classifiers on each dataset.
- Compare the performance of the ConjunctiveRule and J48 classifiers
based on the results from the previous problem.
Specifically, explain which classifier performs better on which datasets, and why.
The "why" part should consider the characteristics of the data, the
hypothesis space, and the learning algorithm.
- Email me (email@example.com)
your nicely formatted report (PDF preferred) containing your responses to
the above problems.