Machine Learning

Homework 2

Due: September 17, 2010 (midnight)
Extended to September 20, 2010 (11:59pm)

For this assignment you will learn about the NaiveBayes classifier and compare it to the OneR classifier.

  1. Consider the simple loan.arff dataset that provides 18 training examples of whether or not a loan is approved for an applicant based on their income, debt and education. Use WEKA to run the NaiveBayes classifier on the loan dataset. Use the default parameter settings for NaiveBayes, and select "Use training set" as the test option.
    1. Include in your report the printed results from WEKA. Note that the classifier is output as counts of attribute values by class. Also note that each count is one higher than the corresponding count in the dataset; this add-one correction avoids any zero probabilities for P(attribute=value|class).
    2. What is the percent accuracy of NaiveBayes on the training set?
    3. How would this NaiveBayes classifier classify the instance: Income=Medium, Debt=Medium, Education=MS? Show your work.
  2. Use WEKA to run the OneR classifier on the loan dataset. Use the default parameter settings for OneR, and select "Use training set" as the test option.
    1. Include in your report the printed results from WEKA.
    2. What is the percent accuracy of OneR on the training set?
    3. How would this OneR classifier classify the instance: Income=Medium, Debt=Medium, Education=MS? Justify your answer.
  3. Which learner, NaiveBayes or OneR, has more inductive bias? Justify your answer.
  4. Perform the following experiment comparing the OneR and NaiveBayes classifiers using default parameter settings on the datasets: loan (from above), contact-lenses, iris, labor, segment-challenge, soybean, weather, and weather.nominal (the last 7 come with WEKA).
    1. In WEKA, choose the "Experimenter" application.
    2. Under "Experiment Configuration Mode" choose "New".
    3. Under "Experiment Type" choose "Train/Test Percentage Split (data randomized)".
    4. Under "Datasets", choose "Add new..." for each of the 8 datasets. Note that the segment-test dataset has been removed from the list.
    5. Under "Algorithms", choose "Add new..." for OneR and again for NaiveBayes.
    6. Now click on "Run" at the top, and then click "Start".
    7. Now click on "Analyse" at the top.
    8. Under "Source", choose "Experiment".
    9. Near the bottom, click on "Perform test".
    Include in your report a table giving the percentage of correctly classified instances in the test split for both classifiers on each dataset.
  5. Compare the performance of the OneR and NaiveBayes classifiers based on the results from the previous problem. Specifically, state which classifier performs better on which datasets, and explain why. The "why" part should consider the characteristics of the data, the hypothesis space, and the learning algorithm.
  6. Email me (holder@eecs.wsu.edu) your nicely formatted report (PDF preferred) containing your responses to the above problems.
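
For reference, the add-one counts mentioned in problem 1 and the single-attribute rule behind OneR can be sketched in a few lines of Python. This is only an illustrative sketch, not WEKA's implementation: the seven-example dataset, the attribute names, and all function names below are made up for the example (it is not loan.arff), and applying the same add-one correction to the class counts is an assumption of the sketch.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (NOT loan.arff): ((Income, Debt), Approved)
DATA = [
    (("High", "Low"), "Yes"),
    (("High", "High"), "Yes"),
    (("Medium", "Low"), "Yes"),
    (("Medium", "High"), "No"),
    (("Medium", "Low"), "Yes"),
    (("Low", "Low"), "No"),
    (("Low", "High"), "No"),
]
N_ATTRS = 2


def train_naive_bayes(data, n_attrs):
    """Count attribute values per class, starting every count at 1."""
    classes = sorted({label for _, label in data})
    values = [sorted({x[i] for x, _ in data}) for i in range(n_attrs)]
    # counts[class][attribute index][value], each initialised to 1 (add-one)
    counts = {c: [{v: 1 for v in values[i]} for i in range(n_attrs)]
              for c in classes}
    # Assumption of this sketch: class counts also start at 1
    class_counts = {c: 1 for c in classes}
    for x, label in data:
        class_counts[label] += 1
        for i, v in enumerate(x):
            counts[label][i][v] += 1
    return classes, counts, class_counts


def predict_naive_bayes(model, x):
    """Return normalised P(class | x) for attribute values seen in training."""
    classes, counts, class_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c in classes:
        score = class_counts[c] / total            # P(class)
        for i, v in enumerate(x):
            # P(attribute=value | class) from the add-one counts
            score *= counts[c][i][v] / sum(counts[c][i].values())
        scores[c] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}


def train_one_r(data, n_attrs):
    """Return (attribute index, value -> class rule, training errors) for the
    single attribute whose majority-class rule makes the fewest errors."""
    best = None
    for i in range(n_attrs):
        by_value = defaultdict(Counter)
        for x, label in data:
            by_value[x[i]][label] += 1
        rule = {v: cnt.most_common(1)[0][0] for v, cnt in by_value.items()}
        errors = sum(1 for x, label in data if rule[x[i]] != label)
        if best is None or errors < best[2]:
            best = (i, rule, errors)
    return best


nb_model = train_naive_bayes(DATA, N_ATTRS)
one_r = train_one_r(DATA, N_ATTRS)

query = ("Medium", "High")
nb_probs = predict_naive_bayes(nb_model, query)
print("NaiveBayes:", max(nb_probs, key=nb_probs.get), nb_probs)
print("OneR:", one_r[1][query[one_r[0]]], "using attribute index", one_r[0])
```

On this toy data the two learners disagree on the query instance, because OneR consults only one attribute while the Naive Bayes sketch multiplies evidence from all of them. Keep that difference in mind for problems 3 and 5.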