Due: September 17, 2010 (midnight)
Extended to September 20, 2010 (11:59pm)
For this assignment you will learn about the NaiveBayes
classifier and compare it to the OneR classifier.
- Consider the simple loan.arff dataset, which
provides 18 training examples of whether or not a loan is approved for an
applicant based on their income, debt and education. Use WEKA to run the
NaiveBayes classifier on the loan dataset. Use the default parameter settings
for NaiveBayes, and use the training set as the test option.
- Include in your report the printed results from WEKA. Note that the
classifier is output as counts of attribute values by class. Also note that
each count is one higher than in the dataset (add-one, or Laplace, smoothing)
to avoid any zero probabilities for P(attribute=value|class).
- What is the percent accuracy of NaiveBayes on the training set?
- How would this NaiveBayes classifier classify the instance:
Income=Medium, Debt=Medium, Education=MS? Show your work.
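The add-one counts and the "show your work" calculation can be illustrated with a minimal Python sketch. The data, the attribute domain, and the function names (`smoothed_conditionals`, `posterior`) are hypothetical; nothing here is taken from loan.arff or from WEKA's own implementation, and the sketch only shows the arithmetic for nominal attributes.

```python
from collections import Counter


def smoothed_conditionals(values, domain):
    """Estimate P(attribute=value | class) with add-one (Laplace) smoothing.

    values: attribute values observed for ONE class in the training data
    domain: every value the attribute can take
    """
    counts = Counter(values)
    # Each possible value gets one extra count, so the denominator grows
    # by the size of the domain.
    total = len(values) + len(domain)
    return {v: (counts[v] + 1) / total for v in domain}


def posterior(priors, likelihoods):
    """Normalize prior * product of conditionals into class probabilities."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for p in likelihoods[cls]:
            score *= p
        scores[cls] = score
    z = sum(scores.values())
    return {cls: s / z for cls, s in scores.items()}


# Hypothetical toy data, NOT taken from loan.arff.
domain = ["Low", "Medium", "High"]
income_given_yes = ["High", "High", "Medium", "High"]  # "Low" never observed

probs = smoothed_conditionals(income_given_yes, domain)
print(probs)  # "Low" gets a small nonzero probability instead of zero

# Hypothetical priors and per-attribute conditionals for one test instance.
post = posterior(
    {"yes": 0.5, "no": 0.5},
    {"yes": [0.5, 0.5], "no": [0.25, 0.5]},
)
print(post)
```

Without the extra count, a value never seen with a class would get probability zero and wipe out the whole product for that class, no matter what the other attributes say.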
- Use WEKA to run the OneR classifier on the loan dataset.
Use the default parameter settings for OneR, and use the training
set as the test option.
- Include in your report the printed results from WEKA.
- What is the percent accuracy of OneR on the training set?
- How would this OneR classifier classify the instance:
Income=Medium, Debt=Medium, Education=MS? Justify your answer.
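For comparison, the idea behind a one-rule learner can be sketched in a few lines of Python: for each attribute, predict the majority class of each attribute value, then keep the attribute whose rule makes the fewest training errors. This is a simplified sketch for nominal attributes only; the function name `one_r` and the toy rows are hypothetical, and WEKA's OneR additionally handles numeric attributes by bucketing them, which is not shown here.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (best_attribute, rule, training_errors) for a 1R-style learner."""
    best = None
    for attr in attributes:
        # Count class labels separately for each value of this attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # The rule maps each attribute value to its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        # Keep the attribute with the fewest training errors (first one on ties).
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy rows, NOT taken from loan.arff.
rows = [
    {"Income": "High", "Debt": "Low", "Approve": "yes"},
    {"Income": "High", "Debt": "High", "Approve": "yes"},
    {"Income": "Low", "Debt": "Low", "Approve": "no"},
    {"Income": "Low", "Debt": "High", "Approve": "no"},
]

attr, rule, errors = one_r(rows, ["Income", "Debt"], "Approve")
print(attr, rule, errors)
```

A new instance is classified by looking up only the chosen attribute's value in `rule`; every other attribute is ignored.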
- Which learner, NaiveBayes or OneR, has more inductive bias?
Justify your answer.
- Perform the following experiment comparing the OneR and NaiveBayes
classifiers using default parameter settings on the following datasets: loan (from
above), contact-lenses, iris, labor, segment-challenge, soybean, weather, and
weather.nominal (the last seven come with WEKA).
Include in your report a table giving the percent correctly classified
instances in the test split for both classifiers on each dataset.
- In WEKA, choose the "Experimenter" application.
- Under "Experiment Configuration Mode" choose "New".
- Under "Experiment Type" choose "Train/Test Percentage Split (data randomized)".
- Under "Datasets", choose "Add new..." for each of the 8 datasets.
Note that the segment-test dataset has been removed from the list.
- Under "Algorithms", choose "Add new..." for OneR and again for NaiveBayes.
- Now click on "Run" at the top, and then click "Start".
- Now click on "Analyse" at the top.
- Under "Source", choose "Experiment".
- Near the bottom, click on "Perform test".
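What a randomized percentage split measures can be sketched in Python as follows. The helper names (`percentage_split`, `percent_correct`), the 66% training share, and the seed are assumptions chosen for illustration; check the Experimenter's own settings for the values it actually uses.

```python
import random


def percentage_split(instances, train_percent=66.0, seed=1):
    """Shuffle the instances, then split them into (train, test) lists."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_percent / 100.0)
    return shuffled[:cut], shuffled[cut:]


def percent_correct(predicted, actual):
    """Percent of test instances whose predicted class matches the true class."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Hypothetical stand-in for a dataset: 100 instance ids.
train, test = percentage_split(range(100))
print(len(train), len(test))

# Hypothetical predictions versus true labels on a tiny test split.
print(percent_correct(["yes", "no"], ["yes", "yes"]))
```

Because the test instances are held out from training, these percentages are usually lower than the training-set accuracies obtained in the earlier problems, and on very small datasets they can vary considerably from one random split to the next.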
- Compare the performance of the OneR and NaiveBayes classifiers
based on the results from the previous problem.
Specifically, explain which classifier performs better on which datasets, and why.
The "why" part should consider the characteristics of the data, the
hypothesis space, and the learning algorithm.
- Email me (email@example.com)
your nicely formatted report (PDF preferred) containing your responses to
the above problems.