Due: October 17, 2008 (midnight)
No late homework will be accepted.
1. Run the NaiveBayes classifier on the iris dataset.
   Use the training set as the test option. Include in your submission the
   printed results from WEKA.
2. What type of distribution does WEKA's NaiveBayes
   classifier assume for continuous attributes?
3. Redo questions 3 and 4 from HW3, only substitute
   NaiveBayes for ConjunctiveRule.
4. Redo questions 6 and 7 from HW3, only substitute
   NaiveBayes for J48.
5. The UCI ML Repository contains a spam database
   where the emails have already been processed to extract word
   frequencies and other information.
   a. Download the data and convert it to WEKA's ARFF format.
   b. Describe the attributes used for the instances in this dataset.
   c. Run a 10-fold cross-validation test on the dataset using the
      NaiveBayes classifier and report the accuracy achieved.
   d. Compare the NaiveBayes classifier with another classifier
      we have used in WEKA (ConjunctiveRule, J48, or MultilayerPerceptron) using
      the evaluation techniques we have learned while using WEKA.
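
For the ARFF conversion step, one possible approach is a small script that writes the ARFF header and then copies the comma-separated rows under `@data`. The sketch below is only an illustration, not a required method: it assumes every attribute is numeric and that the last column is a nominal class label. The function name `csv_to_arff`, the attribute names `attr_a` and `attr_b`, and the two sample rows are made up; take the real attribute names from the documentation that accompanies the dataset.

```python
import csv
import io


def csv_to_arff(rows, attribute_names, class_values, relation="spambase"):
    """Return ARFF text for rows whose last column is a nominal class label.

    Assumption: all other columns are numeric. Attribute names that contain
    spaces, commas, or braces would need quoting, which this sketch omits.
    """
    lines = [f"@relation {relation}", ""]
    for name in attribute_names:
        lines.append(f"@attribute {name} numeric")
    lines.append("@attribute class {" + ",".join(class_values) + "}")
    lines += ["", "@data"]

    expected = len(attribute_names) + 1  # attributes plus the class column
    for row in rows:
        if not row:
            continue  # skip blank lines in the input
        if len(row) != expected:
            raise ValueError(f"expected {expected} fields, got {len(row)}")
        lines.append(",".join(field.strip() for field in row))
    return "\n".join(lines) + "\n"


# Made-up two-row sample standing in for the downloaded data file.
sample = "0.0,0.64,1\n0.21,0.28,0\n"
arff_text = csv_to_arff(
    csv.reader(io.StringIO(sample)),
    attribute_names=["attr_a", "attr_b"],  # hypothetical names
    class_values=["0", "1"],
)
print(arff_text)
```

To convert a real file, pass `csv.reader(open(path, newline=""))` in place of the in-memory sample and write the returned text to a file ending in `.arff`.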
- Email me (firstname.lastname@example.org)
  a zip file containing the following:
  - Text file containing the raw output of the NaiveBayes
    run on the iris dataset.
  - Text file containing the raw output of the first experiment above
    (result as from HW3 question 3h).
  - Raw threshold curve data for NaiveBayes and MultilayerPerceptron
    on the labor dataset (the two files you saved as in step 6e in HW3).
  - ARFF file for the Spambase dataset.
  - Any supporting files for your comparison in 5d.
  - Nicely formatted report (MS Word, PDF, or PostScript) containing:
    - Answer to question 2.
    - Table summarizing results of experiment in question 3.
    - Nicely formatted plot of the two ROC curves.
    - Discussion of performance comparison based on the ROC curves.
    - Description of Spambase attributes (5b).
    - Results of experiment in 5c.
    - Comparison of NaiveBayes to other learner on the Spambase