Due: November 12, 2010 (midnight)
For this assignment you will become familiar with kernel machines and
with experimentally comparing classifiers in WEKA.
- Using WEKA's explorer environment, load the loan.arff dataset from HW2/HW3 in the Preprocess tab.
Choose the functions.SMO classifier (this is WEKA's SVM classifier) in the
Classify tab and change the -E option on the PolyKernel to 2.0 (i.e., we will
use (x.y)^2 as our kernel). Choose "Use training set" as the Test option, and
click Start to run.
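As a quick sanity check on what the -E 2.0 setting means: the quadratic polynomial kernel simply squares the ordinary dot product. A minimal sketch in plain Python (this is not WEKA's implementation, and the vectors below are made up, not instances from loan.arff):

```python
# Quadratic polynomial kernel K(x, y) = (x . y)^2,
# i.e. PolyKernel with exponent E = 2.0 and no lower-order terms.
def poly_kernel(x, y, degree=2):
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** degree

# Hypothetical binary-encoded instances (placeholders, not loan.arff data):
x = [1.0, 0.0, 1.0]
y = [1.0, 1.0, 0.0]
result = poly_kernel(x, y)  # (1*1 + 0*1 + 1*0)^2 = 1
```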
- Include the WEKA output in your report.
- The classifier consists of a set of arithmetic terms. Explain what
each number/symbol in the first line of the classifier represents.
- Consider the test instance <Income=Medium, Debt=Medium,
Education=MS>. Using the classifier learned above,
compute the numeric result of the classifier and the predicted class
for this instance. Show your work.
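To see the shape of the computation being asked for, here is a generic sketch of an SVM prediction with a quadratic kernel. The support vectors, coefficients, bias, and test encoding below are invented placeholders, not the values WEKA learns from loan.arff; substitute the numbers from your own WEKA output:

```python
def poly_kernel(x, y, degree=2):
    return sum(a * b for a, b in zip(x, y)) ** degree

def svm_decision(x, support_vectors, coeffs, bias):
    # f(x) = sum_i coeff_i * K(sv_i, x) + bias, where coeff_i = alpha_i * y_i;
    # the predicted class is the sign of f(x).
    return sum(c * poly_kernel(sv, x)
               for sv, c in zip(support_vectors, coeffs)) + bias

# Hypothetical support vectors and coefficients (placeholders only):
svs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
coeffs = [0.5, -0.25]
bias = -0.1
x_test = [1.0, 1.0, 0.0]  # a made-up encoded test instance

f = svm_decision(x_test, svs, coeffs, bias)  # 0.5*1 - 0.25*1 - 0.1 = 0.15
label = "positive" if f >= 0 else "negative"
```

Your worked answer should follow the same pattern: evaluate the kernel between the test instance and each term of the learned classifier, combine with the weights and bias, and read off the class from the sign.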
- Using WEKA's experimenter environment, perform the following experiment.
- Choose a "New" experiment.
- Choose the default Experiment Type: 10-fold cross-validation and Classification.
- Choose default Iteration Control: 10 repetitions and data sets first.
- Select the following datasets that come with WEKA: diabetes, labor,
iris, and vote.
- Select the following classifiers with default parameter settings,
except as noted:
- bayes.NaiveBayes
- lazy.IBk -K 3 (i.e., 3-nearest neighbors)
- functions.SMO (PolyKernel with -E 2.0)
- meta.AdaBoostM1 (with J48 as the base classifier)
- Run the experiment.
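What the experimenter automates here is, in essence, a doubly nested loop: 10 random repetitions of 10-fold cross-validation for each dataset/classifier pair, yielding 100 accuracy estimates per pair. A stripped-down sketch of that loop in plain Python, with a toy 3-nearest-neighbor classifier standing in for WEKA's learners and synthetic one-dimensional data standing in for the four datasets (all of this is illustrative, not WEKA code):

```python
import random

def knn_predict(train, x, k=3):
    # Majority vote among the k nearest training points (squared Euclidean
    # distance); a toy stand-in for lazy.IBk -K 3.
    nearest = sorted(train,
                     key=lambda p: sum((a - b) ** 2
                                       for a, b in zip(p[0], x)))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)

def repeated_cv_accuracy(data, folds=10, repetitions=10, seed=0):
    rng = random.Random(seed)
    accs = []
    for _ in range(repetitions):           # 10 repetitions...
        d = data[:]
        rng.shuffle(d)
        for f in range(folds):             # ...of 10-fold cross-validation
            test = d[f::folds]
            train = [p for i, p in enumerate(d) if i % folds != f]
            correct = sum(knn_predict(train, x) == y for x, y in test)
            accs.append(correct / len(test))
    return sum(accs) / len(accs), accs     # mean accuracy over 100 runs

# Synthetic two-class data (placeholder for diabetes/labor/iris/vote):
gen = random.Random(1)
data = ([([gen.gauss(0, 1)], 0) for _ in range(50)]
        + [([gen.gauss(3, 1)], 1) for _ in range(50)])
mean_acc, accs = repeated_cv_accuracy(data)
```

The 100 per-run accuracies collected this way are exactly what the Analyse tab's significance tests operate on.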
- Analyze the results using the following configuration, and perform the test:
- Testing with: Paired T-Tester (corrected)
- Comparison field: Percent_correct
- Significance: 0.05
- Test base: bayes.NaiveBayes
- Show std. deviations: (checked)
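For reference, the "corrected" in Paired T-Tester (corrected) refers (as I understand WEKA's implementation) to the Nadeau-Bengio variance correction for resampled cross-validation results, which inflates the variance term to account for overlapping training sets across runs. A sketch of the statistic in plain Python; the per-run accuracy differences below are invented, just to exercise the formula:

```python
import math

def corrected_paired_t(diffs, test_train_ratio=1.0 / 9.0):
    # diffs: per-run accuracy differences (classifier A minus classifier B)
    # over k runs, e.g. k = 10 x 10 = 100 cross-validation results.
    # Corrected variance: (1/k + n_test/n_train) * sample variance;
    # for 10-fold CV, n_test/n_train = 1/9.
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)
    return mean / math.sqrt((1.0 / k + test_train_ratio) * var)

# Invented per-run differences (placeholders, not experiment output):
diffs = [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.02, 0.03, 0.01, 0.02]
t = corrected_paired_t(diffs)
# Compare |t| against the t-distribution with k - 1 degrees of freedom
# at the 0.05 level to decide significance.
```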
- Construct a table of classifiers vs. datasets, and in each entry,
enter the accuracy and standard deviation of that classifier on that dataset
from the above experiment. Also, add an asterisk to the end of the entry
for any dataset/classifier pair for which the classifier outperforms
NaiveBayes at the 0.05 level.
- Compare the performance of the different classifiers on the different
datasets. Specifically, discuss which classifier performs better on which
datasets, and why. The "why" part should consider the characteristics of the
data, the hypothesis space, and the learning algorithm.
- Email me (email@example.com) a
nicely formatted document containing your answers to the above questions.