Machine Learning

Homework 1

Due: September 3, 2010 (midnight)

For this assignment you will familiarize yourself with the WEKA Machine Learning Software, which we will use throughout the course to test various learning algorithms. You will also analyze a particular learning problem.

  1. Download and install WEKA on your preferred platform. WEKA is available here. Be sure to get the latest Developer version (3.7.2). WEKA is already installed on the machines in Sloan 353.
  2. First, we will analyze the contact-lenses dataset provided with WEKA (also available here).
    1. Run WEKA and select the Explorer application.
    2. Under the Preprocess tab, select "Open file..." and open the contact-lenses dataset. You should now see information about this dataset.
    3. For each feature used to describe an instance, give the name of the feature, its type, and the number of possible values.
    4. Give the size of the instance space. Justify your answer.
    5. What is the class feature and its possible values?
    6. Under the "Classify" tab, click on "Choose" and select the OneR classifier. OneR picks the single feature that yields the fewest errors on the training data and classifies using that feature alone. Then click on "OneR" to bring up its parameter list and change "minBucketSize" to 1.
    7. Under "Test options" select "Use training set".
    8. Click "Start" to run and retain the output.
    9. Which feature did OneR select for its classifier?
    10. How many errors did this classifier make on the training set?
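
The idea behind OneR described above can be sketched in a few lines of Python. This is a minimal illustration for nominal features only, not WEKA's implementation: it ignores the "minBucketSize" parameter and may break ties differently than WEKA does. The function name `one_r` and the toy data are made up for this sketch and are not taken from any WEKA dataset.

```python
from collections import Counter, defaultdict

def one_r(rows, feature_names, class_name):
    """Return (best_feature, rule, training_errors) for nominal features."""
    best = None
    for feature in feature_names:
        # Count class labels separately for each value of this feature.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[class_name]] += 1
        # Each feature value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[feature]] != row[class_name])
        # Keep the feature with the fewest training errors (first one wins ties).
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best

# Made-up toy data, used only to exercise the sketch.
rows = [
    {"shape": "round", "color": "red",   "edible": "yes"},
    {"shape": "round", "color": "green", "edible": "yes"},
    {"shape": "round", "color": "red",   "edible": "yes"},
    {"shape": "long",  "color": "red",   "edible": "no"},
    {"shape": "long",  "color": "green", "edible": "no"},
    {"shape": "long",  "color": "green", "edible": "yes"},
]
feature, rule, errors = one_r(rows, ["shape", "color"], "edible")
print(feature, errors)  # shape 1
```

On the toy data, "shape" misclassifies one training example while "color" misclassifies two, so "shape" is selected. The error count returned here corresponds to the kind of number WEKA reports when "Use training set" is selected.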
  3. Second, we will analyze the weather dataset provided with WEKA (also available here). Perform the same steps and answer the same questions for this dataset as you did in the previous problem.
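
Both problems above ask for the size of the instance space. When every feature is nominal, one common way to compute it is to multiply the number of possible values of each feature, assuming the class feature is not counted as part of the instance description. A small sketch of that arithmetic follows; the function name `instance_space_size` and the feature set are hypothetical and are not taken from either WEKA dataset.

```python
from math import prod

def instance_space_size(feature_values):
    """Number of distinct instances describable by the given nominal features."""
    return prod(len(values) for values in feature_values.values())

# Hypothetical features, used only to illustrate the calculation.
features = {
    "shape": ["round", "long"],
    "color": ["red", "green", "yellow"],
    "ripe": ["yes", "no"],
}
size = instance_space_size(features)
print(size)  # 2 * 3 * 2 = 12
```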
  4. Finally, we will analyze a dataset based on the "family car" example in class.
    1. Create a dataset in the ARFF format used by WEKA, with two numeric features ("price" and "engine-power") and one discrete class feature ("family-car") whose values can be "yes" or "no". The actual data is shown below.
      Price    Engine Power    Family Car
      7000     310             no
      8000     180             no
      14000    200             no
      15000    280             yes
      20000    250             yes
      20000    340             no
      21000    190             no
      22000    300             yes
      25000    260             yes
      27000    285             yes
      29000    340             no
      30000    210             no
      39000    160             no
      40000    245             no
      41000    285             no
    2. Perform the same WEKA test as before: Classifier "OneR -B 1" (that is, OneR with "minBucketSize" set to 1), Test option "Use training set". Retain the output.
    3. Which feature did OneR select for its classifier?
    4. List the examples that this classifier incorrectly classified (if any).
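
One possible way to produce the ARFF text for the family car data is sketched below. It assumes the standard ARFF layout (a `@relation` line, one `@attribute` line per feature, then `@data` followed by comma-separated rows). The relation name "family-car" and the helper name `to_arff` are choices made for this sketch, not requirements of the assignment.

```python
# Rows copied from the table in the assignment: (price, engine-power, family-car).
ROWS = [
    (7000, 310, "no"), (8000, 180, "no"), (14000, 200, "no"),
    (15000, 280, "yes"), (20000, 250, "yes"), (20000, 340, "no"),
    (21000, 190, "no"), (22000, 300, "yes"), (25000, 260, "yes"),
    (27000, 285, "yes"), (29000, 340, "no"), (30000, 210, "no"),
    (39000, 160, "no"), (40000, 245, "no"), (41000, 285, "no"),
]

def to_arff(rows, relation="family-car"):
    """Build ARFF text with two numeric features and a yes/no class feature."""
    lines = [
        f"@relation {relation}",  # relation name is an assumption of this sketch
        "",
        "@attribute price numeric",
        "@attribute engine-power numeric",
        "@attribute family-car {yes,no}",
        "",
        "@data",
    ]
    # One comma-separated line per example, in the order of the attributes above.
    lines += [f"{price},{power},{label}" for price, power, label in rows]
    return "\n".join(lines) + "\n"

arff_text = to_arff(ROWS)
print(arff_text)
```

Saving the printed text to a file with the `.arff` extension should let it be opened from the Preprocess tab in the same way as the datasets in the earlier problems.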
  5. Email to me ( one ZIP file containing the following.