Homework #2

Unless a problem states otherwise, complete this homework on your own, without input, code, or assistance from other students. See me or the TA if you have questions.


1. Implement Agrawal's Apriori algorithm for finding association rules. Use a minimum support of 6 (an itemset is frequent only if it appears in at least 6 transactions in the database) and a minimum confidence of 0.6. Test your algorithm on the Automobile database. Each transaction in this database contains eight fields, represented as integer values: model year, cylinders, weight, mpg, origin, horsepower, displacement, and acceleration.

Print each learned rule in the following format (the rule below is only an illustration; your program may not learn it):
RULE: origin=13 and mpg=73 => acceleration=1 and displacement=8 (confidence = 1.0, support = 7)
Note: the fields appear in the database in the following order: origin, cylinders, mpg, acceleration, model year, displacement, horsepower, weight.
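The following Python code is one possible sketch of the algorithm described above, not a reference solution. It assumes the database file holds one transaction per line as eight whitespace-separated integers in the field order just given, and it represents each item as a (field, value) pair so that, for example, mpg=73 and weight=73 remain distinct items. The names `to_items`, `apriori`, `generate_rules`, and `format_rule` are hypothetical. For simplicity, the join step takes the union of any two frequent (k-1)-itemsets instead of Agrawal's ordered prefix join; after the prune step both approaches yield the same candidate set.

```python
import sys
from itertools import combinations

# Field order as given in the assignment.
FIELDS = ["origin", "cylinders", "mpg", "acceleration",
          "model year", "displacement", "horsepower", "weight"]
MIN_SUPPORT = 6        # absolute count of transactions
MIN_CONFIDENCE = 0.6


def to_items(row):
    """Map one transaction's integer values to a set of (field, value) items."""
    return frozenset(zip(FIELDS, row))


def apriori(transactions, min_support=MIN_SUPPORT):
    """Level-wise search; returns {frequent itemset: support count}."""
    counts = {}
    for t in transactions:                       # pass 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):               # join: union two (k-1)-itemsets
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # prune: keep only if every (k-1)-subset is frequent
                if len(union) == k and all(
                        frozenset(sub) in frequent
                        for sub in combinations(union, k - 1)):
                    candidates.add(union)
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:                   # one database pass per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def generate_rules(frequent, min_conf=MIN_CONFIDENCE):
    """Yield (lhs, rhs, confidence, support) for each rule meeting min_conf."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # every subset of a frequent itemset is itself frequent
                conf = support / frequent[lhs]
                if conf >= min_conf:
                    yield lhs, itemset - lhs, conf, support


def describe(items):
    """Render items as 'field=value and ...' in database field order."""
    ordered = sorted(items, key=lambda item: FIELDS.index(item[0]))
    return " and ".join(f"{field}={value}" for field, value in ordered)


def format_rule(lhs, rhs, conf, support):
    return (f"RULE: {describe(lhs)} => {describe(rhs)} "
            f"(confidence = {round(conf, 2)}, support = {support})")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed input format: one transaction per line, eight integers in FIELDS order.
    with open(sys.argv[1]) as f:
        db = [to_items(int(x) for x in line.split()) for line in f if line.strip()]
    for rule in generate_rules(apriori(db)):
        print(format_rule(*rule))
```

The file path is passed on the command line. The support printed for a rule is the count of its whole itemset (left and right sides together), and confidence is that count divided by the count of the left side alone.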

Turn in your well-documented code together with the output of a sample run.


Here are some implementations of the Apriori algorithm.


2. For this problem, you may work alone or in pairs. You will need to download and install an evaluation copy of DBMiner on a Windows 95 or Windows NT platform. Create a cube using the "US Population" table, and answer the questions below using the cube. Because this table is too big for the evaluation version of DBMiner, you will need to remove entries from the END of the file until no more than 1,000 entries remain, the maximum the evaluation version allows. The cube should contain three dimensions: 1) pop (created from "pop90" and "pop80"), 2) area (created from "area"), and 3) pop_per_sqm92 (created from "pop_per_sqm92").
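If the US Population table is available as a plain text file with one entry per line, the trimming step can be scripted instead of done by hand. The Python sketch below is a minimal illustration under that assumption; it further assumes an optional header line that does not count toward the 1,000-entry limit. `truncate_records` and `truncate_file` are hypothetical helpers, not part of DBMiner.

```python
def truncate_records(lines, max_records=1000, has_header=True):
    """Return the header line (if any) plus the first max_records non-blank lines."""
    lines = iter(lines)
    out = []
    if has_header:
        header = next(lines, None)
        if header is not None:
            out.append(header)
    kept = 0
    for line in lines:
        if kept >= max_records:
            break                      # drop everything after the limit
        if line.strip():               # skip blank lines without counting them
            out.append(line)
            kept += 1
    return out


def truncate_file(src, dst, max_records=1000, has_header=True):
    """Write a copy of src to dst that keeps only the first max_records entries."""
    with open(src) as fin:
        kept = truncate_records(fin, max_records, has_header)
    with open(dst, "w") as fout:
        fout.writelines(kept)
```

Because the entries are removed from the end, the first 1,000 records of the original file are preserved in their original order.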