CSE 6363 Fall 2000
Due: October 10, 2000, 7:00pm (October 12, 2000, 7:00pm for -10%)
- In the directory 6363-501/data/images will be 36 30x30
greyscale (one byte per pixel) face images of myself and the students (3
images each). Twenty-four of the images will be in the subdirectory train and the remaining twelve will be in the subdirectory test. Each
file is in pnm format and named after the individual's last name (e.g., holder1.pnm). Your job is to use the BP program to train a network to
recognize the faces in the class. The BP program, documentation and
examples are in the directory 6363-501/code/bp.
- You should be able to display the images with any image viewer
program. There is already a sample 30x30 image of Tom Mitchell stored in
images/mitchell.pnm. Note, this image is not to be used in training
- Your network will have 900 inputs, some number of hidden units, and
12 outputs. Your 12 outputs can be either 0 or 1 according to which person
is recognized. Use the order: Adcock, Baritchi, Bean, Butler, Forteza,
Han, Holder, Huang, Lin, Panya, Sandanayake, Youngblood. For example, the
target output pattern for Holder would be (0,0,0,0,0,0,1,0,0,0,0,0). Or,
if you decide to follow the advice of the book (see Section 4.7), you may
want to use output patterns like
(0.1,0.1,0.1,0.1,0.1,0.1,0.9,0.1,0.1,0.1,0.1,0.1) to avoid arbitrarily
large weights since the output units can never attain exactly 0 or 1.
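The target encoding described above can be sketched in a few lines of Python. This is only an illustration: the helper names `target_pattern` and `name_from_filename` are hypothetical and not part of the BP program, and the output would still need to be written in whatever input format the BP documentation specifies.

```python
# Output order given in the assignment (index 0 = Adcock, ..., index 11 = Youngblood).
NAMES = ["Adcock", "Baritchi", "Bean", "Butler", "Forteza", "Han",
         "Holder", "Huang", "Lin", "Panya", "Sandanayake", "Youngblood"]


def target_pattern(last_name, low=0.0, high=1.0):
    """Return the 12-element target vector for one person.

    Use low=0.1, high=0.9 for the softened targets suggested in Section 4.7.
    """
    index = NAMES.index(last_name.capitalize())
    pattern = [low] * len(NAMES)
    pattern[index] = high
    return pattern


def name_from_filename(filename):
    """Map a file name such as 'train/holder1.pnm' to the last name 'Holder'."""
    stem = filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return stem.rstrip("0123456789").capitalize()


if __name__ == "__main__":
    print(target_pattern(name_from_filename("holder1.pnm")))
    print(target_pattern("Holder", low=0.1, high=0.9))
```

Keeping the name order in a single list means the same list can be reused to translate the network's strongest output back into a name at test time.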
- You will need to convert the pnm files into input files for the BP
program. The pnm file is in the following format:
P2
# ... (comment)
30 30
255
... (pixel values)
The first line "P2" specifies that the file is in ASCII pnm format. Any line
beginning with a "#" is a comment. The second uncommented line contains
the dimensions of the image, and the third line specifies the maximum of
the pixel value range (0-255). The remaining lines contain the pixel
values. The first value is the pixel in the top-left corner of the image.
The next value is the next pixel in the top row of the image, and so on.
Per the book's suggestion (see Section 4.7) you may want to normalize
the input values to between 0 and 1.
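As a rough sketch of the conversion step, the following Python reads an ASCII pnm file laid out as described above and scales the pixels to the range 0 to 1. The function names `read_pnm_ascii` and `normalize` are hypothetical, and the code assumes that comments appear only on lines beginning with "#". Writing the values out in the BP program's own input format is left to the BP documentation.

```python
def read_pnm_ascii(text):
    """Parse the contents of an ASCII (P2) pnm file.

    Returns (width, height, maxval, pixels), where pixels is a flat list
    in row-major order starting at the top-left corner of the image.
    """
    tokens = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comment line
        tokens.extend(line.split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not an ASCII (P2) pnm file")
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    pixels = [int(t) for t in tokens[4:]]
    if len(pixels) != width * height:
        raise ValueError("expected %d pixels, found %d"
                         % (width * height, len(pixels)))
    return width, height, maxval, pixels


def normalize(pixels, maxval):
    """Scale pixel values from 0..maxval to 0..1."""
    return [p / maxval for p in pixels]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        w, h, m, px = read_pnm_ascii(f.read())
    print(w, h, len(normalize(px, m)))
```

For the 30x30 face images this yields 900 values per image, matching the 900 network inputs.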
- Train your network on the 24 training images and test the network on
ALL 36 images. Try different parameters and topologies to minimize testing error.
- Turn in the network file of your best network, and any other
information necessary for me to reproduce your best result. Also, discuss
your experience with this task (e.g., what worked, what didn't work, and
the effectiveness of neural nets on this task).
- For each of the six datasets in the 6363-501/data directory,
use the ml program to run a 10-fold cross validation on the Bayes,
C4.5 and BP algorithms. A file-based interface to the BP program has been
provided in 6363-501/code/ml2.0/bp.c. You are encouraged to modify
the interface to BP in order to improve performance. Code for a naive
Bayes classifier is provided in 6363-501/code/ml2.0/bayes.c. Compile
your results into three tables each in the form shown below. For example,
the BP-C4.5 column is the average and standard deviation of the difference
between the 10 runs of the two algorithms. Also include the significance
levels of the differences and the overall ANOVA significance for all three
||BP - C4.5
||0.33 +/- 0.05
||0.22 +/- 0.07
||0.11 +/- 0.06
||. . .
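One way to produce an entry such as "0.33 +/- 0.05" is to take the fold-by-fold differences between two algorithms and report their mean and standard deviation. The sketch below does that and also computes a paired t statistic and a one-way ANOVA F statistic. The function names are hypothetical, the per-fold error lists are assumed to have been collected from the ml program's output by hand, and the choice of a paired t-test and a one-way ANOVA is an assumption, since the assignment does not name the exact tests.

```python
import math
import statistics


def paired_difference_summary(errors_a, errors_b):
    """Mean, sample standard deviation, and paired t statistic of the
    fold-by-fold differences errors_a[i] - errors_b[i].

    Raises ZeroDivisionError if all differences are identical.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, sd_diff, t_stat


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA over the given groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2
                    for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

With 10 folds the paired t statistic has 9 degrees of freedom, so an absolute value above 2.262 corresponds to significance at the two-sided 0.05 level. Converting the t and F statistics into exact significance levels requires a statistical table or a statistics library.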
- Compare the different algorithms based on your tabulated results
(i.e., which algorithm seems best). Describe any modifications to the BP