On or before the above due date you should turn in (or email) the title of the paper(s) for your class presentation. Also, include date preferences for your presentation (November 16, 21, 28 or 30). Presentations will last approximately 25 minutes. Since you will be graded on (among other things) your coverage of the material in the paper, I strongly recommend you use PowerPoint or overhead transparencies for your presentation to expedite the delivery of information (I can provide you with blank transparencies).
As described below, the theme of your project is to learn in the presence of massive amounts of heterogeneous data, and I would like your presentation paper choice(s) to be related to this topic. Good places to look for papers are recent conference proceedings from the International Conference on Machine Learning, the International Conference on Knowledge Discovery and Data Mining, the National Conference on Artificial Intelligence, and the International Joint Conference on Artificial Intelligence. Another possible source for papers is the reference sections of Mitchell's book. Paper selections must be approved by me.
On or before the above proposal due date, you should turn in (or email) a proposal for your machine learning class project. The class project will be more focused than in previous years and will relate specifically to learning in the presence of massive amounts of heterogeneous data. Mr. Istvan Jonyer has written a simple, Windows-based tic-tac-toe game that captures video, mouse and keyboard inputs from the user during the course of play. The goal of your project is to learn patterns from this data. Specifically, your project will use one or more learning methods (either discussed in class or described in the research literature), some subset (or all) of the data, one or more representations of the data, one or more representations for learned patterns, and one or more criteria for successful learning. Your project proposal should include a description of your preliminary choices for each of these aspects of the learning task, as well as your expected results.
The project will consist of a writeup describing, in more detail, the learning task, including the problem, relevance to machine learning, approaches, analyses, results, and conclusions about the advantages and disadvantages of your approach. You should also turn in copies of all supporting materials, including code written by you. If your code is a modification of existing code, be sure to clearly indicate your modifications. The project is a significant part of this course, and I expect you to spend a significant amount of time preparing your final report. Therefore, I encourage you to turn in your project proposal as soon as possible and get started early.
The tic-tac-toe game is a simple, Windows-based game written by Mr. Istvan Jonyer that allows the user to play the game at one of three difficulty levels: easy, medium, hard. The game also has the option of capturing various kinds of data (video, mouse, keyboard, game log) at various frequencies and resolutions. Each piece of data captured is time stamped and written to a file.
The data stream is saved in four files, one each for video, mouse, keyboard and game data. These files are given the extensions .video, .mouse, .kb and .game. Samples of these four files are available in the sample.zip file in the data/tictactoe directory on gamma2.
Each data instance saved in all of these files is time-stamped using the following format:
T <seconds> <milliseconds>
T 972164413 348
At creation time, each file is time-stamped.
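As a minimal sketch (not part of the game software), a time-stamp line could be read in Python as follows. The function names are hypothetical, and treating <seconds> as seconds since the Unix epoch in UTC is an assumption; the format description does not state the reference point.

```python
from datetime import datetime, timezone


def parse_timestamp(line):
    """Parse a 'T <seconds> <milliseconds>' line into (seconds, milliseconds)."""
    parts = line.split()
    if len(parts) != 3 or parts[0] != "T":
        raise ValueError(f"not a time-stamp line: {line!r}")
    return int(parts[1]), int(parts[2])


def to_millis(seconds, milliseconds):
    """Collapse a time stamp into one integer, handy for ordering events."""
    return seconds * 1000 + milliseconds


def to_datetime(seconds, milliseconds):
    """Convert to a datetime.

    Assumption: <seconds> counts from the Unix epoch (UTC) and
    <milliseconds> is in the range 0..999.
    """
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=milliseconds * 1000)


stamp = parse_timestamp("T 972164413 348")
print(stamp, to_millis(*stamp))
```

A single integer such as the one returned by to_millis makes it easy to sort or merge events from the four files on a common time axis.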
The video file uses a modified version of the PNM file format. This version adds the time stamp described above, and a file may contain more than one image. The image format is as follows:
T 972164415 634    Time stamp
P3                 PNM version flag
# Creator          Comment
160 120            Resolution (x, y)
255                Range of color values (0..255)
43                 Red component of pixel (0,0)
33                 Green component of pixel (0,0)
140                Blue component of pixel (0,0)
45                 Red component of pixel (1,0)
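The layout above could be read with a short Python routine like the sketch below. It rests on several assumptions that should be checked against the sample files: comments occupy whole lines beginning with "#", all other values are separated by whitespace, pixels are stored with x varying fastest (as the (0,0), (1,0) ordering suggests), and a time stamp that is not followed by P3 is a stand-alone stamp (such as a file creation stamp) that can be skipped. The name parse_video is hypothetical.

```python
def parse_video(text):
    """Return a list of frames read from the text of a .video file.

    Each frame is a dict with keys 'time' (seconds, milliseconds),
    'width', 'height', 'maxval' and 'pixels' (a list of (r, g, b) tuples).
    """
    # Drop whole-line comments, then split everything else on whitespace.
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue
        tokens.extend(line.split())

    frames = []
    i = 0
    while i < len(tokens):
        if tokens[i] != "T":
            raise ValueError(f"expected a time stamp at token {i}")
        time = (int(tokens[i + 1]), int(tokens[i + 2]))
        # Assumption: a stamp not followed by P3 is a stand-alone stamp.
        if i + 3 >= len(tokens) or tokens[i + 3] != "P3":
            i += 3
            continue
        width = int(tokens[i + 4])
        height = int(tokens[i + 5])
        maxval = int(tokens[i + 6])
        i += 7
        count = width * height * 3  # three color components per pixel
        values = [int(t) for t in tokens[i:i + count]]
        if len(values) != count:
            raise ValueError("truncated image data")
        i += count
        pixels = [tuple(values[j:j + 3]) for j in range(0, count, 3)]
        frames.append({"time": time, "width": width, "height": height,
                       "maxval": maxval, "pixels": pixels})
    return frames
```

With this layout, the pixel at (x, y) of a frame would be frame["pixels"][y * frame["width"] + x], again assuming x varies fastest.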
The mouse file describes mouse events such as button pressed or released, mouse moved and window resized. The format is as follows:
L D x y    Left button down at (x,y)
L U x y    Left button up at (x,y)
R D x y    Right button down at (x,y)
R U x y    Right button up at (x,y)
M x y      Mouse moved to position (x,y)
W R w h    Window resized to width w and height h
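One possible way to turn these mouse records into structured events is sketched below in Python. The MouseEvent type and parse_mouse function are hypothetical names, and the sketch assumes that each event is preceded by its own "T <seconds> <milliseconds>" line, as in the video format; check the sample .mouse file before relying on this.

```python
from typing import NamedTuple, Optional, Tuple


class MouseEvent(NamedTuple):
    time: Optional[Tuple[int, int]]  # latest (seconds, milliseconds) seen, if any
    kind: str                        # "down", "up", "move" or "resize"
    button: Optional[str]            # "left" or "right" for button events
    a: int                           # x, or width for a resize
    b: int                           # y, or height for a resize


BUTTONS = {"L": "left", "R": "right"}
ACTIONS = {"D": "down", "U": "up"}


def parse_mouse(lines):
    """Parse the lines of a .mouse file into a list of MouseEvent records."""
    events = []
    current_time = None
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "T" and len(parts) == 3:
            # Assumption: this stamp applies to the event lines that follow it.
            current_time = (int(parts[1]), int(parts[2]))
        elif parts[0] in BUTTONS and len(parts) == 4 and parts[1] in ACTIONS:
            events.append(MouseEvent(current_time, ACTIONS[parts[1]],
                                     BUTTONS[parts[0]],
                                     int(parts[2]), int(parts[3])))
        elif parts[0] == "M" and len(parts) == 3:
            events.append(MouseEvent(current_time, "move", None,
                                     int(parts[1]), int(parts[2])))
        elif parts[:2] == ["W", "R"] and len(parts) == 4:
            events.append(MouseEvent(current_time, "resize", None,
                                     int(parts[2]), int(parts[3])))
        else:
            raise ValueError(f"unrecognized mouse record: {raw!r}")
    return events
```

Note that "R D" and "R U" are button events while "W R" is a resize, so the first field has to be checked before the second.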
The keyboard file describes keys pressed in the following format:
K a        Key a pressed
K space    The space key was pressed
The following control keys (ASCII value < 32) are also saved:
K enter    Enter key
K tab      Tab key
K esc      Esc key
K del      Backspace key
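A keyboard record could be decoded into a single character along the following lines. This is only a sketch: decode_key and NAMED_KEYS are hypothetical names, and the character chosen for each named key is an assumption (in particular, "del" is mapped to the backspace character because the table above labels it as the Backspace key).

```python
# Key names come from the tables above; the characters they map to are assumptions.
NAMED_KEYS = {
    "space": " ",
    "enter": "\n",
    "tab": "\t",
    "esc": "\x1b",
    "del": "\x08",  # labeled "Backspace key" in the format description
}


def decode_key(line):
    """Return the character represented by a 'K <key>' record."""
    parts = line.split()
    if len(parts) != 2 or parts[0] != "K":
        raise ValueError(f"not a keyboard record: {line!r}")
    name = parts[1]
    if len(name) == 1:
        return name  # an ordinary key such as 'a'
    if name in NAMED_KEYS:
        return NAMED_KEYS[name]
    raise ValueError(f"unknown key name: {name!r}")


typed = "".join(decode_key(rec) for rec in ["K h", "K i", "K space", "K x"])
print(repr(typed))
```

Unknown multi-character names raise an error rather than being guessed, so any key names that are not listed above will surface immediately when the sample .kb file is read.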
The game file describes the flow of the game in the following format:
N          New game
O r c      O picked box at row r and column c
X r c      X picked box at row r and column c
W X        X won
W O        O won
W T        Tie (nobody won)
P X        Human player is X's
P O        Human player is O's
D easy     New difficulty level is easy
D medium   New difficulty level is medium
D hard     New difficulty level is hard
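To work with individual games, the game log can be turned into events and split at each "N" record, as in the Python sketch below. The function names and the tuple layout are hypothetical. The sketch makes no assumption about whether rows and columns are numbered from 0 or 1, and it keeps any records that appear before the first "N" (for example player or difficulty settings) in a separate preamble rather than guessing which game they belong to.

```python
def parse_game_line(line):
    """Turn one record of a .game file into a tuple describing the event."""
    parts = line.split()
    if not parts:
        raise ValueError("empty record")
    code, args = parts[0], parts[1:]
    if code == "T" and len(args) == 2:
        return ("time", int(args[0]), int(args[1]))
    if code == "N" and not args:
        return ("new",)
    if code in ("O", "X") and len(args) == 2:
        return ("move", code, int(args[0]), int(args[1]))
    if code == "W" and args in (["X"], ["O"], ["T"]):
        return ("result", "tie" if args[0] == "T" else args[0])
    if code == "P" and args in (["X"], ["O"]):
        return ("human", args[0])
    if code == "D" and args in (["easy"], ["medium"], ["hard"]):
        return ("difficulty", args[0])
    raise ValueError(f"unrecognized game record: {line!r}")


def split_games(events):
    """Split events at each ("new",) marker.

    Returns (preamble, games): the events seen before the first new game,
    and one list of events per game.
    """
    groups = [[]]
    for event in events:
        if event == ("new",):
            groups.append([])
        else:
            groups[-1].append(event)
    return groups[0], groups[1:]


log = ["P X", "D easy", "N", "X 1 1", "O 0 0", "W T", "N", "X 2 2"]
preamble, games = split_games([parse_game_line(rec) for rec in log])
print(preamble)
print(len(games))
```

Each per-game list can then be reduced to whatever representation the chosen learning method needs, such as the sequence of moves or the final result.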