Machine Learning

Class Project

Project due December 12, 2007 (midnight)

No late submissions will be accepted.

Intermediate deadlines:

Team Registration: October 19, 2007

Initial Entry: November 2, 2007

For the class project, you will form one- or two-person teams to compete in the Netflix Prize, a machine learning challenge to predict how Netflix users will rate movies. The best entries each year can win $50,000, and anyone who reaches Netflix's target of a 10% performance improvement over its current approach will win $1,000,000. Of course, the main goal is for you to learn more about applying machine learning techniques to real problems. Below are the specific requirements for the class project.

  1. Read over the material at the Netflix Prize website, especially the rules and the frequently asked questions.
  2. You may choose to compete individually or as a two-person team. Some portion of the grading will be based on the difficulty of your approach and your team's ranking within the class, so I recommend you pair up. If you need help finding a teammate, let me know. Once you have your team finalized, you should follow the instructions on the website to register your team. By October 19 provide me with your team name, team members and password. I need this information in order to monitor your progress and the class rankings by accessing the performance of your entries maintained at the Netflix Prize website.
  3. Next, you will need to download the data, which is about 700MB compressed and about 2GB uncompressed. Let me know if you need help storing the data. More information about the data is available in the README file contained in the download. You should read this carefully.
  4. For your first entry, each team should submit the same entry; namely, predict 3.8 (the global average rating) for every customer-id/date for every movie. This should result in an RMSE score of 1.1357. By November 2 provide me with this prediction file that has been successfully submitted to the Netflix Prize site.
  5. The remainder of your effort on the project should involve designing, implementing and testing one or more machine learning approaches to achieve better predictions for the Netflix Prize data. Note the following:
    1. We will be maintaining an up-to-date ranking of the teams based on their submissions to the Netflix Prize. So, if you make a submission that improves your current best RMSE score, let me or the TA know so that we can update the class ranking.
    2. Be aware that the Netflix Prize allows only one submission per team per day, so you will need to make steady progress on this project. You will not be able to make many last-minute submissions.
  6. By December 12 you should email me the following:
    1. A report describing all your attempts (at least those that show up under your Netflix Prize team information), the methods used for each, enough detail on your best submission that the results can be reproduced, and a general discussion of your experience (what worked, what didn't, why, and what you would try next).
    2. All code and instructions necessary for reproducing your best result from the training data. That is, we will need to be able to input the training data to your software and get out your best prediction file. You can assume we have Weka, but there is no requirement that you use Weka.
    3. Your best prediction file successfully submitted to the Netflix Prize site.
    4. For 2-person teams, each team member should send me a separate email describing each team member's contribution to the project. These emails will be considered confidential between me and you.
  7. Your project will be graded according to the following criteria.
    1. The difficulty, number and creativity of your successful submissions to the Netflix Prize.
    2. The relevance to machine learning of your approach(es) to the problem.
    3. Your team's ranking within the class based on the RMSE score of your best successful Netflix Prize submission.
    4. The relative contribution of each team member.
    5. Your meeting the above intermediate deadlines.
    6. The quality of your report based on presentation, coverage, detail and general discussion.
    7. The efficiency, understandability and correctness of your instructions and code for reproducing your best result.
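
The baseline entry in step 4 and the RMSE score used for ranking can be sketched as follows. This is a minimal sketch, not the official tooling: it assumes the qualifying file lists each movie as a line of the form `MovieID:` followed by `CustomerID,Date` lines, and that a prediction file keeps the movie lines and replaces each customer line with a rating. Check the README in the download before relying on that layout. The function names and the sample lines are hypothetical.

```python
import math


def constant_predictions(qualifying_lines, rating=3.8):
    """Build a baseline prediction file from qualifying-file lines.

    Assumed layout (verify against the README): a movie header line
    ends with ':' and is copied through unchanged; every other
    non-empty line is a customer entry and is replaced by the
    constant rating.
    """
    out = []
    for line in qualifying_lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            out.append(line)            # movie header, e.g. "1:"
        else:
            out.append(f"{rating:.1f}")  # one prediction per customer line
    return out


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


# Made-up sample lines, for illustration only.
sample = ["1:", "111,2005-12-19", "222,2005-12-23", "10:", "333,2005-01-01"]
baseline = constant_predictions(sample)
```

Netflix computes the score on held-out ratings, so `rmse` is only useful locally, for example on a validation split you carve out of the training data yourself.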

Team Rankings

Rank  Team Name                 Best RMSE
1     MLSurvivors               0.9260
1     AndyYan                   0.9260
3     greenteam                 0.9852
4     fedex                     1.0212
5     MaLeRT                    1.0533
6     LBYD                      1.1357
6     MonkeyIslandResearchTeam  1.1357
Last updated: December 12, 2007 at 11:59pm
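
The ranks in the table above follow the pattern of standard competition ranking: teams are ordered by best RMSE (lower is better), tied teams share a rank, and the next rank skips ahead. A minimal sketch of that scheme, assuming this is how the class ranking was computed; the function name `rank_teams` is hypothetical.

```python
def rank_teams(scores):
    """Rank teams by best RMSE, lowest first; tied scores share a rank.

    scores: dict mapping team name -> best RMSE.
    Returns a list of (rank, team, rmse) tuples.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1])
    ranked = []
    for position, (team, score) in enumerate(ordered):
        if position > 0 and score == ordered[position - 1][1]:
            rank = ranked[-1][0]   # tie: reuse the previous team's rank
        else:
            rank = position + 1    # otherwise rank is the 1-based position
        ranked.append((rank, team, score))
    return ranked


# Scores copied from the table above.
class_scores = {
    "MLSurvivors": 0.9260,
    "AndyYan": 0.9260,
    "greenteam": 0.9852,
    "fedex": 1.0212,
    "MaLeRT": 1.0533,
    "LBYD": 1.1357,
    "MonkeyIslandResearchTeam": 1.1357,
}
ranking = rank_teams(class_scores)
```

Because Python's sort is stable, tied teams keep the order in which they were entered.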