Project due December 16, 2009
No late submissions will be accepted.
Team Registration: November 6, 2009
Initial Entry: November 13, 2009
For the class project, you will form 1-2 person teams to participate in the
2009 UC San Diego Data Mining Contest, a machine
learning challenge to predict anomalies in e-commerce transactions. The contest
deadline has passed, so we will not be competing for the $4,000 prize money, but
the contest server is still running. Besides, it's not about the money, right?
The main goal is for you to learn more
about applying machine learning techniques to real problems.
Below are the specific requirements for the class project.
- Read over the material at mill.ucsd.edu.
We will be attempting the "Hard" version.
- You may choose to compete individually or as a two-person team.
Some portion of the grading will be based on the difficulty of your
approach and your team's ranking within the class, so I recommend
you pair up. If you need help finding a teammate, let me know. Once
you have your team finalized, you should follow the instructions on
the website to register your team by November 6 and provide
me with your team name and team members.
- For your first entry, each team should submit the same entry;
namely, predict 0.020000 (probability of positive based on class
distribution) for each of the 50,000 test instances.
This should result in a lift score of 0.894. This first entry should
be completed by November 13. Send me an email when your first entry
appears on the leaderboard.
- The remainder of your effort on the project should involve designing,
implementing and testing one or more machine learning approaches to
achieve better performance on this challenge problem.
We will be maintaining an up-to-date ranking of the teams based on
their submissions to the challenge. So, if you make a submission that
improves your current best score, let me know by email so that we
can update the class ranking.
- By December 16, email me (email@example.com) the following:
  - A report describing all your attempts, the methods used for each,
    enough detail on your best submission that the results can be reproduced,
    and a general discussion of your experience (what worked, what didn't,
    why, and what you would try next).
  - All code and instructions necessary for reproducing your best result
    from the training data. That is, we will need to be able to input the
    training data to your software and get out your best prediction file. You
    can assume we have Weka, but there is no requirement that you use Weka.
  - Your best prediction file successfully submitted to the challenge site.
  - For 2-person teams, each team member should send me a
    separate email describing each team member's contribution
    to the project. These emails will be kept confidential.
- Your project will be graded according to the following criteria:
  - The difficulty, number and creativity of your submissions to the challenge.
  - The relevance to machine learning of your approach(es) to the problem.
  - Your team's ranking within the class based on the score of your
    best successful submission.
  - The relative contribution of each team member.
  - Whether you met the intermediate deadlines above.
  - The quality of your report based on presentation, coverage, detail
    and general discussion.
  - The efficiency, understandability and correctness of your instructions
    and code for reproducing your best result.
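For the first entry described above (a constant prediction of 0.020000 for each of the 50,000 test instances), a minimal sketch of how such a file might be generated is shown here. The one-prediction-per-line layout, the function names, and the output file name are assumptions made for illustration; check the contest website for the actual submission format before uploading.

```python
from pathlib import Path

N_TEST = 50_000  # number of test instances stated in the assignment


def class_prior(labels):
    """Return the fraction of positive (1) labels in a sequence of 0/1 labels."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


def write_constant_submission(path, prob, n=N_TEST):
    """Write n lines, each holding prob formatted to six decimal places.

    Assumption: the contest accepts one prediction per line. This layout
    is hypothetical and is not taken from the contest documentation.
    """
    line = f"{prob:.6f}\n"
    Path(path).write_text(line * n)
    return n
```

Calling `write_constant_submission("baseline_submission.txt", 0.02)` would produce a 50,000-line file in which every line reads `0.020000`. The `class_prior` helper shows where a value like 0.02 comes from: it is simply the share of positive labels in the training data.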
Last updated: December 17, 2009 at 12:20am
|Rank ||Team Name ||Best Score |
|1 ||dragon1wsu ||3.539 |
|2 ||Albion ||3.503 |
|3 ||noSkynetHere ||3.448 |
|4 ||Seth ||3.372 |
|5 ||RAB ||3.325 |
|6 ||Dang_Nabbit ||3.293 |
|7 ||Learner ||3.285 |
|8 ||Halo ||3.230 |
|9 ||RobenGokcen ||3.174 |