[Oct. 24th, 2006 | 08:59 am]
So, my best result on the Netflix Prize scored an RMSE of 0.981 on the probe set (this was using the movie correlation). My attempt at user-based correlation produced a hideous RMSE of 1.03. That is still better than simply predicting each movie's average rating for every user, but not by much. So there is a little useful information in there, and this morning, when I should have been showering, I cooked up a little script to take a weighted average of the two sets of predictions (obviously weighted toward the movie-correlation predictions, since they scored much better). The results were not very encouraging: I managed to get the RMSE down to about 0.976, but that's it. The last-place entry on the leaderboard currently has an RMSE of 0.9597, so I'm way off...
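Here is a minimal sketch of that kind of blending step. It is in Python rather than the C++ the original work used. It assumes the two prediction lists are already aligned, entry for entry, with the true probe ratings, and it picks the weight by a simple grid search. The function names (`rmse`, `blend`, `best_weight`) and the toy numbers are hypothetical and are not taken from the original script.

```python
import math

def rmse(predictions, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - a) ** 2 for p, a in zip(predictions, actual))
    return math.sqrt(total / len(actual))

def blend(movie_preds, user_preds, w):
    """Weighted average: weight w on the movie-correlation predictions,
    1 - w on the user-correlation predictions."""
    return [w * m + (1.0 - w) * u for m, u in zip(movie_preds, user_preds)]

def best_weight(movie_preds, user_preds, actual, steps=100):
    """Try w = 0, 1/steps, ..., 1 and return (w, rmse) with the lowest RMSE."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        score = rmse(blend(movie_preds, user_preds, w), actual)
        if best is None or score < best[1]:
            best = (w, score)
    return best

# Toy, made-up data (hypothetical): predictions from each model plus true ratings.
movie_preds = [3.8, 2.9, 4.4, 1.5]
user_preds = [4.2, 3.5, 3.9, 2.4]
actual = [4, 3, 4, 2]
w, score = best_weight(movie_preds, user_preds, actual)
print(f"best weight on movie predictions: {w:.2f}, blended RMSE: {score:.4f}")
```

Because the weight here is chosen on the same probe ratings it is scored against, the blended RMSE is slightly optimistic. Tuning the weight on a separate held-out slice of the training data would give a fairer number.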
I do have one more way of calculating user correlations running now, but assuming that doesn't yield fabulous results, I'll probably give up and reclaim my computer within the week. It's a little disappointing for it to end this way, but I did give it a good shot and even made it onto the leaderboard for a short time. And I had fun and kept my C++ skills up a little. So it wasn't a waste!
Also, I've read that the movie correlation data is kinda interesting (for example, which movies did people like about as much as Miss Congeniality?), so I'll probably cook up a little script to display that data somehow. That would be fun.
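One way such a script might work is sketched below. It assumes "movie correlation" means the Pearson correlation between two movies' ratings, computed over the users who rated both; the post doesn't say exactly how the correlations were calculated. The `ratings_by_movie` layout, the helper names, the `min_common` cutoff, and the toy ratings are all hypothetical.

```python
import math

def pearson(ratings_a, ratings_b, min_common=3):
    """Pearson correlation of two movies over the users who rated both.

    ratings_a, ratings_b: dicts mapping user id -> rating.
    Returns None if too few users overlap or either side has zero variance.
    """
    common = ratings_a.keys() & ratings_b.keys()  # set intersection of user ids
    if len(common) < min_common:
        return None
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

def most_similar(target, ratings_by_movie, k=5):
    """Return up to k (movie, correlation) pairs, most correlated with target first."""
    target_ratings = ratings_by_movie[target]
    scores = []
    for movie, ratings in ratings_by_movie.items():
        if movie == target:
            continue
        r = pearson(target_ratings, ratings)
        if r is not None:
            scores.append((movie, r))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Toy, made-up ratings (hypothetical titles except the target).
ratings_by_movie = {
    "Miss Congeniality": {1: 5, 2: 3, 3: 4, 4: 1},
    "Movie A": {1: 5, 2: 3, 3: 4, 4: 1},  # rated identically by the same users
    "Movie B": {1: 1, 2: 3, 3: 2, 4: 5},  # rated in the opposite pattern
    "Movie C": {1: 4, 5: 2},              # only one shared user, so skipped
}
for movie, r in most_similar("Miss Congeniality", ratings_by_movie):
    print(f"{movie}: {r:+.2f}")
```

Computing this for every pair of movies in the full dataset is a lot of work for pure Python, so in practice it would make sense to precompute and store the correlations (as the C++ code presumably already does) and have the display script only read and sort them.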
I read a paper suggesting that including IMDB data about the movies (actors, directors, etc.) could improve things, but there are 17,770 movies and only one of me, and I'm short on free time as it is.