| more netflixing |
[Oct. 5th, 2006|09:35 am]
So I submitted my first entry last night - it did poorly, of course, but at least I'm on the Leaderboard! (I'm "teamgreg" because I really didn't want to think of a name.) I crunched some more numbers last night in support of my next submission, which I have to wait a week for. It's a little annoying that just reading in all 100 million entries takes, at a minimum, 40 minutes (not counting any processing, though that's all been pretty quick so far). It works decently when we're doing other things, since I can start a run and then get back to something else. As long as that "something else" isn't WoW, because it really sloooows my computer down. :-)
I'm impressed that one team already has an RMSE (root mean squared error) of .9571, which is within spitting distance of .9474, which is how Netflix's algorithm does on the data.
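For anyone who hasn't run into RMSE before, here's a minimal sketch of how the score is computed. The `rmse` function name and the sample ratings are made up for illustration; this is just the textbook formula, not the contest's actual scoring code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of the same length")
    # Sum the squared differences, average them, then take the square root.
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))

# Hypothetical example: four predicted star ratings vs. what users really gave.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 5]
print(rmse(predicted, actual))
```

Lower is better, and because the errors are squared before averaging, one badly wrong prediction hurts more than several slightly wrong ones.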
| Comments: |
FYI Gary and I were talking about this last night and were discussing pulling zip code/census data (freely available stuff, IIRC) into it. Does that make sense from what you get to work with?
Neat idea! Unfortunately, the data set doesn't include any information about users (only a unique identifier). For the movies it includes the title and the date it was either released or released on DVD (so using the date is a bit sketchy), so you can pull external data about that...
Suck... if you got just a zip code, you could harvest/infer all sorts of good info about people (with minimal privacy concerns). I wonder if my father is too busy to work on this - he's been doing various data mining whatnot for years... Somehow I suspect that he's too busy in retirement to want to do more "work" :) Keep letting us know how it's going!