more netflixing [Oct. 5th, 2006|09:35 am]
[Current Mood | accomplished]

So I submitted my first entry last night - it did poorly, of course, but at least I'm on the Leaderboard! (I'm "teamgreg" because I really didn't want to think of a name.) I crunched some more numbers last night in support of my next submission, which I have to wait a week for. It's a little annoying that just reading in all 100 million entries takes at least 40 minutes (not counting any processing, though that's all been pretty quick so far). It works decently when we're doing other things, since I can start a run and then get back to something else. As long as that "something else" isn't WoW, because it really sloooows my computer down. :-)

I'm impressed that one team already has an RMSE (root mean squared error) of .9571, which is within spitting distance of .9474, which is how Netflix's algorithm does on the data.
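
For anyone who hasn't run into RMSE before: you take the difference between each predicted rating and the real one, square it, average the squares, and take the square root. Lower is better. Here's a minimal sketch in Python; the function name and the sample ratings are made up for illustration and aren't from the contest data or from any actual submission code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one rating")
    # Sum of squared differences between each prediction and the true rating
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    # Average the squared errors, then take the square root
    return math.sqrt(squared_error / len(predicted))

# Hypothetical ratings on the 1-5 star scale (not real contest data)
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]
print(rmse(predicted, actual))
```

Because the errors are squared before averaging, being off by a full star on one movie hurts a lot more than being off by a quarter star on four of them.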

From: onefishclappin
2006-10-05 02:23 pm (UTC)
FYI Gary and I were talking about this last night and were discussing pulling zip code/census data (freely available stuff, IIRC) into it. Does that make sense from what you get to work with?
From: gregstoll
2006-10-05 03:13 pm (UTC)
Neat idea! Unfortunately, the data set doesn't include any information about users (only a unique identifier). For the movies it includes the title and the date it was either released or released on DVD (so using the date is a bit sketchy), so you can pull external data about that...
From: onefishclappin
2006-10-05 04:09 pm (UTC)
Suck... if you get just a zipcode, you could harvest/infer all sorts of good info about people (with minimal privacy concerns).
I wonder if my father is too busy on stuff to work on this - he's been doing various datamining whatnot for years... Somehow I suspect that he's too busy in retirement to want to do more "work" :)
Keep letting us know how it's going!