Greg - more netflixing

more netflixing [Oct. 5th, 2006|09:35 am]
[Current Mood | accomplished]

So I submitted my first entry last night - it did poorly, of course, but at least I'm on the Leaderboard! (I'm "teamgreg" because I really didn't want to think of a name.) I crunched some more numbers last night in support of my next submission, which I have to wait a week for. It's a little annoying that just reading in all 100 million entries takes at least 40 minutes (not counting any processing, but that's all been pretty quick so far). It works decently when we're doing other things, though: I can start a run and then get back to something else. As long as that "something else" isn't WoW, because it really sloooows my computer down. :-)

I'm impressed that one team already has an RMSE (root mean squared error) of .9571, which is within spitting distance of .9474, which is how Netflix's algorithm does on the data.
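
For anyone who hasn't run into RMSE before, here's a minimal sketch of how that kind of score is computed, assuming the predictions and the actual ratings come as two equal-length lists. The function name and the sample ratings are made up for illustration; this is not the contest's scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of the same length")
    # Sum the squared differences, average them, then take the square root.
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


# Hypothetical predictions versus the ratings users actually gave (1-5 stars).
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 5]
score = rmse(predicted, actual)
print(round(score, 4))
```

Lower is better: a perfect set of predictions scores 0, and because the errors are squared before averaging, a few big misses hurt more than many small ones.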

Comments:
From: onefishclappin
2006-10-05 02:23 pm (UTC)

FYI Gary and I were talking about this last night and were discussing pulling zip code/census data (freely available stuff, IIRC) into it. Does that make sense from what you get to work with?
From: gregstoll
2006-10-05 03:13 pm (UTC)

Neat idea! Unfortunately, the data set doesn't include any information about users (only a unique identifier). For the movies it includes the title and the date it was either released or released on DVD (so using the date is a bit sketchy), so you can pull external data about that...
From: onefishclappin
2006-10-05 04:09 pm (UTC)

Suck... if you get just a zipcode, you could harvest/infer all sorts of good info about people (with minimal privacy concerns).
I wonder if my father is too busy on stuff to work on this - he's been doing various datamining whatnot for years... Somehow I suspect that he's too busy in retirement to want to do more "work" :)
Keep letting us know how it's going!