Idea: Now I'm focusing on improving the time to insert the iTunes data into database. Where we left off last time, our script took 71 seconds to run, ~50 seconds of which was database operations. The idea I had to speed this up was to batch a bunch of queries together and thus make fewer calls to the database. It turns out this actually slowed things down.
So I did a little research and it turns out if you insert data with the same query structure over and over again (but with different bind variables), the database doesn't have to reparse the query which speeds things up a lot. I tried doing this with pyPgSql but couldn't find any documentation how it was supposed to work, so I switched to using psycopg2 and changed the query for inserting the playlist data. Just switching to a psycopg2 sped things up a lot, it seems. I tried switching to a similar sort of query for inserting track data, but that actually slowed things down.
Anyway, the new script runs in 25 seconds, and it looks like only around 9 seconds for database operations. This is a 400% speedup in the database time! Overall, this step improved performance by ~180%, and since we started at 114 seconds we've improved ~350%.
Conclusion: Another big success, and I'm not sure how much more I can squeeze out of the iTunesInfo.py script. Next time I'll focus on the analyzedb.py script, which does the analysis from the database - right now it's taking between 5 and 12 minutes to run on my library of 6400 tracks.
- old script
- new script