Feb. 24th, 2009, 10:20 am
So I wrote a benchmark analyzer for benchmarks that have many parameters; it tries to figure out which of those parameters are the important ones. To that end, it creates a decision tree in R and also generates lots of boxplots to visualize the different parameters. Here's a sample report (with boring data). It also (somewhat cleverly) saves the original .csv file along with the web page it generates, so you can save the page and send it to a colleague, who can play with the options and rerun the analysis.
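For the curious, here is one way a generated report could carry its own data. This is a minimal sketch under an assumption, not the analyzer's actual code: it HTML-escapes the CSV into a hidden `<textarea>` inside a form that posts to a reanalysis script such as `doanalysis.cgi`, so a saved copy of the page still has everything needed to rerun. The names `build_report` and `extract_csv` are hypothetical.

```python
import html
import re


def build_report(csv_text: str, action_url: str) -> str:
    """Return a minimal HTML report page that carries its own source CSV.

    The CSV is HTML-escaped into a hidden <textarea> inside a form, so a
    saved copy of the page can be resubmitted for reanalysis as-is.
    """
    return (
        "<html><body>\n"
        "<!-- decision tree and boxplots would go here -->\n"
        f'<form method="post" action="{html.escape(action_url)}">\n'
        f'<textarea name="csv" hidden>{html.escape(csv_text)}</textarea>\n'
        '<input type="submit" value="Reanalyze">\n'
        "</form>\n"
        "</body></html>\n"
    )


def extract_csv(page: str) -> str:
    """Recover the embedded CSV from a page produced by build_report."""
    match = re.search(r'<textarea name="csv" hidden>(.*?)</textarea>', page, re.S)
    if match is None:
        raise ValueError("no embedded CSV found")
    return html.unescape(match.group(1))


if __name__ == "__main__":
    data = "threads,cache,time_ms\n1,on,120\n4,off,45\n"  # hypothetical data
    page = build_report(data, "doanalysis.cgi")
    assert extract_csv(page) == data  # escaping round-trips losslessly
    print(page)
```

Escaping matters here: because every `<` in the CSV becomes `&lt;`, the data can never accidentally close the `<textarea>` early.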
It occurred to me as I was prettying this up that most benchmarks don't involve lots of parameters, so the odds that anyone else will find this useful are pretty low. I'm OK with that, because:
- it's useful to me right this very moment for work stuff
- I had fun writing it and learned some more about R
- pretty graphs!
- it was good to work on something other than whereslunch.org for a while
I feel like I should understand what this is doing, but I don't. Can you give an example of how you would use this?
Sure! Let's say you work on software that has a function F. F takes lots of different parameters that are more or less orthogonal, and you have benchmarks of how well F performs with lots of combinations of these parameters.
Now you want to make some optimizations to F and see how they affect the benchmarks. But you have 500 different combinations of parameters, and for some combinations it got a lot better, for some a little better, and for some a little worse. The analyzer splits up the benchmarks by parameter, so you can see whether, for example, F got a lot faster for parameter X but a bit slower for parameter Y. That tells you to look at what F is doing for parameter Y and try to fix it.
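The "split up the benchmarks by parameter" step can be sketched roughly like this. It's a simplified illustration, not what the analyzer does internally (that uses a decision tree and boxplots in R): it just groups before/after timings by each parameter's value and reports the median speedup per value. The `before`/`after` field names, the parameter names, and the data are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def speedup_by_parameter(results, params):
    """Summarize an optimization's effect separately for each parameter value.

    results: list of dicts, one per benchmark run, holding the parameter
             values plus hypothetical 'before' and 'after' timings
             (lower is better).
    params:  names of the parameter keys to split on.

    Returns {param: {value: median speedup}}, where speedup = before / after,
    so a value above 1.0 means the change helped for that parameter value
    and a value below 1.0 means it hurt.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for row in results:
        ratio = row["before"] / row["after"]
        for p in params:
            groups[p][row[p]].append(ratio)
    return {
        p: {value: median(ratios) for value, ratios in by_value.items()}
        for p, by_value in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical runs of F with two parameters, X and Y.
    runs = [
        {"X": "a", "Y": "u", "before": 100.0, "after": 50.0},
        {"X": "a", "Y": "v", "before": 100.0, "after": 50.0},
        {"X": "b", "Y": "u", "before": 100.0, "after": 125.0},
        {"X": "b", "Y": "v", "before": 100.0, "after": 100.0},
    ]
    for param, summary in speedup_by_parameter(runs, ["X", "Y"]).items():
        for value, s in sorted(summary.items()):
            flag = "  <-- slower, worth a look" if s < 1.0 else ""
            print(f"{param}={value}: median speedup {s:.2f}x{flag}")
```

Using the median rather than the mean keeps a single outlier run from dominating a parameter value's summary, which is the same reason boxplots are handy for this kind of data.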
Ah, interesting. BTW, I'm getting a 404 when I click Reanalyze in your sample:
The requested URL /~gregstoll/benchmarkanalysis/doanalysis.cgi was not found on this server.
Dang it, I knew I should have regenerated that instead of editing the HTML by hand. Fixed (and thanks!).
Got another one: if I change the p-value in the options to 0.95, the decision tree still shows p < 0.001. Or am I misunderstanding?
The p-value option represents the certainty needed to make a split, so 0.95 means a split needs p < 0.05. Showing p < 0.001 is therefore correct; if you bump the option up to 0.9999 or so, that split should disappear.
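The rule in that reply fits in one comparison. This is a minimal sketch that assumes the option behaves like the `mincriterion` setting of R's `party::ctree`, where a split is made only if 1 minus the test's p-value exceeds it; the post doesn't say which R tree function the analyzer uses, and `split_allowed` is a hypothetical name.

```python
def split_allowed(p_value: float, certainty: float) -> bool:
    """Decide whether a candidate split is kept.

    Assumption: the 'p-value' option behaves like a minimum criterion,
    i.e. a split is made only when 1 - p_value exceeds the chosen certainty.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < certainty < 1.0:
        raise ValueError("certainty must be strictly between 0 and 1")
    return 1.0 - p_value > certainty


# Hypothetical split whose test gave p = 0.0005 (displayed as p < 0.001).
for certainty in (0.95, 0.9999):
    print(certainty, split_allowed(0.0005, certainty))
```

With the option at 0.95 the split survives (1 - 0.0005 = 0.9995 > 0.95), while at 0.9999 it is dropped, matching the behavior described in the reply.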