Log in

No account? Create an account
Combinatorial explosion of branches - Beware of the Train — LiveJournal [entries|archive|friends|userinfo]

[ website | My Website ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

[Links:| My moblog Hypothetical, the place to be My (fairly feeble) website ]

Combinatorial explosion of branches [Dec. 6th, 2012|11:41 pm]
[Tags|, , , , , , ]

I've been running benchmarks again. The basic workflow is

  1. Create some number of directories containing the benchmark suites I want to run.
  2. Tweak the Makefiles so benchmarks are compiled and run with the compilers, simulators, libraries, flags, etc, that I care about.
  3. Optionally tweak the source code to (for instance) change the number of iterations the benchmarks are run for.
  4. Run the benchmarks!
  5. Check the output; discover that something is broken.
  6. Swear, fix the problem.
  7. Repeat until either you have enough data or the conference submission deadline gets too close and you are forced to reduce the scope of your experiments.
  8. Collate the outputs from the successful runs, and analyse them.
  9. Make encouraging noises as the graduate students do the hard work of actually writing the paper.

Suppose I want to benchmark three different simulators with two different compilers for three iteration counts. That's 18 configurations. Now note that the problem found in stage 5 and fixed in stage 6 will probably not be unique to one configuration - if it affects the invocation of one of the compilers then I'll want to propagate that change to nine configurations, for instance. If it affects the benchmarks themselves or the benchmark-invocation harness, it will need to be propagated to all of them. Sounds like this is a job for version control, right? And, of course, I've been using version control to help me with this; immediately after step 1 I check everything into Git, and then use git fetch and git merge to move changes between repositories. But this is still unpleasantly tedious and manual. For my last paper, I was comparing two different simulators with three iteration counts, and I organised this into three checkouts (x1, x10, x100), each with two branches (simulator1, simulator2). If I discovered a problem affecting simulator1, I'd fix it in, say, x1's simulator1 branch, then git pull the change into x10 and x100. When I discovered a problem affecting every configuration, I checked out the root commit of x1, fixed the bug in a new branch, then git merged that branch with the simulator1 and simulator2 branches, then git pulled those merges into x10 and x100.

Keeping track of what I'd done and what I needed to do was frankly too cognitively demanding, and I was constantly bedevilled by the sense that there had to be a Better Way. I asked about this on Twitter, and Ganesh Sittampalam suggested "use Darcs" - and you know, I think he's right, Darcs' "bag of commuting patches" model is a better fit to what I'm trying to do than Git's "DAG of snapshots" model. The obvious way to handle this in Darcs would be to have six base repositories, called "everything", "x1", "x10", "x100", "simulator1" and "simulator2"; and six working repositories, called "simulator2_x1", "simulator2_x10", "simulator2_x100", "simulator2_x1", "simulator2_x10" and "simulator2_x100". Then set up update scripts in each working repository, containing, for instance

darcs pull ../base/everything
darcs pull ../base/simulator1
darcs pull ../base/x10
and every time you fix a bug, run for i in working/*; do $i/update; done.

But! It is extremely useful to be able to commit the output logs associated with a particular state of the build scripts, so you can say "wait, what went wrong when I used the -static flag? Oh yeah, that". I don't think Darcs handles that very well - or at least, it's not easy to retrieve any particular state of a Darcs repo. Git is great for that, but whenever I think about duplicating the setup described above in Git my mind recoils in horror before I can think through the details. Perhaps it shouldn't - would this work? Is there a Better Way that I'm not seeing?


[User Picture]From: fanf
2012-12-07 01:01 am (UTC)
Is version control really the right tool? Shouldn't there be just one version which is parametrized according to the experiments you want to run? Then you only need to fix each bug once. It should also make the setup more easily reproducible so you could hand it to a grad student to run the numbers on a compute cluster or something. I often find that turning a repetitive problem into a programming problem makes it more rewarding.
(Reply) (Thread)
[User Picture]From: pozorvlak
2012-12-08 11:55 am (UTC)
Hmmmm, you could be right. In EEMBC 1.1, the iteration counts are specified in each benchmark using C #defines, and I altered these by editing the code (well, with perl -i, but the principle is the same). But I guess I could have changed
#define ITERATIONS 50000
and then added -DMULTIPLIER=100 to CFLAGS.

you could hand it to a grad student to run the numbers on a compute cluster or something.

In this particular case, we cared about wallclock time, and I'd be worried about introducing variability if we ran it on a cluster - perhaps unreasonably, and I suppose we could do multiple runs and average. Actually, we should probably be doing that anyway. But yeah, good idea.
(Reply) (Parent) (Thread)