|A Most Vexatious Merge
||[Mar. 30th, 2011|10:57 pm]
In The Art of Unix Programming, Eric Raymond lists among his basics of the Unix philosophy the "Rule of Generation":
14. Rule of Generation: Avoid hand-hacking; write programs to write programs when you can. He goes into this idea in more detail in chapter 9 of the same book.
I used to believe this was a good idea, and in many situations (here's a great example) it is. But my current work project, which makes heavy use of code generators and custom minilanguages, has been a crash course (sometimes literally) in the downsides. Here's the latest example.
I've recently been merging in some code a colleague wrote about a year ago, just before I started. As you'd expect, with a year's drift this was a non-trivial exercise, but I eventually got all the diffs applied in (I thought) the right places. Protip: if forced to interact with a Subversion repository, use git as your client. It makes your life so much less unpleasant. Anyway, I finished the textual part of the merge, and compiled the code.
Screens-full of error messages. Oh well, that's not so unexpected.
I'm a big fan of Tilton's Law: "solve the first problem". The chances are good that the subsequent problems are just cascading damage from the first problem; no sense in worrying about them until you've fixed that one. Accordingly, I looked only at the first message: "The variable 'state' has not been declared at line 273".
Git checkout colleagues-branch. Make. No errors.
Git checkout merge-branch. Make. Screens-full of errors.
Git checkout colleagues-branch. Grep for a declaration of "state". None visible.
Clearly, there was some piece of voodoo that I'd failed to merge correctly.
I spent days looking through diffs for something, anything, that I'd failed to merge properly that might be relevant to this problem. I failed.
I then spent some serious quality time with the code-generation framework's thousand-page manual, looking for some implicit-declaration mechanism that might explain why "state" was visible in my colleague's branch, but not in mine. I failed.
Finally, I did what I probably should have done in the first place, and took a closer look at the generated code. The error messages that I was seeing referred to the DSL source code rather than the generated C code, because the code-generator emitted #line directives to reset the C compiler's idea of the current file and line; I could therefore find the relevant section of generated code by grepping for the name of the buggy source file in the
The framework uses code generators for all sorts of things (my favourite generator being the shell script that interprets a DSL to build a Makefile which is used to build another Makefile), but this particular one was used to implement a form of polymorphism: the C snippet you provide is pasted into a honking great
switch statement, which switches on some kind of type tag.
I found the relevant bit of generated code, and searched back to the beginning of the function. Yep, "state" was indeed undeclared in that function. And the code generator had left a helpful comment to tell me which hook I needed to use to declare variables or do other setup at the beginning of the function. So that was the thing I'd failed to merge properly!
Git checkout colleagues-branch. Grep for the hook. No results.
And then it hit me.
Like all nontrivial compilers, ours works by making several transformation passes over the code. The first pass parses your textual source-code and spits out a machine-independent tree-structured intermediate representation (IR). There then follow various optimization and analysis passes, which take in IR and return IR. Then the IR is expanded into a machine-specific low-level IR, and finally the low-level IR is emitted as assembly language.
The code that was refusing to compile was part of the expansion stage. But at the time that code was written, the expansion stage didn't exist: we went straight from the high-level IR to assembly. Adding an expansion stage had been my first task on being hired. Had we been using a language that supported polymorphism natively, that wouldn't have been a problem: the code would have been compiled anyway, and the errors would have been spotted; a smart enough compiler would have pointed out that the function was never called. But because we were using a two-stage generate-and-compile build process, we were in trouble. Because there was no expansion stage in my colleague's branch, the broken code was never pasted into a C file, and hence never compiled. My colleague's code was, in fact, full of compile-time errors, but appeared not to be, because the C compiler never got a look at it.
And then I took a closer look at the screensfull of error messages, and saw that I could have worked that out right at the beginning: subsequent error messages referred to
OUTFILE, and the output file isn't even open at the expansion stage. Clearly, the code had originally been written to run in the emit phase (when both
OUTFILE were live), and he'd got half-way through converting it to run at expansion-time before having to abandon it.
- In a generated-code scenario, do not assume that any particular snippet has been compiled successfully just because the whole codebase builds without errors.
- Prefer languages with decent native abstraction mechanisms to code generators.
- At least skim the subsequent error messages before dismissing them and working on the first bug: they may provide useful context.
- Communication: if I'd enquired more carefully about the condition of the code to be merged I could have saved myself a lot of time.
- Bear in mind the possibility that you might not be the guilty one.
- Treat ESR's pronouncements with even greater caution in future. Same goes for Kenny Tilton, or any other Great Prognosticator.
Edit: two more:
Thanks to gareth_rees and jerf.
- If, despite (2), you find yourself writing a snippet-pasting code generator, give serious thought to providing "this snippet is unused" warnings.
- Learn to spot when you're engaged in fruitless activity and need to step back and form a better plan. In my case, the time crawling through diffs was wasted, and I probably could have solved the problem much quicker if I'd rolled up my sleeves and tried to understand the actual code.