I sing a song of bugs and of geeks.
The main focus of my research group is the design of a microprocessor that uses very little power, called EnCore. I am working on the compiler for EnCore: the program that takes in programs written in a human-writable language called C, and then spits out semantically-equivalent (but hopefully more efficient) programs written in the opaque binary code that is the EnCore chip's native language. Compilers aren't absolutely necessary - the earliest computers were programmed directly in machine code - but they make programming so much easier that nobody would think of shipping a new chip without a basic compilation toolchain these days.
The way you test a compiler is by asking it to compile programs whose behaviour is well-understood, and then running the binaries it produces and comparing their behaviour to what you expected. I'd found a test program that was being compiled incorrectly, indicating a bug in the compiler; better yet, it was compiled correctly by a previous version of the compiler, so the bug had been introduced recently.
Our compiler, like most modern C compilers, consists of several programs that run one after the other:
- The preprocessor, which takes in files of C source code (with names ending in .c), performs various textual substitutions and transformations on them, and puts the output in files ending in .i. C programmers use this facility to do things like giving names to commonly-used constants, or to configure their code so it can be compiled for different operating systems or architectures. The existence of the preprocessor is increasingly admitted to be a mistake in the design of C (or rather, there are better and safer ways of providing the features it offers), but hey, we're stuck with it now.
- The compiler proper (in our case called acc), which takes in .i files and produces assembly language, a textual representation of the actual instructions that the computer's processor will execute. Assembly language (in files ending in .s) is useful because it faithfully and precisely represents what will actually be executed, but it's still (just about) human-readable and can still be manipulated with standard text-oriented programs. As we will see, this is a Good Thing.
- The assembler, which takes .s files and translates them, one instruction at a time, into the binary code that the computer's processor actually understands. The results are put in object files, with the suffix .o.
- The linker, which takes one or more .o files and combines them together into a single executable (which can be called anything, but which is called a.out by default), ensuring that cross-references match up properly and that the overall layout is what the operating system expects. The first linker was forged by Satan himself in the eternal fires of Hell, and subsequent iterations have only increased the infernality of the basic design; we shall discuss linkers no further.

Invoking these stages in the right order and with the right settings is taken care of by a driver program (in our case called ecc). So, if your project consists of two C code files called fred.c and wilma.c, then the command ecc fred.c wilma.c will first invoke the preprocessor to produce fred.i and wilma.i, then acc to produce fred.s and wilma.s, then the assembler to produce fred.o and wilma.o, then finally the linker to produce a.out.
My next move was obvious: compile the test program to assembly language using the old (working) and new (broken) versions of the compiler, and look at the differences between the two .s files - the textual nature of assembly language meant that I could use a standard tool called diff for comparing two similar text files. I duly did this, and found only one difference: in a few places, the old version was referring to one of the processor's registers as %r3 and the new version was referring to it as just r3. [A register is a unit of scratch memory in the processor, for holding values to be calculated with. Registers can be accessed much faster than main memory, but typically there's only a handful of them. One of the tricky parts of compiler construction is making efficient use of the processor's registers]. I didn't know what difference that made, but theorised that the % sign was doing some dereferencing: maybe a plain r3 was interpreted as a pointer to some other register, or something.
Shameful confession: I don't actually speak EnCore assembly language very well. Yet.
Anyway, I found the section of code where this change had been made, asked our version control software who was responsible for it, and discovered that it had been written by an undergraduate as part of his summer project. Case closed, I thought. I fired off a snotty email to my colleague Igor for signing off on obviously untested code, reverted that particular change, kicked off the full test suite, and went home.
I arrived the next morning to discover an email from Igor (who does speak EnCore assembly language properly) saying that the change I'd found really shouldn't have affected anything, but that I could revert it if I liked. Then I checked the test logs. Not a single test case had changed its result - even the one I thought I'd fixed.
WTF?
Somewhere in the midst of the panicked flailing that ensued, I compiled the "broken" .s file to a binary and ran it, and found that actually, it ran just fine.
WTF?
After seriously doubting my sanity for a good long while, I eventually worked it out: the new version of the compiler produced correct output if and only if you compiled the test program's .c file manually to assembly language and then compiled the assembly language to a binary. If you tried to compile the .c file to a binary in one go it broke.
WTF?
Totally stumped, I took my problem to the monthly meeting of Edinburgh.pm, that drinking man's Valhalla where central Scotland's Perl geeks take sup and speak of mighty deeds. "So let me get this straight," said Aaron Crane, after I'd explained the situation to him, "you're compiling to assembly with ecc -S, and then compiling to binary with ecc -c?"
At this point a lightbulb went on in my head, because I hadn't been compiling to assembly with ecc -S: I'd been invoking acc directly. Perhaps the options ecc was passing to acc were not exactly the same as the ones I was giving it?
The next morning, more than a little hungover, I looked into this theory, and found it to be correct: ecc -S was passing acc the -O1 option (turn optimisation on, at level 1). Some further investigation revealed that by default, acc was compiling with optimisation set to level 2.
That is, the bug only manifested itself at low levels of optimisation.
WTF?
The first rule of compiler bugs is: it's not a compiler bug. No matter how sure you are that your code is correct and that the error must be the fault of those dumbasses who wrote your compiler, it almost certainly is your fault. Even if the bug disappears when you change the optimisation level, it's still probably a bug in your code and not a bug in the optimiser. But if you are one of the dumbasses who's writing the compiler, the equation changes a bit: now, you're allowed to say that bugs which only manifest at high levels of optimisation are probably signs of bugs in the optimiser - particularly if, as in this case, the test code has been inspected by literally millions of programmers over a period of more than thirty years. But bugs that only manifest at low levels of optimisation?
WTF?
Aaron suggested the solution again: the code to produce some instruction must be buggy, and turning up the optimisation level must enable some optimisation which eliminates the use of that instruction. So, all I needed to do was look for instructions which were used at -O1 but not at -O2, and check the code which generated them for bugs.
I did the check. There were no instructions produced at -O1 which were not also produced at -O2. I tried moving up a level, to the "intermediate representation" from which the assembly language is generated. Same story.
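Mechanically, a check like that is pleasantly easy, because assembly is one instruction per line. A sketch with standard text tools (the .s files and mnemonics here are invented for illustration; real assembler output also contains labels and directives you'd want to filter out first):

```shell
# Fake -O1 and -O2 outputs, standing in for the real compiler's.
printf 'mpy %%r1,%%r2,%%r9\nadd %%r1,%%r1,4\n' > o1.s
printf 'add %%r1,%%r2,4\n'                     > o2.s

# Take the opcode (first word) of each instruction, deduplicate,
# then list opcodes appearing at -O1 but not at -O2.
awk '{print $1}' o1.s | sort -u > o1.ops
awk '{print $1}' o2.s | sort -u > o2.ops
comm -23 o1.ops o2.ops   # prints: mpy
```

comm -23 suppresses lines unique to the second file and lines common to both, leaving only the opcodes the lower optimisation level uses. In my case, of course, that list came back empty.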
WTF?
I compiled the test program to assembly language at -O1 and -O2, and compared the outputs with diff (more precisely, with meld, which is a pretty graphical interface to diff). After staring at it for a while, I spotted a pattern: the broken version contained instructions like mpy %r1,%r2,%r9 (multiply the contents of register 2 by the contents of register 9, and put the result into register 1), where the correct version contained instructions like mpy %r1,%r2,10 (multiply the contents of register 2 by 10, and put the result into register 1). I checked the original test code, and multiplication by 10 was indeed the desired effect; then I went looking through the code generator to find all the places that multiplication instructions were generated. Sure enough, I found
RULE [mpy_int_imm] o:mirMult(rs1:reg, p:mirIntConst) -> rd:reg;
TEMPLATE ALUTMPL;
COST 10;
EXPAND {
gcg_create_mpyint(gcg_expand, rs1,
UnivInt_to_int(p.Value), rd);
};
END
when what I should have found was of course
RULE [mpy_int_imm] o:mirMult(rs1:reg, p:mirIntConst) -> rd:reg;
TEMPLATE ALUTMPL;
COST 10;
EXPAND {
gcg_create_mpyintimm(gcg_expand, rs1,
UnivInt_to_int(p.Value), rd);
};
END
Where I should have been generating "multiply register by immediate value" instructions, I was generating "multiply register by register" instructions. Since the code generator represented registers internally as simple integers, the mistake didn't get picked up when I built the compiler. This explained why I wasn't finding any excess instruction types in the broken assembly: instead of generating extra instructions which were optimised away, I was mistakenly generating a type of instruction that was correctly used elsewhere. And it didn't appear in the optimised version because the program was transformed before code-generation to another form that didn't need to do those multiplications.
The error in the code generator had been inserted by me, of course: the undergraduate who'd written the summer project was entirely innocent.
[The next mystifying bug was his fault, though.]