Beware of the Train
Attack of the DaliBug [Sep. 15th, 2010|12:37 pm]
pozorvlak

I sing a song of bugs and of geeks.

The main focus of my research group is the design of a microprocessor that uses very little power, called EnCore. I am working on the compiler for EnCore: the program that takes in programs written in a human-writable language called C, and then spits out semantically-equivalent (but hopefully more efficient) programs written in the opaque binary code that is the EnCore chip's native language. Compilers aren't absolutely necessary - the earliest computers were programmed directly in machine code - but they make programming so much easier that nobody would think of shipping a new chip without a basic compilation toolchain these days.

The way you test a compiler is by asking it to compile programs whose behaviour is well-understood, and then running the binaries it produces and comparing their behaviour to what you expected. I'd found a test program that was being compiled incorrectly, indicating a bug in the compiler; better yet, it was compiled correctly by a previous version of the compiler, so the bug had been introduced recently.

Our compiler, like most modern C compilers, consists of several programs that run one after the other:
  • The preprocessor, which takes in files of C source code (with names ending in .c), performs various textual substitutions and transformations on them, and puts the output in files ending in .i. C programmers use this facility to do things like giving names to commonly-used constants, or configuring their code so it can be compiled for different operating systems or architectures. The existence of the preprocessor is increasingly admitted to be a mistake in the design of C (or rather, there are better and safer ways of providing the features it offers), but hey, we're stuck with it now.
  • The compiler proper (in our case called acc), which takes in .i files and produces assembly language, a textual representation of the actual instructions that the computer's processor will execute. Assembly language (in files ending in .s) is useful because it faithfully and precisely represents what will actually be executed, but it's still (just about) human-readable and can still be manipulated with standard text-oriented programs. As we will see, this is a Good Thing.
  • The assembler, which takes .s files and translates them, one instruction at a time, into the binary code that the computer's processor actually understands. The results are put in object files, with the suffix .o.
  • The linker, which takes one or more .o files and combines them together into a single executable (which can be called anything, but which is called a.out by default), ensuring that cross-references match up properly and that the overall layout is what the operating system expects. The first linker was forged by Satan himself in the eternal fires of Hell, and subsequent iterations have only increased the infernality of the basic design; we shall discuss linkers no further.

Invoking these stages in the right order and with the right settings is taken care of by a driver program (in our case called ecc). So, if your project consists of two C code files called fred.c and wilma.c, then the command ecc fred.c wilma.c will first invoke the preprocessor to produce fred.i and wilma.i, then acc to produce fred.s and wilma.s, then the assembler to produce fred.o and wilma.o, then finally the linker to produce a.out.

My next move was obvious: compile the test program to assembly language using the old (working) and new (broken) versions of the compiler, and look at the differences between the two .s files - the textual nature of assembly language meant that I could use a standard tool called diff for comparing two similar text files. I duly did this, and found only one difference: in a few places, the old version was referring to one of the processor's registers as %r3 and the new version was referring to it as just r3. [A register is a unit of scratch memory in the processor, for holding values to be calculated with. Registers can be accessed much faster than main memory, but typically there's only a handful of them. One of the tricky parts of compiler construction is making efficient use of the processor's registers]. I didn't know what difference that made, but theorised that the % sign was doing some dereferencing: maybe a plain r3 was interpreted as a pointer to some other register, or something.

Shameful confession: I don't actually speak EnCore assembly language very well. Yet.

Anyway, I found the section of code where this change had been made, asked our version control software who was responsible for it, and discovered that it had been written by an undergraduate as part of his summer project. Case closed, I thought. I fired off a snotty email berating my colleague Igor for signing off on obviously untested code, reverted that particular change, kicked off the full test suite, and went home.

I arrived the next morning to discover an email from Igor (who does speak EnCore assembly language properly) saying that the change I'd found really shouldn't have affected anything, but that I could revert it if I liked. Then I checked the test logs. Not a single test case had changed its result - even the one I thought I'd fixed.

WTF?

Somewhere in the midst of the panicked flailing that ensued, I compiled the "broken" .s file to a binary and ran it, and found that actually, it ran just fine.

WTF?

After seriously doubting my sanity for a good long while, I eventually worked it out: the new version of the compiler produced correct output if and only if you compiled the test program's .c file manually to assembly language and then compiled the assembly language to a binary. If you tried to compile the .c file to a binary in one go it broke.

WTF?

Totally stumped, I took my problem to the monthly meeting of Edinburgh.pm, that drinking man's Valhalla where central Scotland's Perl geeks take sup and speak of mighty deeds. "So let me get this straight," said Aaron Crane, after I'd explained the situation to him, "you're compiling to assembly with ecc -S, and then compiling to binary with ecc -c?"

At this point a lightbulb went on in my head, because I hadn't been compiling to assembly with ecc -S: I'd been invoking acc directly. Perhaps the options ecc was passing to acc were not exactly the same as the ones I was giving it?

The next morning, more than a little hungover, I looked into this theory, and found it to be correct: ecc -S was passing acc the -O1 option (turn optimisation on, at level 1). Some further investigation revealed that by default, acc was compiling with optimisation set to level 2.

That is, the bug only manifested itself at low levels of optimisation.

WTF?

The first rule of compiler bugs is: it's not a compiler bug. No matter how sure you are that your code is correct and that the error must be the fault of those dumbasses who wrote your compiler, it almost certainly is your fault. Even if the bug disappears when you change the optimisation level, it's still probably a bug in your code and not a bug in the optimiser. But if you are one of the dumbasses who's writing the compiler, the equation changes a bit: now, you're allowed to say that bugs which only manifest at high levels of optimisation are probably signs of bugs in the optimiser - particularly if, as in this case, the test code has been inspected by literally millions of programmers over a period of more than thirty years. But bugs that only manifest at low levels of optimisation?

WTF?

Aaron suggested the solution again: the code to produce some instruction must be buggy, and turning up the optimisation level must enable some optimisation which eliminates the use of that instruction. So, all I needed to do was look for instructions which were used at -O1 but not at -O2, and check the code which generated them for bugs.

I did the check. There were no instructions produced at -O1 which were not also produced at -O2. I tried moving up a level, to the "intermediate representation" from which the assembly language is generated. Same story.

WTF?

I compiled the test program to assembly language at -O1 and -O2, and compared the outputs with diff (more precisely, with meld, which is a pretty graphical interface to diff). After staring at it for a while, I spotted a pattern: the broken version contained instructions like
mpy %r1,%r2,%r9
(multiply the contents of register 2 by the contents of register 9, and put the result into register 1), where the correct version contained instructions like
mpy %r1,%r2,10
(multiply the contents of register 2 by 10, and put the result into register 1). I checked the original test code, and multiplication by 10 was indeed the desired effect; then I went looking through the code generator to find all the places that multiplication instructions were generated. Sure enough, I found
RULE [mpy_int_imm] o:mirMult(rs1:reg, p:mirIntConst) -> rd:reg;
TEMPLATE ALUTMPL;
COST 10;
EXPAND { 
  gcg_create_mpyint(gcg_expand, rs1,
                    UnivInt_to_int(p.Value), rd);
};
END
when what I should have found was of course
RULE [mpy_int_imm] o:mirMult(rs1:reg, p:mirIntConst) -> rd:reg;
TEMPLATE ALUTMPL;
COST 10;
EXPAND { 
  gcg_create_mpyintimm(gcg_expand, rs1,
                    UnivInt_to_int(p.Value), rd);
};
END
Where I should have been generating "multiply register by immediate value" instructions, I was generating "multiply register by register" instructions. Since the code generator represented registers internally as simple integers, the mistake didn't get picked up when I built the compiler. This explained why I wasn't finding any excess instruction types in the broken assembly: instead of generating extra instructions which were optimised away, I was mistakenly generating a type of instruction that was correctly used elsewhere. And it didn't appear in the optimised version because the program was transformed before code-generation to another form that didn't need to do those multiplications.

The error in the code generator had been inserted by me, of course: the undergraduate who'd written the summer project was entirely innocent.

[The next mystifying bug was his fault, though.]

Comments:
From: andrewducker
2010-09-15 12:12 pm (UTC)
Thank you. That was fascinating, and reminded me why I like working several steps up the toolchain :->
From: pozorvlak
2010-09-16 08:50 am (UTC)
Thanks!

Compilers can be a bugger, and require you to know a lot of low-level stuff, but they have some important advantages for the debugger; in particular, they transform text files into other text files, and (unless something very weird's going on, as above) their overall behaviour's usually nicely stateless. I'll take compilers over multithreaded web apps...
From: andrewducker
2010-09-16 09:45 am (UTC)
Oh, absolutely. I love playing with large, complex systems, and putting stuff like threads, generics, classes, etc. together to see how I can build something aesthetic and functional out of them. But sometimes the complexity can get a bit much; trying to track down odd interactions between different parts of large systems can drive you crazy :->

Edited at 2010-09-16 09:45 am (UTC)
(Deleted comment)
From: pozorvlak
2010-09-15 01:16 pm (UTC)
I've never programmed in raw machine code, but I've written code in a few flavours of assembly language. It's fun for a while, but I wouldn't like to write anything substantial in it :-)

Also, while assemblers are more complicated than you'd expect, they're quite a bit simpler than real compilers. In particular, assemblers can mostly work one instruction at a time. On some level, once you've got to assembly language the interesting part of the work is done - I may blog about this later.

The basic idea of how compilers work is simple enough that you can write a basic-but-complete compiler in a couple of thousand lines of code, but an industrial-strength one is a hefty piece of software.
From: aaroncrane
2010-09-15 12:37 pm (UTC)

What next?

So, having found the root cause, your next question ought to be “how do I make sure I don’t ever again need to spend this much time tracking down an error like this?”. A couple of humble suggestions:

  • Add more smoke tests with different optimisation settings. If you’d got smoke failures at -O0 and -O1 pointing specifically to the change that introduced the bug, spotting it would presumably have been easier.
  • Refactor the code generator to use a different representation for registers. (Though I’m aware this isn’t necessarily a change you have enough time to make.)

I almost managed to write you an email about the latter idea when you first mentioned it, but I’m afraid I was slightly too slack. The obvious trick is to wrap a trivial struct round the register number (register itself is a C keyword, so the struct needs another tag):

struct reg {
    int num;
};

An object of type struct reg is exactly the same size as the int it contains, and can happily be passed and returned by value without worrying about memory allocation. But it’s type-safe: a struct reg isn’t implicitly convertible to or from an int without additional code. Moreover, reasonable compilers will generate the same object code for source using such a structure as for the unadorned integers, so there’s no performance trade-off. (I’ve just tried it; at least for simple code, GCC 4.2 at -O generates .s files that are byte-for-byte identical.)

If you’re interested in trying that out, a reasonable first step is to find the functions in your compiler’s internal API that take a register number, and change them to take a struct reg instead. (With luck, you may be able to script the bulk of that change, rather than doing each affected function manually.) Now run make -k and see how much error output you get. Probably quite a lot, but most of the errors will be of the form “you need to add .num here”. At the very least, a couple of hours’ exploratory work of that sort should give you a clearer idea of how much effort doing the whole thing would take.

From: pozorvlak
2010-09-15 04:13 pm (UTC)

Re: What next?

My very first question was actually "Have I made this mistake anywhere else?" So I made a systematic search, and found that indeed I had. My next thought was "how do I prevent this kind of error from occurring again", and there I got a bit stuck. I thought up the trivial struct idea, but it's a bit tricky to implement: the code generator isn't written directly in C, but rather generated from a DSL (see the code samples above) by a framework which I can't modify. An alternative, which should be possible, would be to move all the UnivInt_to_int(p.Value) stuff into gcg_create_mpyintimm, which would then take something other than a raw int for its final argument, making its signature different from that of gcg_create_mpyint. It seems pretty strange for the only argument that is an integer to be represented by something else, though...
From: pozorvlak
2010-09-15 04:14 pm (UTC)

Re: What next?

The idea of testing at different optimisation settings is an excellent one, though, and ought to be easy to implement. Thanks!

[Or rather, why the hell wasn't I doing that already? Stupid boy...]
From: fanf
2010-09-15 03:11 pm (UTC)
I note that your compiler is structured like most ancient C compilers too :-)

(The original C compiler was actually split into preprocessor, front end, back end, assembler, linker, driver.)

Some C compilers are more integrated, e.g. gcc (or rather, its version of cc1) has the preprocessor built in nowadays. Some compilers write object code directly instead of using a separate assembler.
From: pozorvlak
2010-09-15 03:36 pm (UTC)
Well, "ancient" and "modern" are somewhat subjective terms - IIRC, the Modern History course at Oxford takes the fall of the Western Roman Empire as its starting point :-) Interesting about the original C compiler, though - thanks! What format did the front and back end use to communicate?
From: fanf
2010-09-15 03:49 pm (UTC)
Have a look at http://cm.bell-labs.com/cm/cs/who/dmr/primevalC.html

It says: "The files named c0?.c are the first passes, which parse source and writes syntax trees intermingled with some text on an intermediate file. The c1?.c files are the code generators, which read the trees and generate code."

I haven't looked at the code to find out what format the syntax trees are written in.
From: pozorvlak
2010-09-16 08:51 am (UTC)
Thanks!
From: necaris
2010-09-16 08:38 am (UTC)
Thank you for the post -- fascinating!
From: Chas. Owens
2014-11-04 08:53 am (UTC)

interesting bug, but why didn't you run a bisect?

Since you had a known good version and a known bad version, you could have asked the source control system for a bisect. This would have given you the first commit that failed to compile the code properly. Examination of that commit would have shown you the change you eventually found. Even if your source control system doesn't have a bisect command, it should be possible to whip up one fairly easily (even a linear search overnight might have been faster than trying to reason about the problem).
From: pozorvlak
2014-11-04 11:27 am (UTC)

Re: interesting bug, but why didn't you run a bisect?

I'm pretty sure I tried that, but couldn't get it to work - though frustratingly, I now can't remember why exactly. My guess is that the (super-complex) build system had some subtle state-dependency that meant incremental builds were not reliable enough, and a clean build (which involved some sort of hideous messing-about with environments and chroot jails and other such things - this was in the days before Vagrant) either took too long or required manual intervention or both. In retrospect, I should probably have spent much more time trying to get bisection to work: we live and learn :-(