A first course in compilers
[Aug. 19th, 2013|09:10 pm]
If I ever design a first course in compilers, I'll do it backwards: we'll start with code generation, move on to static analysis, then parse a token stream and finally lex source text. As we go along, students will be required to implement a small compiler for a simple language, using the techniques they've just learned. The (excellent) Stanford/Coursera compilers course does something similar, but they proceed in the opposite direction, following the dataflow in the final compiler: first they cover lexing, then parsing, then semantic analysis, then codegen. The first Edinburgh compilers course follows roughly the same plan of lectures, and I expect many other universities' courses do too.
I think a backwards course would work better for two reasons:
- Halfway through the Stanford course, you have a program that can convert source text into an intermediate representation with which you can't do very much. Halfway through the backwards course, you'd have a compiler for an unfriendly source language: you could write programs directly in whatever level of IR you'd got to (I'm assuming a sensible implementation language that doesn't make entering data literals too hard), and compile them to native code using the code you'd written. I think that would be pretty motivating.
- When I did the Stanford course, all the really good learning experiences were in the back end. Writing a Yacc parser was a fiddly but largely straightforward experience; writing the code generator taught me loads about how your HLL code is actually executed and how runtime systems work. I also learned some less obvious things, like the value of formal language specifications¹. Most CS students won't grow up to be compiler hackers, but they will find it invaluable to have a good idea of what production compilers do to their HLL code; it'll be much more useful than knowing about all the different types of parsing techniques, anyway². Students will drop out halfway through the course, and even those who make it all the way through will be tired and stressed by the end, and will thus take in less than they did at the beginning: this argues for front-loading the important material.

What am I missing?
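To make the halfway point concrete: "writing programs directly in the IR" could mean entering programs as data literals in the implementation language and feeding them to the code generator you've built. A minimal sketch in Python (the IR, instruction names, and register mapping are all invented for illustration, not taken from any particular course):

```python
# A tiny invented three-address IR, entered directly as data literals:
# ("const", dst, n) | ("add", dst, a, b) | ("ret", src)

def codegen(ir):
    """Naively translate the IR to x86-64-flavoured assembly text,
    mapping each IR register r0..r3 straight to a machine register.
    (Deliberately naive: no register allocation, no clobber checks.)"""
    regs = {"r0": "%rax", "r1": "%rcx", "r2": "%rdx", "r3": "%rbx"}
    lines = []
    for ins in ir:
        if ins[0] == "const":
            _, dst, n = ins
            lines.append(f"movq ${n}, {regs[dst]}")
        elif ins[0] == "add":
            _, dst, a, b = ins
            lines.append(f"movq {regs[a]}, {regs[dst]}")
            lines.append(f"addq {regs[b]}, {regs[dst]}")
        else:  # ("ret", src): result goes in %rax by convention
            lines.append(f"movq {regs[ins[1]]}, %rax")
            lines.append("ret")
    return "\n".join(lines)

# A program written directly in the IR: compute 2 + 3.
prog = [("const", "r1", 2), ("const", "r2", 3),
        ("add", "r0", "r1", "r2"), ("ret", "r0")]
```

It's an unfriendly source language, certainly, but it's a whole compiler: you can write in it and run the output.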
¹ I learned this the hard way. I largely ignored the formal part of the spec when writing my code generator, relying instead on the informal description; then towards the end of the allocated period I took a closer look at it and realised that it provided simple answers to all the thorny issues I'd been struggling with.
² The answer to "how should I write this parser?" in an industrial context is usually either "with a parser generator" or "recursive descent". LALR parsers such as those produced by Yacc are a pain to debug if you don't understand the underlying theory, true, but that's IMHO an argument for using a parser generator based on some other algorithm, most of which are more forgiving.
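The recursive-descent option really is as simple as advertised: one function per grammar rule, each consuming tokens and returning a tree node. A minimal sketch in Python for arithmetic expressions (the grammar and tuple-based tree representation are my own, just to show the shape):

```python
import re

# expr ::= term (('+' | '-') term)*
# term ::= NUMBER | '(' expr ')'
def parse(src):
    tokens = re.findall(r"\d+|[-+()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def next_token():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # One function per nonterminal: parse a term, then fold in
        # any following '+'/'-' operators left-associatively.
        node = term()
        while peek() in ("+", "-"):
            op = next_token()
            node = (op, node, term())
        return node

    def term():
        tok = next_token()
        if tok == "(":
            node = expr()
            assert next_token() == ")", "expected ')'"
            return node
        return ("num", int(tok))

    tree = expr()
    assert peek() is None, "trailing input"
    return tree
```

When a parse goes wrong here, the call stack *is* the grammar, which is why debugging it needs no theory beyond reading your own code.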
This entry was originally posted at http://pozorvlak.dreamwidth.org/176246.html. Please comment wherever and however is most convenient for you!
Hum, you are missing that we now have a good pseudo-LR(1) parser generator in Menhir for OCaml. Second, I think that using something other than Yacc/Lex (flex/bison) in C will make this stuff much, much easier to learn, rather than having to struggle with C's syntax. Admittedly, I am biased in the OCaml direction; I think doing this in an ML-derived language would make the concepts easier to grasp. At that point, it is all about tree transforms (aka the boring side of static analysis).
Yeah, I think the ML family really shine for that kind of stuff.
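For a sense of what "it is all about tree transforms" means in practice: a typical pass is a recursive rewrite over the AST. Here is a toy constant-folding pass over a tuple-encoded tree (the encoding is invented for this sketch; in an ML-family language this would be a pattern match over a variant type, which is exactly why that family shines here):

```python
def fold(tree):
    """Constant folding: rewrite ('+', a, b) or ('*', a, b) to a
    literal when both operands fold to literals; otherwise rebuild
    the node with its children folded."""
    if tree[0] in ("num", "var"):   # leaves are already folded
        return tree
    op, a, b = tree
    a, b = fold(a), fold(b)
    if a[0] == "num" and b[0] == "num":
        if op == "+":
            return ("num", a[1] + b[1])
        if op == "*":
            return ("num", a[1] * b[1])
    return (op, a, b)
```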
What if we implement a simple compiler on the first day? Something that has literals and calls, enough to say "Hello, world!".
That way we cover everything at once and then extend it.
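A day-one compiler along those lines can fit on a page. A sketch in Python, with every name invented for illustration: the "language" has only literals and calls, compilation flattens the tree into stack-machine instructions, and a tiny VM stands in for the real target:

```python
# Source "language": ("lit", value) | ("call", fname, [args...])

def compile_expr(node, code):
    """Flatten the tree into stack-machine instructions, post-order:
    arguments first, then the call that consumes them."""
    if node[0] == "lit":
        code.append(("push", node[1]))
    else:
        _, fname, args = node
        for arg in args:
            compile_expr(arg, code)
        code.append(("call", fname, len(args)))
    return code

def run(code, builtins):
    """A toy stack VM standing in for the real back end."""
    stack = []
    for ins in code:
        if ins[0] == "push":
            stack.append(ins[1])
        else:
            _, fname, n = ins
            args = [stack.pop() for _ in range(n)][::-1]
            stack.append(builtins[fname](*args))
    return stack.pop()

# Day one's "Hello, world!":
out = []
builtins = {"print": lambda s: out.append(s) or s}
prog = ("call", "print", [("lit", "Hello, world!")])
result = run(compile_expr(prog, []), builtins)
```

Every later lecture then replaces or extends one piece: a real parser in front, real codegen instead of the VM, more node types in between.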
The back-to-front style of writing a compiler is exactly what we did in the compilers course that I took from Kent Dybvig in 2009, during my first year of grad school. I loved it! Every compiler pass had a specified input and output language, so that at every step of the way, you had a working compiler -- first for a language that was a very thin layer on top of x86_64, and gradually for a more and more high-level language. We ended up compiling a substantial subset of Scheme to x86_64. The course also used the nanopass framework, which made it easy to write a compiler with lots of small easy-to-understand passes instead of a few monolithic ones.
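The "specified input and output language" idea can be shown in miniature: each pass is a small rewrite, and a checker for its output language can be run between passes. A toy sketch in Python (the real nanopass framework is for Scheme; these language definitions and pass names are invented):

```python
# L1: ("num", n) | ("neg", e) | ("+", e, e) | ("-", e, e)
# L2: the same, minus "neg" -- the pass remove_neg maps L1 to L2.

def remove_neg(t):
    """Pass from L1 to L2: rewrite ('neg', e) as 0 - e."""
    if t[0] == "num":
        return t
    if t[0] == "neg":
        return ("-", ("num", 0), remove_neg(t[1]))
    op, a, b = t
    return (op, remove_neg(a), remove_neg(b))

def in_l2(t):
    """Checker for the output language: no 'neg' nodes remain."""
    if t[0] == "neg":
        return False
    return t[0] == "num" or all(in_l2(c) for c in t[1:])
```

Chaining a dozen passes like this, each with its own checker, is what makes "a working compiler at every step" practical: a broken pass is caught at its own output boundary rather than three passes later.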
2013-12-27 11:31 pm (UTC)
I'm AIing (the Indiana equivalent of TAing) the compilers course at Indiana this upcoming semester and can thus state definitively that this is precisely how it's still done. Having taken the course last spring, I echo Lindsey's sentiment.