|Srelipmoc ni esruoc tsrif a
||[Aug. 19th, 2013|09:10 pm]
If I ever design a first course in compilers, I'll do it backwards: we'll start with code generation, move on to static analysis, then parse a token stream and finally lex source text. As we go along, students will be required to implement a small compiler for a simple language, using the techniques they've just learned. The (excellent) Stanford/Coursera compilers course does something similar, but they proceed in the opposite direction, following the dataflow in the final compiler: first they cover lexing, then parsing, then syntax analysis, then codegen. The first Edinburgh compilers course follows roughly the same plan of lectures, and I expect many other universities' courses do too.
I think a backwards course would work better for two reasons:
What am I missing?
- Halfway through the Stanford course, you have a program that can convert source text into an intermediate representation with which you can't do very much. Halfway through the backwards course, you'd have a compiler for an unfriendly source language: you could write programs directly in whatever level of IR you'd got to (I'm assuming a sensible implementation language that doesn't make entering data literals too hard), and compile them using code you'd written to native code. I think that would be pretty motivating.
- When I did the Stanford course, all the really good learning experiences were in the back end. Writing a Yacc parser was a fiddly but largely straightforward experience; writing the code generator taught me loads about how your HLL code is actually executed and how runtime systems work. I also learned some less obvious things like the value of formal language specifications¹. Most CS students won't grow up to be compiler hackers, but they will find it invaluable to have a good idea of what production compilers do to their HLL code; it'll be much more useful than knowing about all the different types of parsing techniques, anyway². Students will drop out halfway through the course, and even those who make it all the way through will be tired and stressed by the end and will thus take in less than they did at the beginning: this argues for front-loading the important material.
¹ I learned this the hard way. I largely ignored the formal part of the spec when writing my code generator, relying instead on the informal description; then towards the end of the allocated period I took a closer look at it and realised that it provided simple answers to all the thorny issues I'd been struggling with.
² The answer to "how should I write this parser?" in an industrial context is usually either "with a parser generator" or "recursive descent". LALR parsers such as those produced by Yacc are a pain to debug if you don't understand the underlying theory, true, but that's IMHO an argument for using a parser generator based on some other algorithm, most of which are more forgiving.
This entry was originally posted at http://pozorvlak.dreamwidth.org/176246.html. Please comment wherever and however is most convenient for you!