You are viewing pozorvlak

Beware of the Train - Template Haskell: baby steps [entries|archive|friends|userinfo]
pozorvlak

[ website | My Website ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Links
[Links:| My moblog Hypothetical, the place to be My (fairly feeble) website ]

Template Haskell: baby steps [Mar. 4th, 2008|11:43 am]
Previous Entry Share Next Entry
[Tags|, , , , , , , ]

My post about design patterns and if-statements in Smalltalk yesterday got a very pleasing reaction, garnering some informative comments from knowledgeable people and making the front page of programming.reddit. I guess I should have checked my facts on Smalltalk semantics more carefully before clicking "Post" :-) One comment that particularly interested me was this one (sadly anonymous), which shows a rather cute trick for using Smalltalk-style keyword methods in Haskell:
data Then = Then
data Else = Else

if' :: Bool -> Then -> a -> Else -> a -> a
if' True Then t Else f = t
if' False Then t Else f = f
Now that's pretty cool, and could even be useful for DSL implementation1. But there's a problem. See the repeated code?
data Then = Then
data Else = Else
Here, let me show it to you the way it looks to me:
data Then = Then
data Else = Else
Literally half the elements that aren't pure syntax on those lines are repeated. What I'd like to do is define a keyword function, and then just write
keyword Then
keyword Else
Or, better,
keywords Then Else
Not a big deal, you might think, but what if you were writing a whole EDSL with many keywords? Or many different EDSLs? And it's not just the repetition: by defining a keywords function, I'd move the code up an abstraction level, making my intent clearer. We could of course write
data Keywords = Then | Else
but then we lose the benefit of compile-time syntax checking of our EDSL. Well, I guess our pseudo-keywords are syntactic sugar anyway, so it doesn't really matter. But still.

A few days ago, someone asked me for an example of code that's repetitive in Haskell but wouldn't be in Perl, and here's one. In Perl, and probably in Python, I could write a keywords function. It would be ugly, and would involve hairy symbol-table hacking or calls to eval, but the user wouldn't need to know that and I'd only need to write it once (never mind the fact that the "pseudo-keyworded functions" pattern wouldn't actually make any sense in Perl...). In Lisp or Ruby, I wouldn't need to write it at all, because I could use symbols instead. But perhaps this kind of thing (and other, more egregious examples) are just the price we pay for Haskell's type system, and are more than balanced out by greater opportunities for concision and abstraction elsewhere?

Nah, bollocks to that. We can do better.

Template Haskell to the rescue!


I was a bit nervous about trying Template Haskell again: the last time I tried it, I found it utterly impenetrable and poorly-documented, and couldn't get even the simplest things to work. Fortunately, my Haskell-fu seems to have improved in the mean time, because it made a bit more sense this time round - of course gensym-expansion is implemented as a monad :-)

If you're not already familiar with Lisp macros, you're probably doomed in any attempt to understand TH. Go and read On Lisp and come back when you've done so. Actually, go and read On Lisp even if you are familiar with Lisp macros, it's an awesome book. There are, of course, important differences between Lisp macros and Template Haskell:
  • Though it comes with GHC, TH is not loaded as standard - you'll have to call GHC with the -fth flag. And -fglasgow-exts, but you should probably be using that anyway. You can't use TH with Hugs.
  • Unless I've horribly misunderstood something, it shouldn't be possible to write valid macrotic code that doesn't expand to valid Haskell. Guarantee-oriented programming, baby. It is, however, possible to write valid macrotic code which never terminates, resulting in non-terminating compilations.
  • There's no defmacro. Macro calls don't look like function calls in Haskell. To expand a macro call, use the splice notation:
    $( [macrotic code here] )
    Different things should look different, I guess. Nice nod to Unix/Perl variable interpolation there, by the way.
  • There's an extensive (not to say baroque) set of types and constructors for representing Haskell syntax trees - this is rather less simple than representing Lisp code as s-expressions, unfortunately!
  • There's a feature called "quasiquotation", which is written with "Oxford brackets": expressions are surrounded with [| and |], declarations are surrounded with [d| and |], and types are surrounded with [t| and |]. Quasiquotation seems to work a bit like backquote in Lisp. You write ordinary Haskell code in the brackets (possibly with splices), and it returns an AST of that code. Here's an example from a GHC session:
    Prelude> :m Language.Haskell.TH
    Prelude Language.Haskell.TH> runQ [d| data Fred a = Fred a a deriving Show |]
    Loading package template-haskell ... linking ... done.
    [DataD [] Fred [a_0] [NormalC Fred [(NotStrict,VarT a_0),(NotStrict,VarT a_0)]]
       [GHC.Show.Show]]
    Prelude Language.Haskell.TH> runQ [| \x -> x +  $( [|3|] ) |]
    LamE [VarP x_1] (InfixE (Just (VarE x_1)) (VarE GHC.Num.+)
       (Just (LitE (IntegerL 3))))
  • Quasiquotation is nice, but it won't do everything you need: sometimes you need to construct ASTs by hand and splice them in.
  • Quasiquoted expressions are protected from variable capture, but hand-constructed ones aren't - consequently, there's a facility to create gensyms (uniquely-named variables that are guaranteed not to clash with the names of variables in your calling code). This being Haskell, gensym generation is wrapped in a monad - specifically, the Q monad. I was going to warn you that Q was a monad but not a functor (grrr, gnash), but that seems to be fixed now. Hurrah! Q's also an instance of another typeclass, Quasi, which AFAICT abstracts the idea of "monad in which gensymming is possible". The function runQ, used above, "performs" an expression in Q (ie, a possibly-gensym-containing AST) in some other Quasi monad (here, the IO monad).
  • There are complicated restrictions on where you can call TH code from: in particular, you can't run a function at compile time (ie, in a splice) if it's defined in the same module.
  • There's a feature called "reification" (which the classically-educated will recognise as the Latin for "thingification"), which provides a reflection mechanism. The reify function takes a Name and returns a data structure (in the Q monad, natch) giving you information about what that Name refers to. This is a bit harder to explore, because you can't reify things in the IO monad (as I understand things) and so you can't play around with reification in GHCi.
The documentation's better than it was, but it's still pretty skeletal. There are docs for Language.Haskell.TH and its submodules in the online GHC library docs, but that's little more than a list of type signatures. There's a bit more in the GHC manual, and a wiki page with some handy tips (including the runQ trick used above). There are some papers on the topic, including a couple by Ian Lynagh which include some useful examples. The original paper on Template Haskell is available, which explains the rationale behind some of the design decisions made, but it suffers as documentation from the fact that papers are static and libraries are dynamic, and the interface seems to have changed substantially since it was written! There's another document written a year later, and containing some clarifications and changes, but that suffers from the dual problem that not all of it has been implemented yet.Edit: there are also a couple of tutorials, here and here. The first one's quite good, but I haven't read the second one in detail yet.

For now, the best way of finding your way around the interface seems to be looking at the type signatures and the runQ trick, with a heavy dose of guesswork. If I've missed any docs, please let me know!

OK, on with the motley. By the way, experts will probably find the next section a bit slow going: I've skipped a lot of dead ends and mistakes that I made, but I've left in useful steps which will probably seem hopelessly basic to you, in the hopes that other beginners like me might find them useful. If an explanation of something is in there, it's because I found it hard or confusing, so probably someone else will too.

Recall that we were trying to programmatically produce declarations of the form data Fred = Fred. Let's try it with quasiquoting. Because of the restrictions on calling TH code, we'll have to put it in its own module, so let's put the following in Keyword.hs so the compiler can find it:
module Keyword (keyword) where

import Language.Haskell.TH.Syntax
keyword name = [d| data $(name) = $(name) |]
Now compile:
Prelude> :l Keyword.hs
[1 of 1] Compiling Keyword          ( Keyword.hs, interpreted )

Keyword.hs:6:24: parse error on input `$('
Nope. Looks like we'll have to construct it by hand, then. We want to construct a declaration, which is presumably something of type Dec, and the relevant constructor looks to be DataD, whose type signature is DataD Cxt Name [Name] [Con] [Name]. Um. The hell? Cxt is a synonym for [Type], and that first Name is presumably the name of the type we want to construct. Time for the runQ trick. From the first example above, it looks like the first [Name] parameter expects a list of the parameters to the type: in this case, that should be empty. Con is apparently the type representing constructors, and comes in four flavours: NormalC, RecC, InfixC and ForallC. NormalC, I reckon, which takes a Name and a list of StrictTypes: the name of the constructor and whether or not it's strict in each of its arguments? That fits with our results from the runQ test, anyway. For this hack, we won't have any arguments to our constructors, so it doesn't really matter. The final argument, a list of Names, seems to be the list of typeclasses derived in the declaration: we're undoubtedly going to need Show for debugging, and we might as well have Eq to save ourselves a bit of hassle. Now we just need a way of constructing Names: I'd come up with Name (mkOccName name) NameS before I noticed mkName :: String -> Name. So our resulting code is:
{-# OPTIONS_GHC -fglasgow-exts -fth #-}

module Keyword (keyword) where

import Language.Haskell.TH.Syntax
-- keyword name = [d| data $(name) = $(name) |]

-- DataD Cxt Name [Name] [Con] [Name]
-- data Con = NormalC Name [StrictType] | ...
keyword name = DataD [] name' [] [NormalC name' []] (map mkName ["Show", "Eq"])
        where name' = mkName name
Slap the following into Main.hs:
{-# OPTIONS_GHC -fglasgow-exts -fth #-}
import Keyword

$(keyword "Fred")

main = print Fred
And...
Main.hs:4:2:
    Couldn't match expected type `Language.Haskell.TH.Syntax.Q 
[Language.Haskell.TH.Syntax.Dec]'
           against inferred type `Language.Haskell.TH.Syntax.Dec'
Failed, modules loaded: Keyword.
Arse. But at least Keyword.hs loaded OK. Let's try out the types of things with GHCi:
*Keyword> :t keyword
keyword :: String -> Dec
So keyword takes a String and returns a Dec, as expected, but it seems that $( ) actually wants a Q [Dec]. Apparently, [| ... |] returns an Expr (*tests* - no it doesn't, it returns a (Quasi m) => m Exp - there's no type called Expr in the TH libraries that I can see), but [d| ... |] returns a (Quasi m) => m [Dec], ie a list of declarations, with the whole list lifted into a Quasi monad. Weird. Still, easily fixed: we just surround the call to keyword with square brackets to make it a list, and call return on it to lift it into the monad. Main.hs now becomes
{-# OPTIONS_GHC -fglasgow-exts -fth #-}
import Keyword

$(return [keyword "Fred"])

main = print Fred
And now...
*Keyword> :l Main.hs 
[1 of 2] Compiling Keyword          ( Keyword.hs, interpreted )
[2 of 2] Compiling Main             ( Main.hs, interpreted )
Ok, modules loaded: Keyword, Main.
*Main> main
Fred
Success! But that $(return [keyword "Fred"]) is a bit ugly. Since we're forced to define a list of declarations, we might as well forget about keyword, and call keywords directly. In Keyword.hs:
keywords names = return $ map keyword names
and in Main.hs:
$(keywords $ words "Fred Barney Betty"])
Great - we've now defined three keywords (Fred, Barney and Betty) using only a single line of code in the calling module, with no repetition of data. But there's one more niggle: those capitalised keywords are a bit retro. How about we define lowercase functions fred, barney and betty which return Fred, Barney and Betty, then use the lowercase versions as keywords when writing in our EDSL? (Um, obviously some kind of DSL for describing familial relations in the Paleolithic town of Bedrock. Bear with me). We'll need to lowercase the first letter: there's undoubtedly a standard function for this, but my Hoogle-fu is too weak to find it, so we'll have to write our own. Easy enough.
import Char -- put this at the top of the file

lowerFirst "" = ""
lowerFirst (x:xs) = (toLower x):xs
It looks like there's no way of combining expressions and declarations into a single splice: fortunately, we want to define functions, and function definition is a declaration. So we can write a function to define our lowercase constructor functions, and extend keywords to call it.
lcCons name = ValD (VarP lcName) (NormalB (ConE (mkName name))) []
        where lcName = mkName $ lowerFirst name

keywords names = return (map keyword names ++ map lcCons names)
And in Main.hs:
$(keywords $ words "Fred Barney Betty")

main = do
        print Fred
        print betty
Let's try compiling it.
*Keyword> :l Main.hs
[1 of 2] Compiling Keyword          ( Keyword.hs, interpreted )
[2 of 2] Compiling Main             ( Main.hs, interpreted )
Ok, modules loaded: Keyword, Main.
*Main> main
Fred
Betty
Epic win. Note that these lowercase pseudo-keywords could conflict with the existing keywords of Haskell: perhaps it would be better to have two keywords functions, one which generated the lowercase versions and one which didn't.

The final version of Main.hs is here, and the final version of Keyword.hs is here. Not entirely straightforward, but we pretty much got there in the end. Would it be worth posting these code samples to the TH wiki, and perhaps linking here? Anyway, it's nice to have finally got this simple stuff to work: hopefully I'll be able to get some more difficult things to work soon, and this should reduce the amount of frustration and needlessly duplicated code in my Haskell programming experience.

1 Did you know you can do this in (plain) TeX, too?
\def\fred #1 barney #2 betty{#1#2}

\fred Hello there, barney Wilma betty !
produces "Hello there, Wilma!".
linkReply

Comments:
From: (Anonymous)
2008-03-04 05:36 pm (UTC)

Pragmas

(Link)

'{-# OPTIONS_GHC -fglasgow-exts -fth #-}' <-- boo! Use {-# LANGUAGE TemplateHaskell #-}!
From: (Anonymous)
2008-03-04 06:24 pm (UTC)

key$words

(Link)

If you change the name of the function keywords to just key you can write $(key$words "Fred Barney Betty"). Looks almost like just the word keywords :-)
[User Picture]From: pozorvlak
2008-03-04 06:42 pm (UTC)

Re: key$words

(Link)

"keyswords" :-)
From: nobugs.org
2008-03-05 09:16 pm (UTC)

Great article

(Link)

I've done some lisp and a reasonable amount of haskell before, but I've never played with template haskell yet. This was a really great article! I like the way you show how you uncovered how to make things work - much better than a clinical/perfect tutorial.
[User Picture]From: pozorvlak
2008-03-05 10:50 pm (UTC)

Re: Great article

(Link)

Glad you enjoyed it! I've found that one of the biggest hurdles in learning Haskell is adjusting my development process to fit (Haskell's intentionally a lot less forgiving than Perl, my previous main language, and it punishes you mercilessly for playing fast and loose), so I wanted to show as much of my thought processes as possible. Partly in the hope that some other newcomer could learn something (I wish someone had told me about using "let" to declare functions in GHCi two years ago), and partly in the hope that some expert would come along and tell me a better way to do it :-)
[User Picture]From: ryani
2008-03-14 10:10 pm (UTC)

(Link)

Somehow I missed this article before. Pretty cool.

"reify" seems to be a common word in the Haskell literature; your translation from latin as "thingify" is prety close. The best definition I've found is "to make real"; you take something abstract (a name, in this case) and turn it into something real (a structure you can traverse which tells you everything about that name).
From: (Anonymous)
2011-05-03 07:07 pm (UTC)

(Link)

What about phantom types?