?

Log in

No account? Create an account
Code-walking in Arc - Beware of the Train [entries|archive|friends|userinfo]
pozorvlak

[ website | My Website ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Links
[Links:| My moblog Hypothetical, the place to be My (fairly feeble) website ]

Code-walking in Arc [Feb. 5th, 2008|02:35 am]
pozorvlak
[Tags|, , , , , , , , , ]

totherme kindly drew my attention to this blog post today. The author, Slava Akhmechet, tries to explain away that horrible feeling of unproductivity and frustration that I know so well from my attempts to use Haskell: his claim is that it's just that my expectations are miscalibrated from using imperative languages. A Haskell solution to a given problem, he claims, will take the same amount of thought as a solution in another language, but much less typing: by simple statistics, therefore, you're going to spend a lot of time staring at the screen not doing anything visible, and this can feel very unproductive if you're not used to it.

As an example, he gives the code
extractWidgets :: (Data a) => a -> [String]
extractWidgets = nub . map (\(HsIdent a)->a) . listify isWidget
    where isWidget (HsIdent actionName)
              | "Widget" `isSuffixOf` actionName = True
          isWidget _ = False
Bleh! :-( This function takes a parse tree representing some Haskell code, and extracts all the identifiers ending in "Widget", then removes the duplicates. Slava challenges any Java programmers reading to do the same in five lines or fewer.

It was apparently a matter of mere hours before Jonathan Tang responded with the following (more discussion here):
public Set extractWidgets(SimpleNode node) { return extractWidgets(node, new HashSet()); }
private Set extractWidgets(SimpleNode node, Set acc) {
  for(int i = 0; i = node.jjtGetNumChildren(); ++i) extractWidgets((SimpleNode) node.jjtGetChild(i), acc);
  if(node.getFirstToken().kind == JavaParserConstants.IDENTIFIER && node.getFirstToken().image.endsWith("Widget")) acc.add(node);
  return acc;
}
Even less pretty than the Haskell code, three of his lines went over 80 characters, and there's a sixth line with only a closing brace which was rightfully ignored in the count, but still, mad props to the guy. Incidentally, that final line hints at why the challenge was actually biased quite heavily in favour of Haskell: it counts lines rather than any other measure of length, and Haskell typically scores quite well on that. What would be a chain of statements, each on its own line, in a conventional language is often expressed as a chain of composed functions in Haskell. The complexity's the same, but the line count's lower. Low line count is a good thing for maintainability, of course, as it lets the programmer see more code at once, but it's not a very honest measure of complexity.

Anyway, this is all rather moot, because some time later someone from no stinking loops came along and totally pwnz0red Slava's challenge:
Let x be a parse-tree -- a nested list of symbols. Then in K2:
a:?,//x
a@&a _sm\:"*Widget"
In Q:
a:distinct raze over x
a where a like "*Widget"
Q, as far as I can tell, is an English-like skin over K, which is another member of the APL/J family. Array-based languages for the win!

But this got me thinking about how one would write this program in Arc. This kind of code manipulation ought to be exactly the kind of thing that Lisp, and Arc especially, is good at. None of this "parse tree representing the code" nonsense: the parse tree is the code :-)

A first, listy solution might be on the lines of
(def widgets (tree) (distinct (filter ends-in-widget:is-symbol (leaves tree))))
Unfortunately, a search through arc.arc revealed that almost none of the library functions I'd like to use existed yet. But there is a handy-looking fold-over-trees function called trav: let's warm up by using it to write leaves. After a bit of messing around, I came up with
(def leaves (tree) (trav (fn (x y) (if (atom x) (cons x y) (join x y))) idfn tree))
which is a bit messy but seems to do the trick. Now, one could define is-symbol as
(def is-symbol (s) (is (type s) 'sym))
but the Arc idiom seems to just be (isa x 'sym). There's another wrinkle, in that nil, aka the empty list, is a symbol, which isn't very helpful for us. The lack of distinct is a bit more irritating, but fortunately Arc has built-in hash tables, so I can use the Perl idiom of constructing a hash and then taking its keys. There was a slight upset the first time I tried to do this: in Perl, one would write $hash{$val}++, relying on $hash{$val} being autovivified to 0; the Arc equivalent would be (++ (hash val)), but Arc hash members are actually autovivified to nil, and ++ doesn't like being handed nil as an argument.

Anyway, let's put this together and write some code to get all the symbols from a given s-expression:
(def symbols (tree)
    (= idents (table))
    (trav (fn (x y) nil) [if (and _ (isa _ 'sym)) (= (idents _) 1)] tree) 
    (keys idents) )
We throw away the result of trav at every opportunity to avoid unnecessary consing, and construct the hash as a side-effect, relying on idents being in the scope of the closure we pass to trav. Note Arc's rather lovely syntactic sugar for one-argument closures there: anything in square brackets is a one-argument function with the argument _ (which I like to pronounce as "it").

Though it took me a while to find it, Arc has a version of filter, called keep. It also has a macro endmatch, for matching substrings at the end of a string. Defining widgets is now simple:
(def widgets (tree) (keep [endmatch "Widget" (coerce _ 'string)] (symbols tree)))
That's five lines, and I think it's at least as clear as the Haskell version we started with. We could combine the ends-in-Widget test with the main body of the code to get one line shorter:
(def widgets (tree)
    (= idents (table))
    (trav (fn (x y) nil) [if (endmatch "Widget" (coerce _ 'string)) (= (idents _) 1)] tree)
    (keys idents))
The middle line goes over 80 characters, though.

It may interest you to know that most of my development time for that was spent typing: paging through the library, trying things out at the REPL, tweaking, debugging, iterating. It's a style I find much easier than staring at a blank screen and thinking very hard, and one for which I find forgiving languages like Perl and Lisp are much better than Haskell.

But here's the real conundrum: why muck about with parse trees at all for something so simple? Why not just use grep? A fruitless few minutes mucking about with grep's outdated regex syntax gave me the answer to that, but once I'd made the decision to write it in Perl, this solution came almost immediately:
perl -ne 's/--.*//; print "$_\n" foreach /\b\w*widget\b/gi' | sort | uniq
And after writing that, I realise my view of the original problem has changed. It's no longer a case of "how much this tiny paragraph of code does", it's a case of "why the hell did he take five whole lines to write a really simple shell pipeline?" Note also that this final solution is damn near language-agnostic: the only thing that needs changing is the comment-stripper at the beginning.

On language concision in general: I typically find that Haskell requires fewer lines for a given program, but Perl requires fewer characters, and they both use about the same number of tokens. Lisp is longer and messier than Haskell for easy problems, but quickly gains ground as the problems get harder.1 The APL family own on all three axes. This is extremely unscientific, of course, and because I don't know much APL/J I can't be sure how it extends to harder problems. I did once email Paul Graham asking why, if succinctness was power, he didn't switch to a member of the APL family; he has yet to reply, but I don't attach any great significance to this.

And having got all that stuff out of my head, I'm going back to bed. Hopefully I'll be able to sleep this time :-)

1Fun exercise: translate the example code in On Lisp into Haskell (or Perl, Ruby, etc.). It really helps you to get to grips with the material in the book. I found that the Haskell solutions were shorter and cleaner than the Lisp solutions up until about a third of the way through the book, at which point Lisp started to catch up with a vengeance: shortly thereafter, he was doing things in a few lines of Lisp that simply couldn't be done in finitely many lines of Haskell. I'd be very interested to see solutions written by someone who knew what they were doing, though!
linkReply

Comments:
[User Picture]From: aftnn.org
2008-02-05 03:16 pm (UTC)

Grep is not a general purpose language

Grep is a searching DSL, so solutions to searching problems are always going to be more concise than those written in general purpose programming languages!
(Reply) (Thread)
[User Picture]From: pozorvlak
2008-02-05 04:19 pm (UTC)

Re: Grep is not a general purpose language

Oh, sure, but I think it usefully undermines Slava's implicit claim that it was a really hard problem, and Haskell is super-powerful for solving it in only five lines :-)
(Reply) (Parent) (Thread)
[User Picture]From: stronae
2008-02-05 04:25 pm (UTC)
I never went to APL because of all the funny symbols. If I had to remember how to make iota, left arrow, and so forth every time, well, you can imagine my productivity going out the window. Besides, there isn't a very good Linux version.

Up to this point, I was unaware that there were APL-like languages that didn't use Greek (or funny symbols). You say Q is one such thing?
(Reply) (Thread)
[User Picture]From: pozorvlak
2008-02-05 06:16 pm (UTC)
From what I can see, yes, it's APL with words. J and K are APL-in-ASCII: easier to type, but not significantly more memorable.
(Reply) (Parent) (Thread)
From: (Anonymous)
2008-02-12 04:53 pm (UTC)
There's also Q'Nial (http://www.nial.com/)
(Reply) (Parent) (Thread)
[User Picture]From: marapfhile
2008-02-13 04:30 pm (UTC)
can also be done in one line of k4/q, if at the expense of some clarity

(there's no filter primitive, so "x where x <cond>" is tricky to say with only one x)

k)(@).(::;&like[;"*Widget"]@)@\:?,//x
q)(@).(::;where like[;"*Widget"]@)@\:distinct raze over x
(Reply) (Thread)
From: (Anonymous)
2008-02-15 09:42 pm (UTC)
>(@).(::;&like[;"*Widget"]@)@\:?,//x
why not just
{x@&x like"*Widget"}@?,//x
Attila
(Reply) (Parent) (Thread)
From: (Anonymous)
2008-06-27 12:32 pm (UTC)

but your perl is wrong!

(echo "-- some comment" ; echo a ThingyWidget ; echo 'a "trapWidget"' ) \
| perl -ne 's/--.*//; print "$_\n" foreach /\b\w*widget\b/gi' | sort | uniq
ThingyWidget
trapWidget

Scanning a parse tree, and scanning characters is not the same thing.

Perhaps the error was in the specification, perhaps you didn't mind the "identifiers" ending in "Widget" in the parse tree, but only the sequences of identifier characters ending with the characters W i d g e and t not followed by another identifier character? If that was the case, you would have said so, and the grep or perl solution would be ok.

Or perhaps you're just lucky, and for your given input data it makes no difference. But then you cannot compare _programming_ languages, but _scripting_ abilities of the programmer.

Perhaps you are not counting the size of the wetware needed to drive the incorrect perl solution. I'm sure if you had had some strings containing something looking like the wanted identifiers, you would have noticied it (how much neurons are needed to do this pattern matching?) and you would have taken corrective actions (how much more neurons would have been needed, to program another perl solution, or switch back to Haskel?). At the same time, the Haskell solution was perfectly good and standalone, without any further involvment of wetware.
(Reply) (Thread)
[User Picture]From: pozorvlak
2008-06-27 01:17 pm (UTC)

Re: but your perl is wrong!

OK, good point. Some comments:
  1. The Haskell solution was not standalone. It required a pre-existing Haskell-parsing library.
  2. The Haskell solution would be hard to adapt to other languages.
  3. It's unfashionable to say so, but I firmly believe that there's a place in programming for the "good enough" solution that can be written quickly, used once, then thrown away. My daily computing involves many, many automatable one-off tasks, and I'm very glad to have access to tools that allow me to work in this fashion when appropriate. There are, of course, situations where it's more important to be right than to be quick; there are situations in the middle, where judgement must be exercised. In this case, my envisaged use-case would be some sort of ad-hoc code analysis, and I'd be willing to take a chance on there not being any strings containing \w*Widget in my codebase (or at least that no new identifiers will arise that way).
(Reply) (Parent) (Thread)
From: (Anonymous)
2008-07-03 01:04 pm (UTC)

Re: but your perl is wrong!

Don't forget, Haskell has two comment syntaxes: -- for single line, and {- for continued comments. I'm not a Perl programmer, but eyeballing it leads me to think it only handles -- (presumably unlike the parse tree representation which would handle both).
(Reply) (Parent) (Thread)