Log in

In praise of implementation-defined languages - Beware of the Train [entries|archive|friends|userinfo]

[ website | My Website ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

[Links:| My moblog Hypothetical, the place to be My (fairly feeble) website ]

In praise of implementation-defined languages [Oct. 1st, 2007|09:18 pm]
[Tags|, , , , ]

Implementation-defined languages come in for a lot of flak. Users of certain languages will point to their language's standards document as if it were self-evidently proof of greater inherent worth: a language that's defined by its implementation, it is implied, is always going to be a bit non-U. It may train itself out of its dropped H's and tendency to mispronounce the word "scone"1, but it will never have the true breeding that comes from a standards document.

Which is daft, because implementation-defined languages have some important advantages. To whit,

Portability: I'm serious. Every single time I read about a nontrivial piece of Scheme or Common Lisp software, it comes with a long list of caveats: "tested on Bigloo Scheme and MIT scheme, but won't work on CMU Scheme", "Works on Clisp but not SBCL; very slow on Franz Lisp due to their broken hash table implementation" or whatever. No matter how comprehensive a spec tries to be, there will always be edge cases that differ from implementation to implementation; there will always be bugs in the implementation that vary from one implementation to another; there will always be places where the implementors have helpfully added some useful functionality that's just too good to resist. This becomes a major headache. If a language is implementation-defined, however, the path of least resistance is to port or hack on the reference implementation rather than writing a new one from scratch. Consider the list of platforms to which perl has been ported. By the way, I'll follow Perl community practice herein and use "perl" consistently for the interpreter and "Perl" for the language: if you wish to do otherwise please read my footnote about scones.

There are of course variations between different ports of a program, but they're inevitably going to be less than the variations between programs that were developed entirely independently. This makes it easier for you, the language user, to write portable code: you don't need to worry about the variations between language implementations, and can concentrate on the vagaries of the different platforms you want your code to run on.

[This assumes that the implementation can be ported: sometimes you have no choice, which explains Jython et al.]

Lack of legalism: Suppose you have a question about the meaning of some code in your favourite spec-defined language. Obviously, you need to look it up in the spec. If you're lucky, you'll know your way around the spec, and will be able to find the right place(s). If you're very lucky, it will be a well-thought-out spec, and will address this issue. If you're very very lucky, it will be a well-written spec, and you'll be able to understand the answer. If you're very very very lucky, your favourite implementation will implement this part of the spec correctly, and if you're very very very very lucky, so will all the other implementations on which you want your code to run.

[The alert reader will have deduced that I've worked with some less-than-brilliant specs in my time.]

In an implementation-defined language, the question simply doesn't arise. Type it into the interpreter/compiler and see what happens. Whatever happens is, by definition, correct behaviour.

Lack of undefined behaviour: on a similar note, there is no undefined behaviour. Whatever the implementation does is what it does.
[Edit: not quite. See yrlnry's comment below.]

By the way, I'm not saying that all specifications are bad (a good one is an excellent thing to have) or that specification-defined languages have no advantages - I'm assuming that the advantages of specification-defined languages are so well-rehearsed that I don't need to repeat them. Anyone needing further enlightenment is encouraged to go to comp.lang.scheme and say "I think spec-defined languages suck!" :-)

Now for the second part of my rant: Haskell, as we know it today, is an implementation-defined language, defined by the latest CVS snapshot of GHC. "But what about the Haskell Report, and the Revised Report, and Haskell prime, and Hugs, and, er, those other implementations?" I hear you cry. Well, what about them? Every time I asked some question, it seemed, the answer would begin with "first download the latest version of GHC, then turn on these extensions..." A spec for a language that people don't actually use is - well, not useless, if it's sufficiently close to a language that they do use, but it ain't a specification as I understand the term. Everyone in the Haskell community seems to track the latest version of GHC for its new features. This is not spec-like behaviour. Now, as I said above, I don't think that implementation-defined languages are bad: quite the reverse. I just think it would save everyone a lot of bother and false superiority if the community could admit this to itself and to outsiders.

1 The O is short: "scone" rhymes with "gone", not "groan". Unless you're talking about the town in Scotland, when it rhymes with "boon". People who insist on mispronouncing this word around me will have their scones taken away from them and donated to the Pozorvlak Pedant-Feeding Drive.

[User Picture]From: ryani
2007-10-01 10:30 pm (UTC)
I agree with you completely on the benefits of implementation-defined languages. Another example of this working well is Ruby.

Haskell 98 is a fine spec, but Haskell the language has a crunchy well-specified core covered with gooey implementation-dependent goodness. I don't just mean type-system hacks (which can be interesting and cool in their own right). I also mean things like:

- GHC's RULES pragma which lets you specify valid optimizations directly in your source code (kind of the opposite of macros):
{-# RULES "map/map" forall f g xs. map f (map g xs) = map (f.g) xs #-}
{-# RULES "foldr/build" forall f z (g :: forall b. (a->b->b) -> b -> b).
                        foldr f z (build g) = g f z #-}
- unboxed primitive types & implementation-specific internals of IO types
From my ICFP2007 renderer, in part of the interface to some fast C code:
import GHC.Exts
import Data.Array.Base
import Data.Array.IO.Internals

unsafeGetMutableArray# :: IOUArray Int Word32 -> MutableByteArray# RealWorld
unsafeGetMutableArray# (IOUArray (STUArray _ _ array#)) = array#

In fact, I think a lot of Haskell's current success comes from the quality of GHC's implementation of the language & extensions; if all you had was HUGS, nobody would be using Haskell except programming language researchers & students in functional programming classes.
(Reply) (Thread)
[User Picture]From: stronae
2007-10-02 02:43 pm (UTC)
if all you had was HUGS, nobody would be using Haskell except programming language researchers & students in functional programming classes.

That's a very amusing comment, given that that's exactly why I use Hugs. Equally amusing: I'm now getting my students out of Hugs and into GHC because Hugs gets too slow on larger datasets.
(Reply) (Parent) (Thread)
[User Picture]From: pozorvlak
2007-10-02 07:39 pm (UTC)
The rules pragma is indeed very cool. Every compiler should have one :-)
(Reply) (Parent) (Thread)
From: yrlnry
2007-10-02 02:05 pm (UTC)

Undefined behavior in Perl

Perl, despite being an implementation-defined language, does have some undefined behaviors. For example, for various implementation reasons, the locution my $static if 0 has a strange and interesting effect:
  sub foo {
    my $static = 42 if 0;
    print "static is now $static\n";

  foo() for 1..5;
would make $static behave as a "static" variable, and persist from call to call of foo(). Since this was a strange quirk of the implementation, it was officially decided by the support group that this behavior would not be supported in future versions. The manual was amended to say that this behavior was explicitly undefined, and might change in the future.

Then there have been other cases where the implementation did define the behavior, but the manual warned about it anyway. For example, one can loop lazily over the contents of a hash:

  while (my $key = each %hash) {
    # do something with $key and $hash{$key}
What happens if you modify the hash in the middle of the loop? For various implementation reasons, the manual forbade this. For example, if you add a new key to the hash, the hash might overflow, which would trigger a reorganization that would move everything around, destroying the ordering information, and the subsequent calls to each() would continue from the same place, but in the new order, making it likely that the loop would visit some keys more than once, or some not at all. So the prohibition in that case made sense.

But this reorganization never occurs when keys are deleted, so the prohibition in that case was unnecessary. Deleting a key that has already been visited will not affect the each() loop, and deleting one that has not yet been visited will just cause it to be skipped when the time comes. And, in fact, the implementation has always contained a lot of code to ensure that deleting the current key would work properly, and not spoil the each(). This is nontrivial: the deletion of the current item is postponed until the next each(), so that the pointer to the next item is not lost prematurely. Even though the implementation took some pains to make this work, the manual still forbade it.

A more arguable case is the question of what happens when you modify an array inside a loop over the array, as with:

  @a = (1..3);
  for (@a) {
    push @a, $_ + 3 if $_ % 2 == 1;
(This prints 12346.) The internals are simple, and the semantics are well-defined by the implementation, and straightforward, but the manual has the heebie-jeebies about it, and most of the Perl community is extremely superstitious about this, claiming that it is "entirely unpredictable".

There are probably other examples.

(Reply) (Thread)
[User Picture]From: pozorvlak
2007-10-02 06:59 pm (UTC)

Re: Undefined behavior in Perl

Thanks! Another example (sort of) that occurred to me is the order in which hash keys are returned by each, which is "apparently random" for sound security reasons.

May I ask how you found me? Referrer logs?
(Reply) (Parent) (Thread)
From: yrlnry
2007-10-03 12:15 am (UTC)

Re: Undefined behavior in Perl

Not quite. In former times, it was "apparently random" but actually predictable, not for security but for implementation reasons.

In some more recent version, 5.7.something, I think, it was changed to be actually random, for security reasons.

I did find you because of referrer logs. But then I got so interested in your blog that I forgot to look to see why you referred to me.

(Reply) (Parent) (Thread)
[User Picture]From: pozorvlak
2007-10-03 08:53 am (UTC)

Re: Undefined behavior in Perl

Thank you! I really like your blog, too. What prompted you to read Wilkins - Neal Stephenson's Baroque Cycle?

The link was from my recent post about the Catalan numbers, which links to your post about calculating binomial coefficients efficiently. I spent a chunk of yesterday writing code to calculate binomial coefficients in various languages so I could calculate large-ish Catalan numbers - I was away from the Internet, and couldn't check your post, so I ended up using the recurrence nCm = nCm-1 * (n-m+1)/m instead, which seems to work. Why did you use Haskell for your demonstration language in that post, by the way?

A small and largely irrelevant point, which has no doubt been raised before: you need to make the implementation iterative or tail-recursive so you don't get stack overflows.
(Reply) (Parent) (Thread)
From: yrlnry
2007-10-03 06:02 pm (UTC)

Re: Undefined behavior in Perl

There's a Jorge Luis Borges essay about the Wilkins book. It was Borges that turned me on to Sir Thomas Browne, too. I read the Stephenson thing later, because after I wrote all those blog articles about Wilkins and Hooke people kept emailing me to tell me to read the Stephenson.

It seems to be that a better way to calculate Catalan numbers would be to use the recurrence Cn = C0 Cn-1 + C1 Cn-2 + ..., maybe something like this:

catalans 0 = [1]
catalans n = 
        next : cats
          cats = catalans (n-1)
          next = sum (zipWith (*) cats (reverse cats))

catalan n = head (catalans n)

But I haven't looked into it in a lot of detail.
(Reply) (Parent) (Thread)
From: (Anonymous)
2007-10-02 02:51 pm (UTC)

Formal specifications

Well, there's specifications and then there's formal specifications. Standard ML is the only language that I'm aware of that has been formally defined. Make of that what you will.
(Reply) (Thread)
[User Picture]From: fanf
2007-10-29 07:22 pm (UTC)

Re: Formal specifications

Scheme too.
(Reply) (Parent) (Thread)
From: David Jones [wordpress.com]
2007-10-08 04:29 pm (UTC)
In practice plenty of implementation defined languages have undefined behaviour. My favourite, because it's so generally applicable, is sort.

Your language provides a way to sort a list or an array doesn't it? And you can pass a comparator function in to specify your own ordering, yes? Does it define the exact sequence of arguments that it passes to the comparator function? No. Because to do so would be to effectively require a particular implementation of sort and thereby preclude optimising an important function. You can observe the sequence of arguments that are passed to the comparator function in any particularly implementation (by using side effects if necessary), but should you rely on it? I think not. Hence exactly how sort chooses to invoke the comparator is highly likely to be in fact undefined.

And having admitted that, no you can't rely on that particular piece of implementation defined behaviour, what else can't you rely on?

(of course in a purely functional language you can't observe any difference between different orders anyway)
(Reply) (Thread)
[User Picture]From: pozorvlak
2007-10-31 02:42 pm (UTC)
I think this comes under the heading of "well-defined behaviour that you'd be unwise to rely on." markdominus explores this at greater length in his recent post on the subject.
(Reply) (Parent) (Thread)
From: atdotde.myopenid.com
2007-10-30 11:53 am (UTC)

Dependence on environment

The thing with definition by implementation is that it doesn't work. At least for real computers in the real world (and not some theoretical CS dream machines): Even if you run a program to see what it does it does not mean that the next time you run it it does the same, the behavior could depend for example on how much RAM is available or when a garbage collection might be triggered or how long the hard drive takes or if something is cached or what memory contains before you initialize it or whatever.

And this is only for programs that don't do any IO. As soon as there is IO you would have to check that the program does what you expect it to do for every possible input. Truly a big task.

To me it seems the real problem arises if your intend is not to produce a file that is accepted by the compiler but if you want your program to do certain predefined things. Then with a standard (and reasoning ability) you can decide if that is (should be) possible with your program. Otherwise, you are often thrown back at trail and error.

My main example would be plain TeX vs. LaTeX. plain is described in "The TeXbook" and although this might be a hard way in practice this helps you in finding out if and how it's possible to produce a certain document you have in mind even if Don Knuth didn't explicitly think of that possibility when he wrote the program. In LaTeX, the books are often not very concrete and you just have to try if some source file produces the desired results. I have already wasted so much time with this that if I have the choice I prefer to use plain TeX as this leads to well defined results (in a sufficient standard of rigor).

Luckily, in LaTeX you still have the source code that makes up LaTeX itself and as such this is a definition of what it does. But one should not have to consult that too often. Imagine now, this would be closed source... Then you would be in the typical situation of a windows user "oops, doesn't work anymore. let's try to click on a number of buttons to see if that makes it work again. better call a friend who is a computer expert..."

Robert (www.atdotde.de)
(Reply) (Thread)
[User Picture]From: pozorvlak
2007-10-30 12:12 pm (UTC)

Re: Dependence on environment

Except in the real real world definition by implementation seems to work perfectly well! Your concerns seem pretty theoretical to me (which is not to say that they're not real concerns, but that in practice they can be overcome, and anyway it's not clear to me how a spec would help). Which implementation-defined languages have you used?

I agree with you about TeX/LaTeX, by the way. I'm writing my thesis in LaTeX as an extended experiment to see if it's any good, and I'm not terribly impressed - there are some features I like (the package system, /ref and /cite, \[ and \]), but they could have been added to TeX in a much more lightweight manner, without compromising comprehensibility and discoverability to the extent that LaTeX does. And the documentation is horribly patronising.
(Reply) (Parent) (Thread)
From: atdotde.myopenid.com
2007-10-30 04:45 pm (UTC)

Re: Dependence on environment

If it doesn't work well it leads to nasty bugs which are very hard to discover. I think the example of html (not the standard but what is implemented by IE and Firefox for example) is such a nightmare. In non-trivial examples it can be very hard to produce code that renders in the way expected.

More Examples? Hmm, I was thinking for example of Obi-Wan type errors which are not noticed by the compiler but remain undiscovered for a long time as some memory cell a pointer points to illegally contains a zero and thus leads to consistent behavior during testing but at the customer it doesn't contain zero any more and breaks the program. You could attribute that to the C compiler not actually implementing standard C but the language defined by being implemented by the compiler. But strictly speaking this might not be an example as the standard does not claim that the compiler (or runtime library if you want) should have rejected something which is not standard C. As you say it only defines what the compiler should do with a source code which is C. Of course it would be nice if everything which does not fit the standard would be rejected (latest at run-time).

I think all I am saying is that testing for a language which is defined by implementation is even more necessary as code audit cannot work strictly speaking.

We are getting dangerously close to splitting hairs but I am thinking of a language defined by implementation as some kind of black box. You have some documentation on what that should do for some given examples but in the end you have to find out by experiment what the black box really does. And all I said was that these experiments often are not conclusive as it is not given that the behavior of the box for test cases generalizes the way you think to the real program.

Of course you could always peek into the source of the compiler and see how it translates a given program. But then again the source is some sort of (machine readable) standard.
(Reply) (Parent) (Thread)
[User Picture]From: pozorvlak
2007-10-31 02:54 pm (UTC)

Re: Dependence on environment

Well, I'd tend to call HTML an example of the worst-case for spec-defined languages: though there is a spec (or rather several), it's patchily-implemented and has been extended in all kinds of different incompatible ways. If there were only one Web browser that had been ported to lots of different architectures, Web development would be a lot easier.

Your second paragraph makes it look like you've confused my post with a response written by Mark Jason Dominus, which discusses the behaviour of C compilers when presented with non-C programs. And again, I don't see how a spec would help here: if anything, that kind of weird inconsistent behaviour seems more likely to arise as a result of different implementations of an underspecified language.

I think all I am saying is that testing for a language which is defined by implementation is even more necessary as code audit cannot work strictly speaking.
OK, you're probably right here. This is not necessarily a problem if you find test-driven development works for you. Me, I even test mathematical proofs wherever possible :-)

it is not given that the behavior of the box for test cases generalizes the way you think to the real program.
So test with the real program.

But then again the source is some sort of (machine readable) standard.
This is really the point - it's a machine readable standard, which is far more detailed than the human-readable standard would be. We also eliminate entirely the question of whether the interpreter/compiler is in fact a faithful implementation of the human-readable standard - Don't Repeat Yourself :-)

Overall, it looks like I've had experiences with good implementation-defined languages and bad spec-defined languages, whereas you've only had the opposite. I'd like to reiterate that I'm not completely opposed to spec-defined languages - when there's a good, clear spec, with good implementations, they're excellent - but they're not automatically better than implementation-defined languages, as many people seem to think.
(Reply) (Parent) (Thread)
From: (Anonymous)
2007-12-15 01:12 pm (UTC)


very interesting, but I don't agree with you
(Reply) (Thread)