Beware of the Train

Lazy lists, impatience and hubris [May. 16th, 2008|12:12 pm]

There was a recent mailing list post by Andrew Coppin (which received some discussion on programming.reddit) in which he bemoans the way that GHC, when presented with reasonable-looking code, can suddenly and mysteriously decide that it needs to allocate gigabytes of memory, slowing your system to a crawl for ages (I've had similar problems myself). Andrew's example was the following:

I offer up the following example:
mean xs = sum xs / length xs
Now try, say, mean [1.. 1e9], and watch GHC eat several GB of RAM. (!!)
This is not so mysterious when you remember that Haskell uses linked lists as its native list structure, and so the only way to calculate length xs is to traverse the list a second time: the runtime system "helpfully" keeps around the copy of xs that it generated while calculating sum xs, and you have a space leak. But what to do about it?

Andrew suggests the rather eye-watering
mean = (\(s,n) -> s / n) . foldr (\x (s,n) -> let s' = s+x; n' = n+1 
    in s' `seq` n' `seq` (s', n')) (0,0)
Stare at that for a while, and let your eyes de-focus: after a while, it becomes apparent that he's fused the two traversals of the list into one. All that seq business is, er, something to do with strictness, I dunno. Tilton's Law of Programming applies here. Anyway, Don Stewart's written a good article showing how to get to the fast version of the code from the slow version, including some handy tips on reading intermediate compiler output to see what's really going on. It's a bit disingenuous, though - he shows that allocating the list strictly will consume over 8GB, and concludes that "Strictness is not the solution here", whereas from what I can see, this is the only part of the program where strictness doesn't help! It appears that recent versions of GHC do a lot of strictness analysis when given the -O flag, and here that's enough. But there's a lot of good stuff in there, and I shall refer back to it next time my Haskell code performs surprisingly (although I must say, the last time I heard of anyone inspecting intermediate compiler output to get adequate performance was in the late 80s on early Ada compilers...).

Anyway, that's not what I wanted to talk about. As part of a very silly discussion prompted by Andrew's post, someone challenged me to "provide, in the strict language of [my] choice -- any language at all, as long as it is strict -- a function that calculates the mean of a list of doubles that correctly produces the mean of [1 .. 10e9] in constant space".

Now, one could argue that 8GB is a constant amount of space (albeit a large one), and write a version that simply allocated the list directly. But that would miss the point, because the challenge is actually very easy. Sure, if you want to consume a sensible amount of memory and yet retain genericity, you're going to need to generate the list lazily. But what my challenger had forgotten is that one can easily hand-roll lazy data-structures in any language with lexical closures or, come to that, objects. Which is to say, in any reasonably modern language. I chose Perl, because it's the strict language I know best:
    sub mean {
            my $getnext = shift;
            my ($next, $done, $sum, $count);
            while (!$done) {
                    ($next, $done) = $getnext->();
                    $sum += $next; $count++;
            }
            return $sum/$count;
    }

    my $i = 1; print mean( sub { ($i++, $i > 1000000000); } );
Now, that code's by no means fast: in fact, it took over half an hour to complete on my girlfriend's aging laptop. But it fulfilled the "constant space" part of the spec admirably, consuming about 1.5MB consistently (almost all of that for the Perl interpreter).

What's going on there? Well, instead of accepting a list, my mean function accepts an iterator: a closure that, when called, returns a pair containing the next element of the list and a boolean saying whether we've reached the end yet (the keyword sub, as well as declaring named functions, creates anonymous coderefs which can access variables in their surrounding lexical scope: in effect, it's Perl's lambda). This iterator may not look very listlike at first glance, but it does the same thing: conceptually, therefore, it's a list. Someone, probably Duncan Coutts, once suggested to me that the best way to think about Haskell's lazy lists is as forward-only iterators over collections: implementing a toList function for your new data structure is conceptually just like building an iterator, as the next part of the list is only built when it's asked for (which is equivalent to calling next() on an iterator in Java or C++). I've found this approach very helpful.
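To make the iterator-as-list view concrete, here's a little take function (the name and code are mine, by analogy with Haskell's take, not from the discussion) that pulls at most $n elements from such an iterator:

```perl
# take($n, $iter): pull at most $n elements from an iterator that
# returns ($next_element, $done) pairs, and return them as a list.
sub take {
    my ($n, $iter) = @_;
    my ($next, $done, @elts);
    while ($n-- > 0 && !$done) {
        ($next, $done) = $iter->();
        push @elts, $next;
    }
    return @elts;
}

my $i = 1;
my @first_five = take(5, sub { ($i++, $i > 1000000000); });
# @first_five is (1, 2, 3, 4, 5); the other 999,999,995 elements
# are never generated.
```

Because the iterator only does work when called, take(5, ...) touches five elements and no more, which is exactly how Haskell's take behaves on a lazy list.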

This (next_elt, done?) approach isn't the only way of making lazy lists, but it's a perfectly sensible one. One could generalise the way we've created the list (1..1e9):
sub make_stream {
    my $i = shift; my $upper = shift; my $step = shift || 1;
    $i -= $step;
    return sub { $i += $step; ($i, $i >= $upper) };
}
or wrap an honest-to-God Perl list:
sub stream { my $i = 0; my @xs = @_; sub { ($xs[$i++], $i > $#xs) } }
mean(stream(1, 2, 3, 7, -32.6, 10**3.7));
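To see the ($next, $done) protocol in action, here's the wrapper inlined as a bare closure and drained by hand (a sketch of mine; the loop is just the body of mean again):

```perl
# Wrap an ordinary Perl list in the ($next, $done) protocol and
# drain it, accumulating a sum and a count as mean does.
my @xs = (1, 2, 3, 7);
my $i = 0;
my $getnext = sub { ($xs[$i++], $i > $#xs) };

my ($next, $done, $sum, $count) = (0, 0, 0, 0);
until ($done) {
    ($next, $done) = $getnext->();
    $sum += $next;
    $count++;
}
# $sum is 13 over $count = 4 elements: a mean of 3.25
```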
The general form is
# set up lexicals here
my $stream = sub {
    return (
        # calculation to get next element; probably mutates state
        # are we done yet?
    );
};
As funktio suggested, you'd probably want to wrap the lexicals in either a do-block or a function like make_stream above.
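For instance, here's that general form instantiated as an infinite Fibonacci stream (my example, not funktio's), with the lexicals wrapped in a generating function: infinite streams simply never set the done flag.

```perl
# fibs() returns an iterator over the Fibonacci numbers. The stream
# is infinite, so the done flag is always false.
sub fibs {
    my ($a, $b) = (0, 1);
    return sub {
        my $next = $a;
        ($a, $b) = ($b, $a + $b);   # mutate the captured lexicals
        return ($next, 0);
    };
}

my $fib = fibs();
my @first = map { ($fib->())[0] } 1 .. 8;
# @first is (0, 1, 1, 2, 3, 5, 8, 13)
```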

What else could we have done? Another option would have been to create an object with next and end methods, and store any necessary state as a member variable of the object. The most standard solution would have been to create a coderef that returned (car, cdr) pairs, where car is the next element of the list, and cdr is another coderef which, when evaluated, returns the next (car, cdr) pair. This, incidentally, is how Haskell's laziness works under the hood: unevaluated values are stored as closures called "thunks", which, when called, return the value you want. If you never need the value, the thunk is never called, and so the calculation's never done. In Perl, this looks like:
sub range {
    my $lower = shift; my $upper = shift; my $step = shift || 1;
    my $test = ($lower >= $upper, 1, $lower <= $upper)[($step <=> 0) + 1];
    my $cdr = $test ? sub { range($lower + $step, $upper, $step) } : undef;
    return ($lower, $cdr);
}

sub range_to_list {
    my ($car, $cdr) = @_;
    my @list;
    while (defined($cdr)) {
        push @list, $car;
        ($car, $cdr) = $cdr->();
    }
    return @list;
}
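To see the shape in miniature, here's a hypothetical countdown built the same way (it shares range's convention of ending the chain with one out-of-range sentinel pair whose cdr is undef), drained with the same loop as range_to_list:

```perl
# countdown($n): the list ($n, $n-1, ..., 1) as a chain of (car, cdr)
# pairs, ending with a sentinel (0, undef) that the drain loop discards.
sub countdown {
    my $n = shift;
    my $cdr = $n >= 1 ? sub { countdown($n - 1) } : undef;
    return ($n, $cdr);
}

# The drain loop from range_to_list, inlined.
my ($car, $cdr) = countdown(5);
my @list;
while (defined($cdr)) {
    push @list, $car;
    ($car, $cdr) = $cdr->();
}
# @list is (5, 4, 3, 2, 1)
```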
In practice, you should use one of the lazy list modules on CPAN. I've never used any of them, but Tie::LazyList, which lets you use lazy lists like they were ordinary Perl lists, looks good.

We're almost done, but there's something else I'd like to look at. What would all this look like in Ruby? Well, one could easily do what we've just done: Ruby's anonymous blocks fulfil the same role as Perl's coderefs. But the more natural approach, I think, is the reverse of ours: You'd define a Range class which has an each method, and then define mean by passing a callback to each:
def mean(lst)
    sum = 0.0   # start with a float so integer lists don't hit integer division
    count = 0
    lst.each do |x|
        sum += x
        count += 1
    end
    sum / count
end
Then each could provide an iterator in any way appropriate, which probably wouldn't keep the list in memory. Even if we did two traversals of the list, we wouldn't run out of RAM. As a bonus, our mean function now works on any object which implements each, so it's approximately as general as the Haskell version. By the way, does anyone know how Ruby's ranges work internally?
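The same inversion is easy to sketch in Perl, too: the producer takes a callback, and mean hands it one that accumulates. (The names here, each_in_range and mean_cb, are mine, purely for illustration.)

```perl
# each_in_range($lower, $upper, $cb): call $cb on every integer in
# the range, one element at a time; no list is ever built.
sub each_in_range {
    my ($lower, $upper, $cb) = @_;
    for (my $x = $lower; $x <= $upper; $x++) {
        $cb->($x);
    }
}

# mean_cb takes any producer that drives a callback over its elements.
sub mean_cb {
    my $each = shift;
    my ($sum, $count) = (0, 0);
    $each->(sub { $sum += $_[0]; $count++ });
    return $sum / $count;
}

my $m = mean_cb(sub { each_in_range(1, 100, $_[0]) });
# $m is 50.5
```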

So, the take-home messages:
  • While laziness is cool, it's not actually that magical, and you don't have to be using Haskell to take advantage of it: you can implement your own lazy data-structures in a few lines of any modern language. What Haskell brings to the table is pervasive laziness, so all the standard functions will work as-is with lazy structures, and dividing code into producer/consumer pairs is easier.
  • Perl has full lexical closures, and all that follows from them.
    [If you haven't had a look at Mark Jason Dominus' book Higher-Order Perl yet, then now would be a good time. It's a free read online, and also available in dead tree format. And be sure to read the lovely story associated with the cover.]
  • A good way to think of lazy lists in Haskell is as forward-only iterators.
  • Callbacks Are Your Friends, and widely-supported APIs (like Ruby's Enumerable API) are also your friends.
  • If you're doing anything numerical with Haskell, you probably want to be giving the -O or -O2 flag to GHC!

[User Picture]From: michiexile
2008-05-16 01:42 pm (UTC)


You did notice that your first example is basically a paraphrase of Dons' version, but in Perl, right?
[User Picture]From: pozorvlak
2008-05-16 02:18 pm (UTC)


Oh, sure (and Dons' final Haskell version was very close to the naive C version :-) ). But that's not really a problem: all I needed to do was show that there was no problem writing this code in a strict language. Someone had suggested that I'd be able to optimise the Haskell version by re-writing it in Scheme/ML/etc and then translating back to Haskell: I wasn't convinced, and this kicked off the whole sorry saga. Read the comments thread for the gory details. To make it entirely clear, I don't think my Perl version's as good as Dons' Haskell version: apart from anything, my version's two orders of magnitude slower! I really just wanted to make a point about how laziness worked. A few people mentioned they hadn't realised this kind of thing was possible in Perl, so I thought I'd blog about it.
[User Picture]From: pozorvlak
2008-05-17 10:31 am (UTC)


And, as it turns out, I think I was right: re-writing the code in a strict language hasn't given me much insight into optimizing the Haskell. In particular, I don't know where strictness helps (although I've discovered that "everywhere except the one place where you obviously need laziness" is an acceptable answer).
[User Picture]From: alexander_mikh
2008-05-16 06:18 pm (UTC)


Thank you for your post. I have been watching the OCaml and Haskell communities, mostly for ideas to use in other languages. Your post is another step in that direction.

[User Picture]From: pozorvlak
2008-05-17 10:20 am (UTC)


Glad you enjoyed it!
[User Picture]From: necaris
2008-05-17 12:57 am (UTC)


The Python version would look so much like the Ruby one it's not really worth posting. Great post though! Out of curiosity, what did you use to watch memory usage?
[User Picture]From: pozorvlak
2008-05-17 10:22 am (UTC)


So Python also has a callback-based iterator API? Nifty - that's obviously grown up since I last looked seriously at the language some years ago.

I watched memory usage with top - nothing fancy.
[User Picture]From: necaris
2008-05-17 11:42 pm (UTC)


Yep, Python 2.4 and onwards -- any collection that implements the interface (which involves being able to respond to __iter__() and the resulting object knowing about next() and the exception StopIteration, as far as I know, but I could be immensely wrong) can be used in a for loop :-)
From: (Anonymous)
2008-05-17 01:30 am (UTC)

iterators, flags


Those mutable one-shot iterators will only get you so far. You need something memoizing and reusable to model lists that get used several times. Then you can really win the question and recapitulate the whole development from the start.

If you are doing anything at all with GHC, you should be using -O or -O2 when you care about performance. It seems numeric code usually takes other tricks to get from 10-100% slower to parity with unvectorized C, like -fvia-C, -fexcess-precision, -optC, or posting to haskell-cafe. Data Parallel Haskell should help, once it's finished. On the other hand, you don't have to keep inner loops in Haskell. Numeric code is the easiest to bind because you don't have to worry about data structures.
[User Picture]From: pozorvlak
2008-05-17 10:28 am (UTC)

Re: iterators, flags


> Those mutable one-shot iterators will only get you so far. You need something memoizing and reusable to model lists that get used several times.

Yes, I agree - I did that only as the SimplestThingThatCouldPossiblyWork. The car/cdr iterators developed lower down are re-usable, and memoizing them is as simple as use Memoize; memoize('range');. OTOH, I think memoization is actually the problem in this case - it would be quicker (less use of swap) to let the list go out of scope and re-calculate it when needed.

> If you are doing anything at all with GHC, you should be using -O or -O2 when you care about performance.

To be honest, it sounds like I should probably just alias ghc to ghc -fglasgow-exts -O2 :-)
From: (Anonymous)
2008-06-25 09:25 am (UTC)

loop fusion


I'm new to Haskell and I'm just guessing, but is there any insurmountable obstacle holding back the compiler from doing just the right thing? Is it something to do with the halting problem?

I'd really want to write and read code like: mean xs = sum xs / length xs

Since even gcc knows something like:
[User Picture]From: pozorvlak
2008-06-25 12:46 pm (UTC)

Re: loop fusion


The current GHC rewriting rules architecture doesn't allow it - see the comment thread of this post for why. But I think that's fundamentally because of an efficiency hack, and there's no reason in principle why it couldn't be done.
From: (Anonymous)
2009-04-23 09:01 am (UTC)

The Java version of mean + some mutterings about overflows and rounding errors


Here's the Java code for doing mean, which is surprisingly pretty and works with standard Java lists as well as more esoteric versions. It uses Iterable, whose iterator() hands you an Iterator with a next() method. The standard implementation (ArrayList) is backed by an array, but you can easily create a lazy version.

double mean(Iterable<Double> xs) {
	double sum = 0;
	int cnt = 0;
	for (Double x : xs) {
		sum += x;
		cnt++;
	}
	return sum / cnt;
}

I'm surprised that no-one's mentioned the rolling mean yet:

mean' = (samples*mean + x)/(samples+1)
samples' = samples + 1

Useful when you have an ongoing sequence and want to know the mean as you go. Also, with a minor adjustment it's safe against overflows. But it does have increased rounding errors.
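The commenter's update rule translates directly into Perl (a sketch; the name rolling_mean_step is mine):

```perl
# One step of the rolling mean: fold a new sample $x into the
# running ($mean, $samples) pair.
sub rolling_mean_step {
    my ($mean, $samples, $x) = @_;
    return (($samples * $mean + $x) / ($samples + 1), $samples + 1);
}

my ($mean, $n) = (0, 0);
($mean, $n) = rolling_mean_step($mean, $n, $_) for 1 .. 4;
# $mean is now 2.5, $n is 4: the mean of (1, 2, 3, 4), computed
# without keeping the samples around.
```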

Regarding rounding errors, you can improve accuracy with a two pass calculation:

1) Calculate the mean of xs
2) Calculate the mean of xs - mean, which would be zero in a perfect world. Here it gives us a correction term.
3) Add the correction term to the first mean you calculated to get a better mean.
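A Perl sketch of that two-pass scheme (the name two_pass_mean is mine; the residual mean from the second pass is the correction term, folded back into the first-pass result):

```perl
# Two-pass mean: a naive first pass, then a pass over the residuals
# (x - m), whose mean recovers accuracy lost to rounding.
sub two_pass_mean {
    my @xs = @_;
    my $m = 0;
    $m += $_ for @xs;
    $m /= @xs;                    # first pass: naive mean
    my $err = 0;
    $err += $_ - $m for @xs;
    $err /= @xs;                  # second pass: mean of the residuals
    return $m + $err;             # correction folded back in
}

my $better = two_pass_mean(1, 2, 3, 4);
# $better is 2.5
```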