Falsehoods programmers believe about build systems - Beware of the Train — LiveJournal

Falsehoods programmers believe about build systems [Dec. 6th, 2012|09:45 pm]
pozorvlak

Inspired by Falsehoods Programmers Believe About Names, Falsehoods Programmers Believe About Time, and far, far too much time spent fighting autotools. Thanks to Aaron Crane, totherme and zeecat for their comments on earlier versions.

It is accepted by all decent people that Make sucks and needs to die, and that autotools needs to be shot, decapitated, staked through the heart and finally buried at a crossroads at midnight in a coffin full of millet. Hence, there are approximately a million and seven tools that aim to replace Make and/or autotools. Unfortunately, all of the Make-replacements I am aware of copy one or more of Make's mistakes, and many of them make new and exciting mistakes of their own.

I want to see an end to Make in my lifetime. As a service to the Make-replacement community, therefore, I present the following list of tempting but incorrect assumptions various build tools make about building software.

All of the following are wrong:
  1. Build graphs are trees.
  2. Build graphs are acyclic.
  3. Every build step updates at most one file.
  4. Every build step updates at least one file.
  5. Compilers will always modify the timestamps on every file they are expected to output.
  6. It's possible to tell the compiler which file to write its output to.
  7. It's possible to tell the compiler which directory to write its output to.
  8. It's possible to predict in advance which files the compiler will update.
  9. It's possible to narrow down the set of possibly-updated files to a small hand-enumerated set.
  10. It's possible to determine the dependencies of a target without building it.
  11. Targets do not depend on the rules used to build them.
  12. Targets depend on every rule in the whole build system.
  13. Detecting changes via file hashes is always the right thing.
  14. Detecting changes via file hashes is never the right thing.
  15. Nobody will ever want to rebuild a subset of the available dirty targets.
  16. People will only want to build software on Linux.
  17. People will only want to build software on a Unix derivative.
  18. Nobody will want to build software on Windows.
  19. People will only want to build software on Windows.
    (Thanks to David MacIver for spotting this omission.)
  20. Nobody will want to build on a system without strace or some equivalent.
  21. stat is slow on modern filesystems.
  22. Non-experts can reliably write portable shell script.
  23. Your build tool is a great opportunity to invent a whole new language.
  24. Said language does not need to be a full-featured programming language.
  25. In particular, said language does not need a module system more sophisticated than #include.
  26. Said language should be based on textual expansion.
  27. Adding an Nth layer of textual expansion will fix the problems of the preceding N-1 layers.
  28. Single-character magic variables are a good idea in a language that most programmers will rarely use.
  29. System libraries and globally-installed tools never change.
  30. Version numbers of system libraries and globally-installed tools only ever increase.
  31. It's totally OK to spend over four hours calculating how much of a 25-minute build you should do.
  32. All the code you will ever need to compile is written in precisely one language.
  33. Everything lives in a single repository.
  34. Files only ever get updated with timestamps by a single machine.
  35. Version control systems will always update the timestamp on a file.
  36. Version control systems will never update the timestamp on a file.
  37. Version control systems will never change the time to one earlier than the previous timestamp.
  38. Programmers don't want a system for writing build scripts; they want a system for writing systems that write build scripts.

[Exercise for the reader: which build tools make which assumptions, and which compilers violate them?]
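To make one of these concrete: falsehood 3 is violated by any compiler that writes several files per invocation. `bison -d`, for example, emits both `parser.tab.c` and `parser.tab.h` in a single run, and the obvious Make encoding of that (a sketch, assuming a `parser.y` grammar) is subtly wrong:

```make
# One build step, two outputs: `bison -d` writes parser.tab.c AND
# parser.tab.h in a single run. The naive rules below lie to Make,
# which believes the two files are produced by two independent steps -
# so `make -j` can happily run bison twice, concurrently, over the
# same output files.
parser.tab.c: parser.y
	bison -d parser.y
parser.tab.h: parser.y
	bison -d parser.y
```

GNU Make only grew a sound way to express this (grouped targets, `parser.tab.c parser.tab.h &: parser.y`) in version 4.3, decades after the problem first appeared.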


Comments:
From: (Anonymous)
2019-01-13 09:21 pm (UTC)

cycle and trees

I know this is 5+ years old, but I'm hoping you're still around to reply :)

I was wondering what

1. Build graphs are trees.
2. Build graphs are acyclic.


these 2 statements mean. How can a build system not detect cycles in dependencies? Wouldn't that cause a build loop?

Do you have examples of where this is wrong?
From: pozorvlak
2019-01-14 11:22 am (UTC)

Re: cycle and trees

Hi! Yep, I'm still around, though I don't post much these days :-(

To get a build graph that isn't a tree, you just need a diamond:
  • A depends on B and C
  • B depends on D
  • C depends on D
For instance,
  • a.out depends on foo.o and bar.o
  • foo.o depends on common.h
  • bar.o depends on common.h
Note that this is a diamond whether we consider build graphs top-down (starting from the ultimate target to be built) or bottom-up (starting from the dirty files).
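In Make terms (a sketch, assuming the usual `foo.c` and `bar.c` sources), that diamond is just:

```make
# A classic diamond: a.out sits above foo.o and bar.o, both of which
# sit above the shared header common.h. Touching common.h dirties both
# object files, but a.out should still be relinked only once.
a.out: foo.o bar.o
	cc -o a.out foo.o bar.o
foo.o: foo.c common.h
	cc -c foo.c
bar.o: bar.c common.h
	cc -c bar.c
```

A tool that treated this graph as a tree would see two copies of `common.h`, one under each object file, and rebuild (or race on) `a.out` accordingly.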

I can't, off the top of my head, think of a build tool that makes this error (recursive Make, possibly?), but I've certainly seen it among people writing about build tools. The downside of making this error would be unnecessary rebuilds of clean targets, and possibly race conditions.

Cycles are rarer, but as so often with build-system weirdness, LaTeX has us covered. Suppose `paper.tex` contains cross-references ("see Equation 3.2.4 on p123"). You want to generate a compiled document, `paper.pdf`. The command to do that is `pdflatex paper.tex`, which reads `paper.tex` and `paper.aux`, and updates `paper.pdf`; but if the target of a cross-reference has changed, it will also update `paper.aux`. So you have to repeatedly run `pdflatex` until `paper.pdf` and `paper.aux` stop changing. But this is not guaranteed to happen! I believe that it is possible to construct a pathological document in which the new width of a cross-reference pushes the target to a different page, which updates the .aux file again, which causes a non-convergent build cycle; however, I can't find a reference for this right now :-(

Build cycles that provably reach a fixed point if run enough times are merely infuriating; build cycles that are not guaranteed to terminate are worse; build cycles where you couldn't even get started (which I think is what you're asking about?) would be worst of all, but fortunately I'm not aware of any examples of that - since software does ultimately get built, I think we can rule it out as a case we need to handle. My point was really that if you assume or enforce acyclicity (which seems like a harmless safety measure), then your tool will be unable to correctly handle builds that rely on rerunning a compiler until a fixpoint is reached, like LaTeX.

[How do existing build tools handle this? I think they either handle the fixpoint calculation properly, and accept the possibility of infinite loops, or run the compiler a bounded (or worse, fixed) number of times.]
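A bounded version of that fixpoint loop might be sketched in shell like this - `run_compiler` is a hypothetical stand-in for `pdflatex paper.tex`:

```shell
# Sketch of the bounded-fixpoint strategy: rebuild until the .aux file
# stops changing, capped at $2 iterations so a pathological document
# that never converges cannot hang the build forever.
build_until_fixpoint() {
  target=$1 max=$2 i=0 prev=""
  while [ "$i" -lt "$max" ]; do
    run_compiler "$target"            # stand-in for: pdflatex "$target.tex"
    cur=$(cksum "$target.aux")        # cheap change detection via CRC
    [ "$cur" = "$prev" ] && return 0  # fixed point reached: build is done
    prev=$cur
    i=$((i + 1))
  done
  echo "no fixpoint after $max runs" >&2
  return 1
}
```

latexmk implements a more careful version of the same idea, inspecting the .aux and .log files to decide whether another run is needed.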
From: pozorvlak
2019-01-14 11:44 am (UTC)

Re: cycle and trees

By the way, you may be wondering "if build graphs aren't DAGs, what are they?" I'm pretty sure the answer is that they're Petri nets, as used by tup and waf.

From: (Anonymous)
2019-01-15 11:09 pm (UTC)

Re: cycle and trees

If I model it as a Petri net or any other network-based flow model, there's a possibility of infinite looping - as in the case of pdftex.

In this special case of pdftex I have previously used latexmk. Getting it right in an average garden-variety Makefile is notoriously difficult.
From: pozorvlak
2019-02-06 05:09 pm (UTC)

Re: cycle and trees

I think it's more important to support cyclic build graphs than it is to rule out the possibility of infinite looping. You can handle cycles by warning the user that an infinite loop is possible and/or breaking after a given number of repetitions, but if your build system doesn't allow cycles at all then you're leaving TeX users (and probably others, though really, IMHO TeX is an important-enough use-case on its own) out in the cold.

But this is a legitimate design tradeoff, and build-system designers can make different choices than I would. I just hope that they do so deliberately, rather than not realising that there's a tradeoff to be made.
From: (Anonymous)
2019-01-15 10:37 pm (UTC)

Re: cycle and trees

Hey there. It's me again. Good to see you're still around :) And thank you so much for the response.

I've been fuck*d pretty hard by build automation in the last 15 or so years in the industry. Every time I try to fix it (via proper channels: raising bugs, raising code reviews, etc.) I often get squeezed out after being told "there are other important things at hand; the build is the least of your concerns. If it works, let it be". Apparently this mindset in the software industry is rather strong. I've seen it in all my jobs. And I get it - it's not something facing the end users, and there isn't a direct financial incentive to it. See OpenWRT or Yocto for example - even eyeball-parsing their build automation will make you want to indulge in unholy levels of violence.

Out of frustration, I've come up with a few ideas to write a build system of my own. While reading through other build systems, I came across a bunch of blogs, and one that often shows up is this post of yours, which led me to ask this weird question. Anyway, I've listed a few things, and I was hoping to get some inspiration from others, existing build systems, etc.

So here are a few things I've gathered.

  1. Portable - Works exactly the same on all platforms it's compiled for.

  2. Independent - Should never ever become a meta build system.

  3. Cross compile - Cross compiling shouldn't be any different from native compiling.

  4. Expressive - Must have a recognisable language like JS or Lua, but nothing heavyweight like Python.

  5. Staged - A bunch of build stages like Maven, WAF, etc., yet never force any single stage (e.g. test before artefact generation) unless it's imperative.

  6. Configuration - Common configuration, so configuring at the top level materialises configurations for sub-projects.

  7. Chaining - Linking dependencies should be as simple as my_project.depends_on('http://somewhere/project.tar.gz', '>=3.14').patched_by('my/patch/dir/'), which should download, patch, attach the global configuration and build it.


So far, I don't really have much working code, but hopefully I'll get there. That said, I do have a graph representation (in C) at a project and artefact level (lib/exe) instead of a per-file level, which makes it easier to deal with forests instead of trees.

Any inspiration from you would be greatly appreciated. If I manage to get something working, I'll have the repository loaded somewhere and send the link over.
From: pozorvlak
2019-02-06 05:17 pm (UTC)

Re: cycle and trees

I'd be very interested to see that!

As to your design criteria: I like 1-4 and 6 (though is Python more heavyweight than JavaScript? It's IMHO a nicer language, anyway...). On 5, I'm suspicious of any design that relies on explicit stages, especially if those stages are fixed in advance: it reminds me too much of programming in BASIC with line numbers. You end up squeezing steps into stages where they don't really fit, or wishing for extra in-between stages, and you lose dependency information beyond "this happens in stage N, so it can depend on things that happened in stages 1..N-1 and can be depended on by things in stages N+1 onwards". Explicit dependency graphs solve that problem better, and hey, you're writing a build system, so you already have a tool for dealing with dependency graphs :-) On 7, I've encountered features like that in the RPM build tools, but I'm not convinced it's the best solution in general: for libraries that are used unchanged, I prefer to use a tool like Bundler or npm, and if I need to make changes to the library I prefer to vendor it in with git-subtree and apply patches on top of it.
From: (Anonymous)
2019-02-18 04:27 am (UTC)

Re: cycle and trees

it reminds me too much of programming in BASIC with line numbers. You end up squeezing steps into stages[sic]

Lol. True. I'm looking at building through fixed stages as simple as setup, configure, build, test, install, package, and then an isolated step, 'clean'. Setup is where you define what your project needs; configure is called after you're done with the UI (console/X-windows) based selection of said configuration, and the other steps follow. There are cases where a pre_X and post_X would become necessary, but you can easily wrap them in neat functions and call them inside one of these callbacks. The 'clean' step need not even be defined, as generated artefacts would be known at runtime anyway. Sequencing graphs as part of your build script would likely make things faster (but no proof yet).

is Python more heavyweight than JavaScript?

Absolutely. In fact, it's a pain to embed it. I tried once before for a customer and came close to checking into a mental asylum. Embedding Lua or JS would be trivial compared to embedding Python. The way I see the build system is as a single binary no more than 5-10 MiB that could be copy-pasted into some directory. So far, I'm assuming dependencies like PCRE, libarchive (+ all their friends), OpenSSL/mbedTLS and libgit2, and most importantly a modified version of Lua, all statically linked into the binary. The last I checked this was less than 10 MiB. Python on the other hand is a massive 150+ MiB bloatware that would require very specific file paths on Windows/Linux for it to exist and a butt-load of unnecessary libraries to even kick start. Also, (personal favourite bashing point) syntax.

There are several JS implementations, ranging from the ancient SpiderMonkey to Duktape and MuJS - take your pick. All of these are so minimal that you can spend an afternoon with one and have a portable working instance cleanly embedded into your binary without side-effects.

I've encountered features like that in the RPM build tools, but I'm not convinced it's the best solution in general

I'm looking at it from a C/C++ perspective, where it's quite common in enterprise settings for people to bring in their favourite OSS project, apply patches around it and compile it. In fact, this is so common that tools like Quilt (http://savannah.nongnu.org/projects/quilt) exist specifically to address these issues (though I personally find a temporary git repository does it a lot more cleanly). Projects like Yocto or OpenWRT use their own download-patch-compile sequence as part of their builds via complex Makefiles, though in all these cases the 'patch' step is optional. Note, I've seen several large-scale enterprise projects where people maintain an OSS project's tar.gz in SVN along with a group of patches. See a real-life example (names changed to protect the innocent):

 .
 ├── .svn
 └── mylib
     ├── mylib-1.1.8.tar.bz2
     ├── mylib-1.1.7.tar.bz2
     ├── mylib.mk
     └── patches
         └── mylib-1.1.7
             ├── 001-first.patch
             └── 002-second.patch


This isn't the first time I've seen this in an org, and I'm sure it won't be the last either.

I reckon this per-project custom logic is the quickest way to alienate new contributors. Yet there are several valid, real reasons why they can't do it differently: reasons range from disk space, to patches that are unimportant for mainstream but important for the project, to avoiding an extra repository because we're still old-school with SVN, to it just not being worth giving a damn about.

That said, all of this is still in thin-air as I'm still working on a design. Nothing concrete except for some stray C files flying around in multiple directories just to test theories. Once I fixate on some basic design, I'll post it on GitLab and place a link here.
From: pozorvlak
2019-02-21 09:57 am (UTC)

Re: cycle and trees

Python on the other hand is a massive 150+ MiB bloatware that would require very specific file paths on Windows/Linux for it to exist and a butt-load of unnecessary libraries to even kick start.

Oh yeah. I actually knew this, having tried to squeeze Python onto a PXE boot disk once :-(

I'm looking at it from C/C++ perspective where it's quite common in enterprise cases for people to bring their favourite OSS project, then apply patches around it and compile it.

Aaaaaargh. This whole idea makes me itch, though I guess a general build tool should support common workflows, no matter how ugly they are. You can try to nudge users towards better workflows, but making it impossible to do what they want will just alienate them and drive them into the default position of "mess of unmaintainable imperative shell-scripts". Put another way, the job of a build-tool author is harm reduction :-)

Once I fixate on some basic design, I'll post it on GitLab and place a link here.

Awesome! I look forward to seeing it.
