Beware of the Train

Commuting [Feb. 7th, 2013|12:41 pm]
[Current Location |Dùn Èideann]
[mood |silly]

Quaffing the last of my quickening cup,
I chuck fair Josie, my predatory protégée, behind her ear.
Into my knapsack I place fell Destruction,
my weapon in a thousand fights against the demon Logic
(not to mention his dread ally the Customer
who never knows exactly what she wants, but always wants it yesterday).
He sleeps lightly, but is ready
to leap into action, confounding the foe
with his strings of enchanted rubies and pearls.
To my thigh I strap Cecilweed, the aetherial horn
spun from rare African minerals in far Taiwan
and imbued with subtle magics by the wizards of Mountain View.
Shrugging on my Cuirass of Visibility,
I mount Wellington, my faithful iron steed
his spine wrought in the mighty forge of Diamondback
his innards cast by the cunning smiths of Shimano
and ride off, dodging monsters the height of a house
towards the place the ancients knew as Sràid na Banrighinn
The Street of the Queen.

Just wanna clarify that in lines 5 and 6 I'm not talking about the Growstuff customers, all of whom have been great.

HiPEAC 2013 Berlin HLCGB [Jan. 24th, 2013|09:59 pm]

[Wherein we review an academic conference in the High/Low/Crush/Goal/Bane format used for reviewing juggling conventions on rec.juggling.]

High: My old Codeplay colleague Ally Donaldson's FAT-GPU workshop. He was talking about his GPUVerify system, which takes CUDA or OpenCL programs and either proves them free of data races and synchronisation-barrier conflicts, or finds a potential bug. It's based on an SMT solver; I think there's a lot of scope to apply constraint solvers to problems in compilation and embedded system design, and I'd like to learn more about them.

Also, getting to see the hotel's giant fishtank being cleaned, by scuba divers.

Low: My personal low point was telling a colleague about some of the problems my depression has been causing me, and having him laugh in my face - he'd been drinking, and thought I was exaggerating for comic effect. He immediately apologised when I told him that this wasn't the case, but still, not fun. The academic low point was the "current challenges in supercomputing" tutorial, which turned out to be a thinly-disguised sales pitch for the sponsor's FPGA cards. That tends not to happen at maths conferences...

Crush: am I allowed to have a crush on software? Because the benchmarking and visualisation infrastructure surrounding the Sniper x86 simulator looks so freaking cool. If I can throw away the mess of Makefiles, autoconf and R that serves the same role in our lab I will be very, very happy.

Goal: Go climbing on the Humboldthain Flakturm (fail - it turns out that Central Europe is quite cold in January, and nobody else fancied climbing on concrete at -7C). Get my various Coursera homeworks and bureaucratic form-filling done (fail - damn you, tasty German beer and hyperbolic discounting!). Meet up with maradydd, who was also in town (fail - comms and scheduling issues conspired against us. Next time, hopefully). See some interesting talks, and improve my general knowledge of the field (success!).

Bane: I was sharing a room with my Greek colleague Chris, who had a paper deadline on the Wednesday. This meant he was often up all night, and went to bed as I was getting up, so every trip into the room to get something was complicated by the presence of a sleeping person. He also kept turning the heating up until it was too hot for me to sleep. Dually, of course, he had to share his room with a crazy Brit who kept getting up as he was going to bed and opening the window to let freezing air in...

Unbreaking Mercurial [Dec. 9th, 2012|09:17 pm]

I've been using Mercurial (also known as hg) as the version-control system for a project at work. I'd heard good things about it - a Git-like system with a cleaner UI and better documentation - and was glad of the excuse to try it out. Unfortunately, I was disappointed by what I found. The docs are good, and the UI's a bit cleaner, but it's still got some odd quirks - the difference between hg resolve and hg resolve -m catches me every bloody time, for instance. Unlike Git, you aren't prompted to set missing configuration options interactively. Some of the defaults are crazy, like not sending long output to a pager. And having got used to easy, safe history-rewriting in Git, I was horrified to learn that Mercurial offered no such guarantees of safety: up until version 2.2, the equivalent of a simple commit --amend could cause you to lose work permanently. Easy history-rewriting is a big deal; it means that you never have to choose between committing frequently and only pushing easily-reviewable history.

But I persevered, and with a bit of configuration I was able to make hg more like Git, and more comfortable. Here's my current .hgrc:
[ui]
username = Pozorvlak <pozorvlak@example.com>
merge = internal:merge

[pager]
pager = LESS='FSRX' less

[extensions]
rebase =
record =
histedit = ~/usr/etc/hg/hg_histedit.py
fetch =
shelve = ~/usr/etc/hg/hgshelve.py
pager =
mq =
color =

You'll need at least the username line, because of the aforementioned lack of interactive configuration. The pager = LESS='FSRX' less and pager = lines send long output to less instead of letting it all spew out and overflow your console scrollback buffer. merge = internal:merge tells it to use its internal merge algorithm as a merge tool, and put ">>>>" gubbins in files in the event of conflicts. Otherwise it uses meld for merges on my machine; meld is very pretty but not history-aware, and history-aware merges are at least 50% of the point of using a DVCS in the first place. The rebase extension allows you to graft a sequence of changesets onto another part of the history graph, like git rebase; the record extension allows you to select only some of the changes in your working copy for committing, like git add -p or darcs record; the fetch extension lets you do pull-and-merge in one operation - confusingly, git pull and git fetch are the opposite way round from hg fetch and hg pull. The mq extension turns on patch queues, which I needed for some hairy operation or other once. The non-standard histedit extension works like git rebase --interactive but not, I believe, as safely - dropped commits are deleted from the history graph entirely rather than becoming unreachable from an active head. The non-standard shelve extension works like git stash, though less conveniently - once you've shelved one change you need to give a name to all subsequent ones. Perhaps a Mercurial expert reading this can tell me how to delete unwanted shelves? Or about some better extensions or settings I should be using?
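If you want to check which of those settings ended up in which section, here's a rough sketch that reads an hgrc-style file with Python's standard configparser. The section names ([ui], [pager], [extensions]) are my assumption about the usual Mercurial layout for these settings, and configparser is only a stand-in for Mercurial's own config parser, which is more permissive; load_hgrc and split_extensions are names made up for this example.

```python
import configparser

# Assumption: the section headers below are the usual Mercurial ones for these
# settings. The e-mail address and extension paths are the ones from the post.
HGRC = """\
[ui]
username = Pozorvlak <pozorvlak@example.com>
merge = internal:merge

[pager]
pager = LESS='FSRX' less

[extensions]
rebase =
record =
histedit = ~/usr/etc/hg/hg_histedit.py
fetch =
shelve = ~/usr/etc/hg/hgshelve.py
pager =
mq =
color =
"""


def load_hgrc(text):
    """Parse hgrc-style text; interpolation is off so values are kept literally."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def split_extensions(parser):
    """Separate extensions enabled by name (empty value) from ones loaded from a path."""
    by_name, by_path = [], {}
    for name, value in parser["extensions"].items():
        if value:
            by_path[name] = value
        else:
            by_name.append(name)
    return by_name, by_path


config = load_hgrc(HGRC)
by_name, by_path = split_extensions(config)
print("enabled by name:", ", ".join(by_name))
for name, path in by_path.items():
    print(f"loaded from file: {name} -> {path}")
```

The two extensions loaded from a path are the ones described above as non-standard; everything with an empty value is switched on by name alone.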

Combinatorial explosion of branches [Dec. 6th, 2012|11:41 pm]

I've been running benchmarks again. The basic workflow is

  1. Create some number of directories containing the benchmark suites I want to run.
  2. Tweak the Makefiles so benchmarks are compiled and run with the compilers, simulators, libraries, flags, etc, that I care about.
  3. Optionally tweak the source code to (for instance) change the number of iterations the benchmarks are run for.
  4. Run the benchmarks!
  5. Check the output; discover that something is broken.
  6. Swear, fix the problem.
  7. Repeat until either you have enough data or the conference submission deadline gets too close and you are forced to reduce the scope of your experiments.
  8. Collate the outputs from the successful runs, and analyse them.
  9. Make encouraging noises as the graduate students do the hard work of actually writing the paper.

Suppose I want to benchmark three different simulators with two different compilers for three iteration counts. That's 18 configurations. Now note that the problem found in stage 5 and fixed in stage 6 will probably not be unique to one configuration - if it affects the invocation of one of the compilers then I'll want to propagate that change to nine configurations, for instance. If it affects the benchmarks themselves or the benchmark-invocation harness, it will need to be propagated to all of them. Sounds like this is a job for version control, right? And, of course, I've been using version control to help me with this; immediately after step 1 I check everything into Git, and then use git fetch and git merge to move changes between repositories. But this is still unpleasantly tedious and manual. For my last paper, I was comparing two different simulators with three iteration counts, and I organised this into three checkouts (x1, x10, x100), each with two branches (simulator1, simulator2). If I discovered a problem affecting simulator1, I'd fix it in, say, x1's simulator1 branch, then git pull the change into x10 and x100. When I discovered a problem affecting every configuration, I checked out the root commit of x1, fixed the bug in a new branch, then git merged that branch with the simulator1 and simulator2 branches, then git pulled those merges into x10 and x100.
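To make the arithmetic concrete, here's a small sketch that enumerates the configurations and counts how many a given fix has to reach. The simulator and compiler labels are placeholders I've invented (only the counts come from the scenario above), and affected_by is a hypothetical helper, not part of any tool.

```python
from itertools import product

# Hypothetical labels: only the counts (3 simulators, 2 compilers,
# 3 iteration counts) come from the scenario described above.
SIMULATORS = ["simulator1", "simulator2", "simulator3"]
COMPILERS = ["compilerA", "compilerB"]
ITERATIONS = ["x1", "x10", "x100"]


def all_configurations():
    """Every (simulator, compiler, iteration count) combination."""
    return list(product(SIMULATORS, COMPILERS, ITERATIONS))


def affected_by(configs, simulator=None, compiler=None, iterations=None):
    """Configurations a fix must reach; None means the fix is not tied to that axis."""
    return [
        (s, c, i)
        for (s, c, i) in configs
        if simulator in (None, s)
        and compiler in (None, c)
        and iterations in (None, i)
    ]


configs = all_configurations()
print("total configurations:", len(configs))
print("touched by a compilerA fix:", len(affected_by(configs, compiler="compilerA")))
print("touched by a harness fix:", len(affected_by(configs)))
```

Every extra axis multiplies the total, which is why doing the propagation by hand gets tedious so quickly.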

Keeping track of what I'd done and what I needed to do was frankly too cognitively demanding, and I was constantly bedevilled by the sense that there had to be a Better Way. I asked about this on Twitter, and Ganesh Sittampalam suggested "use Darcs" - and you know, I think he's right, Darcs' "bag of commuting patches" model is a better fit to what I'm trying to do than Git's "DAG of snapshots" model. The obvious way to handle this in Darcs would be to have six base repositories, called "everything", "x1", "x10", "x100", "simulator1" and "simulator2"; and six working repositories, called "simulator1_x1", "simulator1_x10", "simulator1_x100", "simulator2_x1", "simulator2_x10" and "simulator2_x100". Then set up update scripts in each working repository, containing, for instance

darcs pull ../base/everything
darcs pull ../base/simulator1
darcs pull ../base/x10
and every time you fix a bug, run for i in working/*; do $i/update; done.
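Writing six near-identical update scripts by hand is exactly the sort of thing I'd get wrong, so here's a sketch that generates their text from the two axes. It only builds strings and never runs darcs; the relative ../base/ paths are copied from the example above, and working_repo_name and update_script are names invented for illustration.

```python
from itertools import product

SIMULATORS = ["simulator1", "simulator2"]
ITERATIONS = ["x1", "x10", "x100"]


def working_repo_name(simulator, iterations):
    """Name of the working repository for one simulator/iteration-count pair."""
    return f"{simulator}_{iterations}"


def update_script(simulator, iterations):
    """Text of the update script: pull from 'everything' plus the two matching bases."""
    bases = ["everything", simulator, iterations]
    return "".join(f"darcs pull ../base/{base}\n" for base in bases)


# One script per working repository, keyed by repository name.
scripts = {
    working_repo_name(s, i): update_script(s, i)
    for s, i in product(SIMULATORS, ITERATIONS)
}

for name in sorted(scripts):
    print(f"# working/{name}/update")
    print(scripts[name])
```

Adding a third simulator or a fourth iteration count then means changing one list rather than editing every script.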

But! It is extremely useful to be able to commit the output logs associated with a particular state of the build scripts, so you can say "wait, what went wrong when I used the -static flag? Oh yeah, that". I don't think Darcs handles that very well - or at least, it's not easy to retrieve any particular state of a Darcs repo. Git is great for that, but whenever I think about duplicating the setup described above in Git my mind recoils in horror before I can think through the details. Perhaps it shouldn't - would this work? Is there a Better Way that I'm not seeing?


Falsehoods programmers believe about build systems [Dec. 6th, 2012|09:45 pm]

Inspired by Falsehoods Programmers Believe About Names, Falsehoods Programmers Believe About Time, and far, far too much time spent fighting autotools. Thanks to Aaron Crane, totherme and zeecat for their comments on earlier versions.

It is accepted by all decent people that Make sucks and needs to die, and that autotools needs to be shot, decapitated, staked through the heart and finally buried at a crossroads at midnight in a coffin full of millet. Hence, there are approximately a million and seven tools that aim to replace Make and/or autotools. Unfortunately, all of the Make-replacements I am aware of copy one or more of Make's mistakes, and many of them make new and exciting mistakes of their own.

I want to see an end to Make in my lifetime. As a service to the Make-replacement community, therefore, I present the following list of tempting but incorrect assumptions various build tools make about building software.

All of the following are wrong:
  • Build graphs are trees.
  • Build graphs are acyclic.
  • Every build step updates at most one file.
  • Every build step updates at least one file.
  • Compilers will always modify the timestamps on every file they are expected to output.
  • It's possible to tell the compiler which file to write its output to.
  • It's possible to tell the compiler which directory to write its output to.
  • It's possible to predict in advance which files the compiler will update.
  • It's possible to narrow down the set of possibly-updated files to a small hand-enumerated set.
  • It's possible to determine the dependencies of a target without building it.
  • Targets do not depend on the rules used to build them.
  • Targets depend on every rule in the whole build system.
  • Detecting changes via file hashes is always the right thing.
  • Detecting changes via file hashes is never the right thing.
  • Nobody will ever want to rebuild a subset of the available dirty targets.
  • People will only want to build software on Linux.
  • People will only want to build software on a Unix derivative.
  • Nobody will want to build software on Windows.
  • People will only want to build software on Windows.
    (Thanks to David MacIver for spotting this omission.)
  • Nobody will want to build on a system without strace or some equivalent.
  • stat is slow on modern filesystems.
  • Non-experts can reliably write portable shell script.
  • Your build tool is a great opportunity to invent a whole new language.
  • Said language does not need to be a full-featured programming language.
  • In particular, said language does not need a module system more sophisticated than #include.
  • Said language should be based on textual expansion.
  • Adding an Nth layer of textual expansion will fix the problems of the preceding N-1 layers.
  • Single-character magic variables are a good idea in a language that most programmers will rarely use.
  • System libraries and globally-installed tools never change.
  • Version numbers of system libraries and globally-installed tools only ever increase.
  • It's totally OK to spend over four hours calculating how much of a 25-minute build you should do.
  • All the code you will ever need to compile is written in precisely one language.
  • Everything lives in a single repository.
  • Files only ever get updated with timestamps by a single machine.
  • Version control systems will always update the timestamp on a file.
  • Version control systems will never update the timestamp on a file.
  • Version control systems will never change the time to one earlier than the previous timestamp.
  • Programmers don't want a system for writing build scripts; they want a system for writing systems that write build scripts.

[Exercise for the reader: which build tools make which assumptions, and which compilers violate them?]
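As a small illustration of the two hashing entries and the timestamp ones, here's a sketch in which a file's timestamp changes but its contents don't, so a timestamp check and a hash check disagree about whether it's dirty. This is a toy, not how any particular build tool works: the file name is made up, and the one-hour jump stands in for whatever tool rewrote the timestamp.

```python
import hashlib
import os
import tempfile


def content_hash(path):
    """SHA-256 of the file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def snapshot(path):
    """Record both signals a build tool might use to detect change."""
    return {"mtime_ns": os.stat(path).st_mtime_ns, "sha256": content_hash(path)}


def changed_by_mtime(before, after):
    return before["mtime_ns"] != after["mtime_ns"]


def changed_by_hash(before, after):
    return before["sha256"] != after["sha256"]


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "main.c")  # hypothetical source file
    with open(source, "w") as handle:
        handle.write("int main(void) { return 0; }\n")
    before = snapshot(source)

    # Simulate a tool that rewrites the timestamp without touching the
    # contents: push the access and modification times forward by one hour.
    later = before["mtime_ns"] + 3600 * 10**9
    os.utime(source, ns=(later, later))
    after = snapshot(source)

    dirty_by_mtime = changed_by_mtime(before, after)
    dirty_by_hash = changed_by_hash(before, after)

print("timestamp check says dirty:", dirty_by_mtime)
print("hash check says dirty:", dirty_by_hash)
```

Here the hash check avoids a pointless rebuild, but it had to read the whole file to do so; on a large tree that cost is one reason hashing isn't always the right thing either.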


Abstracts for upcoming talks [Sep. 13th, 2012|04:01 pm]

I've recently submitted a couple of talk proposals to upcoming conferences. Here are the abstracts.

Machine learning in (without loss of generality) Perl

London Perl Workshop, Saturday 24th November 2012. 25 minutes.

If you read a book or take a course on machine learning, you'll probably spend a lot of time learning about how to implement standard algorithms like k-nearest neighbours or Naive Bayes. That's all very interesting, but we're Perl programmers - all that stuff's on CPAN already. This talk will focus on how to use those algorithms to attack problems, how to select the best ML algorithm for your task, and how to measure and improve the performance of your machine learning system. Code samples will be in Perl, but most of what I'll say will be applicable to machine learning in any language.

Classifying Surfaces

MathsJam: The Annual Conference, 17th-18th November 2012. 5 minutes.

You may already know Euler's remarkable result that if a polyhedron has V vertices, E edges and F faces, then V - E + F = 2. This is a special case of the beautiful classification theorem for closed surfaces. I will state this classification theorem, and give a quick sketch of a proof.
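For anyone who wants to play with the numbers first: the quantity V - E + F is the Euler characteristic, and for a closed orientable surface with g holes it comes out as 2 - 2g, so the sphere-like polyhedra all give 2 and a torus gives 0. Here's a quick sketch that checks this on the five Platonic solids and on a torus cut into a grid of squares. The function names are mine, and the torus counts assume an m-by-n grid with opposite edges glued together.

```python
def euler_characteristic(vertices, edges, faces):
    """V - E + F for a surface cut into vertices, edges and faces."""
    return vertices - edges + faces


def torus_grid(m, n):
    """(V, E, F) for an m-by-n grid of squares with opposite sides glued into a torus."""
    return m * n, 2 * m * n, m * n


def orientable_genus(chi):
    """Genus g of a closed orientable surface, from chi = 2 - 2g."""
    if chi > 2 or chi % 2 != 0:
        raise ValueError(f"{chi} is not the Euler characteristic of a closed orientable surface")
    return (2 - chi) // 2


# (V, E, F) for the five Platonic solids.
POLYHEDRA = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}

for name, counts in POLYHEDRA.items():
    print(name, euler_characteristic(*counts))

torus_chi = euler_characteristic(*torus_grid(4, 5))
print("torus", torus_chi, "genus", orientable_genus(torus_chi))
```

The grid size doesn't matter for the torus: the counts scale together, so the answer is 0 however finely you chop it up.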

Heterogeneous parallel programming - an explanation for laypeople [Sep. 9th, 2012|01:12 pm]

Remember how a few years ago PCs were advertised with the number of MHz or GHz their processors ran at prominently featured? And how the numbers were constantly going up? You may have noticed that the numbers don't go up much any more, but now computers are advertised as "dual-core" or "quad-core". The reason that changed is power consumption. Double the clock speed of a chip, and you more than double its power consumption: with the Pentium 4 chip, Intel hit a clock speed ceiling as their processors started to generate more heat than could be removed.

But Moore's Law continues in operation: the number of transistors that can be placed on a given area of silicon has continued to double every eighteen months, as it has done for decades now. So how can chip makers make use of the extra capacity? The answer is multicore: placing several "cores" (whole, independent processing units) onto the same piece of silicon. Your chip can still do twice as much work as the one from eighteen months ago, but only if you split that work up into independent tasks.
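For the programmers in the audience, here's a minimal sketch of what "splitting the work up into independent tasks" looks like in the easiest possible case: counting primes in a range, where each chunk of the range can be handled without reference to any other. The function names are invented for the example. One caveat: I've used threads to keep the sketch short, and in standard CPython threads won't actually run this kind of number-crunching on several cores at once; swapping in ProcessPoolExecutor is the usual change for that. The point here is the shape of the decomposition, not the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division; slow but obviously correct."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def count_primes(lo, hi):
    """Count primes in [lo, hi); each call is independent of every other call."""
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(lo, hi, parts):
    """Cut [lo, hi) into at most `parts` contiguous chunks that cover it exactly."""
    if hi <= lo:
        return []
    step = (hi - lo + parts - 1) // parts
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def count_primes_in_chunks(lo, hi, workers=4):
    """Hand one chunk to each worker, then add up the partial counts."""
    chunks = split_range(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(lambda chunk: count_primes(*chunk), chunks)
        return sum(partial_counts)


serial = count_primes(0, 20000)
chunked = count_primes_in_chunks(0, 20000)
print("same answer either way:", serial == chunked)
```

This case is easy because the chunks never need to talk to each other; most real programs aren't so obliging.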

This presents the software industry with a problem. We've been conditioned over the last fifty years to think that the same program will run faster if you put it on newer hardware. That's not true any more. Computer programs are basically recipes for use by particularly literal-minded and stupid cooks; imagine explaining how to cook a complex meal over the phone to someone who has to be told everything. If you're lucky, they'll have the wit to say "Er, the pan's on fire: that's bad, right?". Now let's make the task harder: you're on the phone to a room full of such clueless cooks, and your job is to get them to cooperate in the production of a complex dinner due to start in under an hour, without getting in each other's way. Sounds like a farce in the making? That's basically why multicore programming is hard.

But wait, it gets worse! The most interesting settings for computation these days are mobile devices and data centres, and these are both power-sensitive environments; mobile devices because of limited battery capacity, and data centres because more power consumption costs serious money on its own and increases your need for cooling systems which also cost serious money. If you think your electricity bill's bad, you should see Google's. Hence, one of the major themes in computer science research these days is "you know all that stuff you spent forty years speeding up? Could you please do that again, only now optimise for energy usage instead?". On the hardware side, one of the prominent ideas is heterogeneous multicore: make lots of different cores, each specialised for certain tasks (a common example is the Graphics Processing Units optimised for the highly-parallel calculations involved in 3D rendering), stick them all on the same die, farm the work out to whichever core is best suited to it, and power down the ones you're not using. To a hardware person, this sounds like a brilliant idea. To a software person, this sounds like a nightmare: now imagine that our Hell's Kitchen is full of different people with different skills, possibly speaking different languages, and you have to assign each task to the person best suited to carrying it out.

The upshot is that heterogeneous multicore programming, while currently a niche field occupied mainly by games programmers and scientists running large-scale simulations, is likely to get a lot more prominent over the coming decades. And hence another of the big themes in computer science research is "how can we make multicore programming, and particularly heterogeneous multicore programming, easier?" There are two aspects to this problem: what's the best way of writing new code, and what's the best way of porting old code (which may embody complex and poorly-documented requirements) to take advantage of multicore systems? Some of the approaches being considered are pretty Year Zero - the functional programming movement, for instance, wants us to write new code in a tightly-constrained way that is more amenable to automated mathematical analysis. Others are more conservative: for instance, my colleague Dan Powell is working on a system that observes how existing programs execute at runtime, identifies sections of code that don't interfere with each other, and speculatively executes them in parallel, rolling back to a known-good point if it turns out that they do interfere.

This brings us to the forthcoming Coursera online course in Heterogeneous Parallel Programming, which teaches you how to use the existing industry-standard tools for programming heterogeneous multicore systems. As I mentioned earlier, these are currently niche tools, requiring a lot of low-level knowledge about how the system works. But if I want to contribute to projects relating to this problem (and my research group has a lot of such projects) it's knowledge that I'll need. Plus, it sounds kinda fun.

Anyone else interested?

New Year's Resolutions update [Jun. 13th, 2012|06:01 pm]

1. Start tracking my weight and calorie intake again, and get my weight back down to a level where I'm comfortable. I've been very slack on the actual calorie-tracking, but I have lost nearly a stone, and at the moment I'm bobbing along between 11st and about 11st 4lb. It would be nice to be below 11st, but I find I'm actually pretty comfortable at this weight as long as I'm doing enough exercise. So, I count that as a success.

2. Start making (and testing!) regular backups of my data. I'm now backing up my tweets with TweetBackup.com, but other than that I've made no progress on this front. Possibly my real failure was in not making all my NYRs SMART, so they'd all be pass/fail; as it is, I'm going to declare this one not yet successful.

3. Get my Gmail account down to Inbox Zero and keep it there. This one's a resounding success. Took me about a month and a half, IIRC. Next up: Browser Tab Zero.

4. Do some more Stanford online courses. There was a long period at the beginning of the year where they weren't running and we wondered if the Stanford administrators had stepped in and quietly deep-sixed the project, but then they suddenly started up again in March or so. Since then I've done Design and Analysis of Algorithms, which was brilliant; Software Engineering for Software as a Service, which I dropped out of 2/3 of the way through but somehow had amassed enough points to pass anyway; and I'm currently doing Compilers (hard but brilliant) and Human-Computer Interaction, which is way outside my comfort zone and on which I'm struggling. Fundamentals of Pharmacology starts up in a couple of weeks, and Cryptography starts sooner than that, but I don't think I'll be able to do Cryptography before Compilers finishes. Maybe next time they offer it. Anyway, I think this counts as a success.

5. Enter and complete the Meadows Half-Marathon. This was a definite success: I completed the 19.7km course in 1 hour and 37 minutes, and raised over £500 for the Against Malaria Foundation.

6. Enter (and, ideally, complete...) the Lowe Alpine Mountain Marathon. This was last weekend; my partner and I entered the C category. Our course covered 41km, gained 2650m of height, and mostly consisted of bog, large tufts of grass, steep traverses, or all three at once; we completed it in 12 hours and 33 minutes over two days and came 34th out of a hundred or so competitors. I was hoping for a faster time, but I think that's not too bad for a first attempt. Being rained on for the last two hours was no fun at all, but the worst bit was definitely the goddamn midges, which were worse than either of us had ever seen before. The itching's now just about subsided, and we're thinking of entering another one at a less midgey time of year: possibly the Original Mountain Marathon in October or the Highlander Mountain Marathon next April. Apparently the latter has a ceilidh at the mid-camp, presumably in case anyone's feeling too energetic. Anyway, this one's a success.

5/6 - I'm quite pleased with that. And I'm going to add another one (a mid-year resolution, if you will): I notice that my Munro-count currently stands at 136/284 (thanks to an excellent training weekend hiking and rock climbing on Beinn a' Bhuird); I hereby vow to have climbed half the Munros in Scotland by the end of the year. Six more to go; should be doable.

Party Party party [May. 6th, 2012|11:34 pm]

Yesterday, hacker-turned-Tantric-priest-turned-global-resilience-guru Vinay Gupta went on one of his better rants on Twitter. I've Storified it for your pleasure here. The gist was roughly
  1. We don't have enough resources to give everyone a Western lifestyle.
  2. Said lifestyle isn't actually very good at giving us the things which really make us happy.
  3. We do, on the other hand, have the resources to throw a truly massive party and invite everyone in the world. Drugs - especially psychedelics - require very little to produce, and sex is basically free.
My favourite tweet of the stream was "Hello, I'm the Government Minister for Dancing, Getting High and Fucking. We're going to be extending opening hours and improving quality."

It strikes me that this is a fun thought experiment. Imagine: the Party Party has just swept to power on a platform of gettin' down and boogying. You have been put in charge of the newly-created Department of Dancing, Getting High and Fucking (hereinafter DDGHF)¹. Your remit is to ensure that people who want to dance, get high and/or have sex can do so as safely as possible and with minimal impact on others. What do you do, hotshot? What policies do you implement? What targets do you set? How do you measure your department's effectiveness? How do you recruit and train new DDGHF staff, and what kind of organisational culture do you try to create?

Use more than one sheet of paper if you need.

You have a reasonable amount of freedom here: in particular, I'm not going to require that you immediately legalise all drugs. You might even want to ban some that are currently legal, though if so, please explain why your version of Prohibition won't be a disaster like all the others. However, I think we can take it as read that the Party Party's manifesto commits to at least scaling back the War on Drugs.

Bonus points: how does the new broom affect other departments? How do we manage diplomatic relations with states that are less hedonically inclined? What are the Party Party's policies on poverty, the economy, defence and climate change?

I guess I should give my answer

Edit: LJ seems to silently fail to post comments that are above a certain length, which is very irritating of it. Sorry about that! If your answer is too long, perhaps you could post it on your own blog and post a link to it here? Or split it up into multiple comments, of course.

¹ Only one Cabinet post for all three? I hear you ask. That's joined-up government for you. Feel free to create as many junior ministers as you think are merited.

Fixing the "blank page" problem in Wordpress [Apr. 7th, 2012|12:20 am]

I've been doing some work with Wordpress off and on for the last couple of weeks - migrating a site that uses a custom CMS onto a Wordpress installation - and a couple of times I've run into the following vexing problem when setting up a local Wordpress installation for testing. I couldn't find anything about it on the web, and it took me several hours to debug, so here's a writeup in case someone else has the same problem.

Steps to reproduce: install Wordpress 3.0.5 (as provided by Ubuntu). Using the command-line mysql client, load in a database dump from a Wordpress 3.3.1 site. Visit http://localhost/wordpress (or wherever you've got it installed).

Symptoms: instead of your deathless prose, you see an entirely blank browser window. HTTP headers are sent correctly, but no page content is produced. However, http://localhost/wordpress/wp-admin is displayed correctly, and all your content is in the database.

What's actually going on: Wordpress has decided that the TwentyTen theme is broken, so it's reverting to the default theme. It is hence looking for a theme called "Wordpress Default". But the default theme is actually just called "Default". So it doesn't find a theme, and, since display is handled by the theme files, nothing gets displayed.

How to fix it: go into the admin interface, and select Appearance->Themes. Change the theme to "Default". Your blog is now visible again!

If you wish, you can now change the theme back to TwentyTen: it turns out that it's not actually broken at all.

Thanks to Konstantin Kovshenin for suggesting I turn WP_DEBUG to true in wp-config.php. This allowed me to eventually track down the problem (though, annoyingly, the "theme not found" error was only displayed on the admin page, so I didn't see it for a while).

Next question: this is clearly a bug, but it's a bug in a superseded version. Where should I report it?

Edit: on further thought, I think this may be more to do with the site whose dump I was loading in using a theme that I don't have installed. In which case, the bug may well affect the latest version of Wordpress. But I haven't yet proved this to my satisfaction.
