How Silicon Valley will solve the trolley problem

The trolley problem asks how to decide between the lives of people in two groups. At the moment, it comes up in our industry in discussions around self-driving cars: Suppose a car gets into a situation where it must risk injuring either its passengers or pedestrians; which ones should it prioritize saving?

Always choosing one group leads to suboptimal outcomes. If we always save the pedestrians, they may attack passengers by willfully stepping into traffic and forcing the car to swerve. If we always save the passengers, we may run over a pedestrian whose life happens to be of greater worth than that of the passengers.

The crux of the trolley problem is how we should make the choice of what lives to save. As with all hard problems without clear metrics for success, it’s best to solve this by having the participants decide this for themselves.

Every time we view a website, our friendly ad networks must decide which ads we should see. This is done by putting our user profile up for auction to prospective advertisers. Over a handful of milliseconds, participants may inspect our profile and place a bid for our attention according to what they see.

The same technology could be trivially repurposed for deciding the trolley problem in the context of self-driving cars.

Assume the identity of every participant in the trolley scenario is known. Practically, we know the identity of the passengers; that of the pedestrians could be known if their phones broadcast a special short-range identification signal. An incentive for broadcasting such a signal could be that we would have to assume that a person without one was a person of no means.

Given this information, a car about to be involved in a collision could take a few milliseconds to send the identities of the people involved to an auction service. Participants who had the foresight to purchase accident insurance would have agents bidding on their behalf. The winner of the auction would be safe from the forthcoming accident, and their insurance would pay some sum to the car manufacturer post hoc as a reward for conforming to the rules.

The greater a person’s individual wealth, the better insurance coverage and latencies they could purchase, and the more accidents they could expect to survive. This aligns nicely with the values of societies like the United States, where the worth of a person’s life is proportional to their wealth.

As this system lets self-driving car manufacturers off the hook for any decisions taken, and would require coordinated ethical action on the part of software engineers to not be implemented, we expect to see it in action in the world in short order.

Infosec: A board game

I’m pleased to announce the release of my new card-drawing game Infosec.

The rules of the game are simple. It is for two or more players. The player with the fewest friends is the Infosec Expert; the other players are the Coworkers.

To start, the Infosec Expert deals three cards face down from a standard deck of cards. The Coworker to the Infosec Expert’s right should then draw one card.

"Which card?" the Coworker may ask.

"This is a simple phishing exercise," the Infosec Expert should reply. "Just pick a card."

"But they all look the same," the Coworker may object.

"Draw one. And get it right."

This exchange should go on in increasingly hostile tones until the Coworker agrees to draw a card. It will have been the wrong card. The Infosec Expert should inform the Coworker:

"You got phished. You moron. You fucking idiot. You’re such a goddamn waste of space and time. How could you have gotten this so wrong? Were you even trying? Answer me. What the fuck was that?"

Feel free to ad-lib along these lines, or draw as many cards as you want from the accompanying Admonishment Deck (expansion packs available). Include ad-hominem attacks and use as many personal details as you know. Interrupt the Coworker if they try to reply. Don’t hold back.

Once the Coworker is silent, the Infosec Expert should collect the cards, deal new ones, and proceed to the next Coworker in line.

The game ends with the Infosec Expert’s victory when all the Coworkers have left.

Backing up data like the adult I supposedly am

Like so many things I’m supposed to do but don’t — getting exercise, eating right, sleeping well, standing up for women and minorities in public spaces — backing up my data has always been something I’ve half-assed at best.

I’ve lugged around an external hard drive with a few hundred gigabytes of data for the last 10 years, and made backups to it once every three or four years or so. Every time I’ve tried restoring anything from those backups I’ve regretted it, because of course I just bought the drive, plugged it in, and copied stuff to it, so it’s a FAT32 drive while I’ve mostly had ext4 filesystems, which means all my file permissions get lost in the process.

I’ve written shameful little shell scripts to set file permissions to 0644 and directory permissions to 0755, recursively, many many times.

Part of my problem was that I both know just enough rsync to be dangerous and have a credit card so I can provision cloud VMs, so forever just around the corner was my perfect backup solution that I’d write myself and maintain and actually do instead of dealing with whatever I had going on in my life. I’ve come to accept that this will never happen, or perhaps more definitively, that I’d rather cut myself than write and maintain another piece of ad-hoc software for myself.

Luckily I recently found two things that have solved this whole problem for me: borg and rsync.net.

Borg is backup software. It compresses and deduplicates data at the block level, and strongly encourages (but does not force) you to encrypt data before backing it up. It is everything I’d want from my half-assed rsync and shell script abomination.

I read its documentation a couple of times and was impressed. I then set about comparing different VM hosts to see which one would give me the cheapest block storage option, when some random Google search led me to rsync.net. They are a company that stores backups, pretty cheaply, and even more cheaply if you use borg to take them. I guess they just really love borg and want us to love it too.

I signed up for their cheapest plan, which starts at 100GB stored for $18 per year. They have no network ingress or egress costs, and the storage amount can be adjusted at any time. Once my account had been activated, I did a little password reset dance and uploaded a public SSH key.

I wanted to back up my $HOME directory, so after installing borg I ran:

export BORG_REMOTE_PATH="borg1"
borg init --encryption repokey-blake2 UID@ch-s011.rsync.net:home

This created a remote borg repository called "home" on rsync.net’s servers. The environment variable makes the remote end use a more recent version of borg (version 1.1.11 at the time of writing), as the default version is rather old (version 0.29.0).

When choosing what encryption method to use, one can choose between a "repokey" or a "keyfile". They both create a private key locked with a passphrase; the difference is that with "repokey" the key is stored in the borg repo, while with "keyfile" it is stored outside of it. This boils down to whether we think a passphrase is enough security for our data, or whether we think having a secret keyfile is necessary. I figured my password manager could create a strong enough passphrase for my needs, and I didn’t want to think about losing the keyfile, so I chose "repokey-blake2".
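
Even with "repokey", borg can export a copy of the key for safekeeping; with BORG_REMOTE_PATH still set as above, something like this should do it (the output filename is whatever you like):

borg key export UID@ch-s011.rsync.net:home borg-home-key.txt

The exported key is still protected by the passphrase.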

To create my first backup, I ran

borg create --exclude "$HOME/.cache" UID@ch-s011.rsync.net:home::backup-1 "$HOME"

which created the archive "backup-1" in my "home" borg repository. I didn’t change the compression algorithm from the default one.

By default borg compresses data with lz4. It can use other compression methods (xz, zlib, zstd). I compared their compression ratios on some binary files I had and found no difference between them. I think this is because the large binary files I have are mostly audio and video files in lossy formats, which don’t seem to benefit very much from further compression. I have a lot of text files as well, but text takes up so little relative space on today’s hardware that it makes no sense to spend CPU cycles on compressing it better than lz4 does.
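
If your data did benefit from heavier compression, the algorithm is just a flag on borg create. For example, something like this should produce a zstd-compressed archive (the archive name here is made up for the example):

borg create --compression zstd,10 --exclude "$HOME/.cache" UID@ch-s011.rsync.net:home::backup-zstd "$HOME"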

This backup command hummed along for a good while, and through a couple of reboot cycles. Doing a second backup right after it finished (or the day after) took a lot less time because of the deduplication:

borg create --exclude "$HOME/.cache" UID@ch-s011.rsync.net:home::backup-2 "$HOME"

Restoring from backup is also easy:

borg extract UID@ch-s011.rsync.net:home::backup-2
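
Borg can also list the archives in a repository, and prune old ones so it doesn’t grow forever. Roughly like this, with retention numbers that are entirely arbitrary:

borg list UID@ch-s011.rsync.net:home
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 UID@ch-s011.rsync.net:home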

I set this up to run as a daily timed systemd service at noon (very easy on NixOS, which every Linux user should be using unless they hate themselves), and will never, ever think about this again. For a handful of bucks a year, that is a good deal.
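
For the curious, the NixOS side of that looks roughly like the following. This is a sketch from memory with a made-up username and passphrase file, running essentially the borg invocation from above with a dated archive name; nixpkgs also ships a services.borgbackup module that can do the same with less typing.

systemd.services.borg-home-backup = {
  description = "Daily borg backup of /home/user to rsync.net";
  startAt = "12:00"; # NixOS generates the matching systemd timer
  path = [ pkgs.borgbackup pkgs.openssh ];
  environment.BORG_REMOTE_PATH = "borg1";
  serviceConfig.User = "user";
  script = ''
    export BORG_PASSCOMMAND="cat /home/user/.borg-passphrase"
    borg create --exclude /home/user/.cache \
      UID@ch-s011.rsync.net:home::backup-$(date +%F) /home/user
  '';
};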

Review of home manager

A common theme among the people who fall in love with Nix (or its cousin NixOS) is that they want it to manage everything for them. Finding out the limits of that capability and where it breaks down is part of everyone’s journey.

An obvious enough limit to Nix’s reach is secret management. It can easily handle public keys, whether for SSH or anything else, but private keys cannot be managed by its store, as the store is readable by all.

The promise of Nix is that we will never break our system by updating it. We can always roll back to a previous working version, once we have a working version at all. This is in contrast to, say, every other Linux distribution. I have borked Ubuntu, Arch, Fedora and others during system updates. Often the safest way to update the system is to back up /home and reinstall the OS.

From somewhere comes the idea that a user on a Linux system should be able to install their own programs and manage their own system services. (See Chris Wellons for how far you can run with this idea if your sysadmin gives you a C compiler.) This seems odd from a historical perspective. In a true multi-user system, the system administrators normally do not want users to be able to install arbitrary software or run services. On a modern "multi"-user system, where there is only a single user, avoiding system packages and services seems like some kind of theater.

Yet we do it anyway. An argument I sometimes make to myself is that this is a cleaner separation between what I need the system to do versus what I do on it. I may want to be able to run traceroute as root, but I don’t care about root being able to run the Go compiler.

NixOS has facilities to enforce this separation. It happily creates users and their home directories, and can fill in some of the bits that go there, like public SSH keys. It can install packages for specific users, and allow them to define systemd services. It will not manage the configuration of specific user packages (like dotfiles) without some coercing. One can presumably create custom derivations of, say, ZSH with all the configuration one wants, but who has the time?

Home manager wants to fill this gap and bring the power of Nix to user environment and configuration management. It lets individual users say what packages they want installed; what services they want run; and what configuration files should go where. On the surface it seems like something I should love, but after using it for a month I wrote it out of my system and now use plain NixOS.

I used Home manager for three things:

  1. Installing packages for my user.

  2. Scheduling services for my user.

  3. Installing configuration files for my software.

The first one was never a big attraction. Home manager lets us define a list of packages to install in home.packages. If we control the system configuration, we can achieve the same by defining those packages in users.users.$username.packages.
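
With a hypothetical user called alice, that looks something like this:

users.users.alice = {
  isNormalUser = true;
  packages = with pkgs; [ ripgrep jq borgbackup ];
};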

If a user controls the system configuration (directly or through a sysadmin), they can also define their own user services via systemd.user. Home manager’s selling point is that it comes with a large list of already defined services that we can enable with a boolean flag, instead of having to write our own service configuration. This is admittedly nice. In the end, though, I found that learning the idiosyncrasies of each home manager service definition was a worse use of my time than learning how to define NixOS systemd services once and for all. The latter is, after all, where the former end up.
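
For comparison, a plain NixOS user service is not much code either. A sketch, with a made-up unit that fetches mail hourly via mbsync:

systemd.user.services.fetch-mail = {
  description = "Fetch mail";
  serviceConfig = {
    Type = "oneshot";
    ExecStart = "${pkgs.isync}/bin/mbsync -a";
  };
};
systemd.user.timers.fetch-mail = {
  wantedBy = [ "timers.target" ];
  timerConfig.OnCalendar = "hourly";
};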

As a long-time sufferer of dotfile management, I had high hopes for the third point. And indeed, home manager will manage dotfiles just fine. It can do this in two modes: it can generate a config file from various options we fill out, if someone has written a home manager module for the program we’re trying to configure, or it can plomp a file on the system verbatim from a source. I used the latter, as I didn’t think learning a configuration language just to partially configure program dotfiles was a good idea.
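
For reference, the two modes look roughly like this; a sketch, with the option names recalled from memory, so check the home manager manual before copying:

# Mode one: a home manager module generates the file.
programs.git = {
  enable = true;
  userName = "Some Person";
  userEmail = "someone@example.com";
};

# Mode two: plomp a file down verbatim.
home.file.".tmux.conf".source = ./dotfiles/tmux.conf;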

This works well, until we want to change anything in a dotfile. This experiment with home manager coincided with a regular low point in my life in which I try to use emacs. This comes with quite a lot of .emacs changes, as I use Lisp for the only thing it’s ever been good for: configuring a text editor in the most complicated way imaginable. Now, the dotfiles that home manager (or Nix) puts on our systems are read-only, so every change would involve changing the source file and running home-manager switch. This seems like unnecessarily many steps, especially after I saw this brilliant Hacker News comment, which after a week of use is a much better solution to this problem.

All in all home manager is nice software. I can see it being useful for people who either don’t control the system they run on but want to use Nix in user mode to run their corner of it, or for those Nix users whom gamers (a well-adjusted group of humans if there ever was one) would call "filthy casuals", that is, people who just want things to work and don’t care very much about learning how to write enough Nix to make that happen.

I’m not included in those groups, as I run this system and explicitly want to learn to use Nix in anger so I can try and fail to convince people to run it in production at work. Home manager is fine software and if it makes you happy, then please use it.

Use ad hoc structs for command-line flags in Go

The path of least resistance to command-line flag parsing in Go is to use the flag package from the standard library. A lot of the time the result looks like this:

func main() {
  help := flag.Bool("help", false, "HALP")
  frobinate := flag.Int("frobinate", 0, "Amount to frobinate by")
  blargalarg := flag.String("blargalarg", "", "Social media comment")
  // [713 variables later]
  flag.Parse()

  // Much later
  if *blargalarg != "" {
    // Do things. We may or may not remember what this variable is.
  }
}

That is, we have a bunch of variables lying around that we don’t really care about and that take up perfectly good names. If we see one of them later in the program, we don’t have any context on where it comes from, so we have to start jumping around in the source.

In my projects I’ve used a little accounting trick to hold these flags. I find it helps me deal with them. We just define an anonymous struct to hold the flags:

func main() {
  flags := struct {
    help       *bool
    frobinate  *int
    blargalarg *string
    // [713 field definitions]
  }{
    help:       flag.Bool("help", false, "HALP"),
    frobinate:  flag.Int("frobinate", 0, "Amount to frobinate by"),
    blargalarg: flag.String("blargalarg", "", "Social media comment"),
    // [713 field instantiations]
  }
  flag.Parse()

  // Much later
  if *flags.blargalarg != "" {
    // AAAAAH YES IT'S A FLAG
  }
}

If I really need to, I can pull the anonymous struct out into its own global variable or type definition or whatever, and pass it around as an argument to functions that deal with its contents. That is not as handy with a litter of flag variables. But really I just find that defining all these flags clearly in one place makes the program easier to read later, once I’ve forgotten what it does.
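
For completeness, pulling the struct out looks roughly like this (the type and function names here are made up):

type cliFlags struct {
  help       *bool
  frobinate  *int
  blargalarg *string
}

func parseFlags() cliFlags {
  f := cliFlags{
    help:       flag.Bool("help", false, "HALP"),
    frobinate:  flag.Int("frobinate", 0, "Amount to frobinate by"),
    blargalarg: flag.String("blargalarg", "", "Social media comment"),
  }
  flag.Parse()
  return f
}

func doTheThing(flags cliFlags) {
  // Functions that care about flags take the whole struct as an argument.
  if *flags.blargalarg != "" {
    // Still obviously a flag.
  }
}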

Kleroteria

I got picked in Kleroteria a couple of months ago. This is my contribution.


Three years ago, my wife and I moved to Amsterdam. We wanted to have kids, and thought the Netherlands was a better place to do that than Mexico, where my wife is from and we lived.

We’ve both moved around a lot, and have picked up the languages of the places we’ve lived. That’s how we speak French and Spanish, along with English. Learning those was always a necessity; no one in France is going to speak English voluntarily, and my in-laws don’t speak the best English so I had to learn Spanish.

So it was a change to come to the Netherlands. Everyone here speaks perfect English. They also have no patience for your attempts at Dutch pronunciation or verb conjugation. The second they hear you’re not from here, they switch. On paper this is great, but it means that after three years here I still don’t speak Dutch. I can never get anyone to have a whole conversation with me in it.

A couple of weeks after we came here, we got pregnant. We had a boy, who has grown up into a little man who loves cars and trains, makes funny faces at the dinner table, and keeps trying to show our dog his books. ("Pancho! See!", but Pancho doesn’t care.)

We speak to him each in our own language, and by now he understands what we say. He’s been going to daycare here since he was five months old, where they speak Dutch, so he understands that as well. Kids who grow up with more than one language start speaking later, which can be frustrating for everyone. He knows what he wants to say, but can’t figure out what words to use, and we have to keep guessing at sounds while other parents don’t.

He’s finally speaking now. He tells us everything he wants and knows. In Dutch. Every time. Every word comes out in Dutch. And we still don’t really speak it. It’s not really any less frustrating for anyone. There’s a lot of guessing at pronunciation and Google translate.

But what do you know. I did finally get someone to speak Dutch to me.

Bug-fixing checklist

It is OK not to do all of these things for a given bug. Some bugs are trivial, or nontrivial to reproduce or write tests for. But you should explicitly decide not to do some of these things for any given bug, and be able to explain why.

  1. Is there a (possibly old, closed) issue for the bug?

  2. Can you reproduce the bug manually?

  3. Can you write a regression test for the bug?

  4. Did you check that your change actually fixes the problem?

  5. Does the fix’s commit message explain the bug, the fix, and point to the issue?

It’s debatable whether checking for old or closed issues is the responsibility of the developer fixing the bug or a project manager who triages the backlog. Sometimes the second person doesn’t exist, but the job should still be done.

Remember that a bug report is a report of a symptom. A bug fix is directed at a cause. A software symptom may have more than one cause, making bug fixes that say "This fixes issue X" very optimistic about what they claim to achieve.

Exercise moderation and common sense in all things. Both 0% and 100% test coverage are awkward places to live. Sometimes the information or effort needed to write a test makes it too expensive to do so. However, when writing a test is cheap and easy, or can be made so without causing harm, there should be a good reason not to do it.

Gitsplorer

Have you ever found yourself using Git and thinking "This is great, but I wish these filesystem operations were read-only and ten times slower?" Well, friend, do I have news for you.

I wanted to compare the API of a Golang codebase at different times in its history. To do this, I figured I’d clone the repo, check out commit A and analyze it, then check out commit B and analyze that, and boo! Hiss! That’s inelegant and leaves clutter that needs to be cleaned up all around the disk. There has got to be a better way!

Once I made a Git commit hash miner because I wanted to race it against a coworker to see who could get a commit with more leading zeros into a frequently used repository at work. That had the side effect of teaching me some about Git’s internals, like what its objects are (blobs, trees, commits, tags) and how they fit together. I figured that if I could convince the Golang AST parser to read the Git database instead of the filesystem, I could do what I wanted in a much better way.

Alas, doing that would have required monkey-patching the Go standard library, and I don’t want to hunt down every system call it ends up making to be sure I got them all. However, Git is famously a content-addressable filesystem, so what if we just made a filesystem that points to a given commit in a repo and pointed the parser at that?

This turns out to be pretty easy to do by combining libgit2 and libfuse. We use the former to read objects in the Git repository. (The objects are easy to read by hand, until you have to read packed objects. That’s doable, but a bit of a distraction in what is already quite the distraction.) We then use the latter to create a very basic read-only filesystem. In the end, we have a read-only version of git checkout that writes nothing to disk.

I put a prototype of this together in Python, because I’m lazy. It’s called gitsplorer and you should absolutely not use it anywhere near a production system. It scratches my itch pretty well, though. In addition to my API comparisons (which I still haven’t got to), I do sometimes want to poke around the state of a repository at a given commit and this saves me doing a stash-checkout dance or reading the git worktree manpage again.
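
The core of the idea fits in a few dozen lines of Python. What follows is a stripped-down sketch rather than gitsplorer itself, and it assumes the pygit2 and fusepy libraries:

# Read-only filesystem view of a single commit in a Git repository.
import errno
import stat
import sys

import pygit2
from fuse import FUSE, FuseOSError, Operations


class GitCommitFS(Operations):
    def __init__(self, repo_path, committish):
        self.repo = pygit2.Repository(repo_path)
        # Resolve e.g. "HEAD~20" or a commit hash to its tree.
        self.tree = self.repo.revparse_single(committish).peel(pygit2.Tree)

    def _lookup(self, path):
        if path in ("/", ""):
            return self.tree
        try:
            entry = self.tree[path.lstrip("/")]
        except KeyError:
            raise FuseOSError(errno.ENOENT)
        return self.repo[entry.id]

    def getattr(self, path, fh=None):
        obj = self._lookup(path)
        if isinstance(obj, pygit2.Tree):
            return {"st_mode": stat.S_IFDIR | 0o555, "st_nlink": 2}
        return {"st_mode": stat.S_IFREG | 0o444, "st_nlink": 1, "st_size": obj.size}

    def readdir(self, path, fh):
        return [".", ".."] + [entry.name for entry in self._lookup(path)]

    def read(self, path, size, offset, fh):
        return self._lookup(path).data[offset:offset + size]


if __name__ == "__main__":
    repo_path, committish, mountpoint = sys.argv[1:4]
    FUSE(GitCommitFS(repo_path, committish), mountpoint, foreground=True, ro=True)

Mount it with something like "python gitfs.py /path/to/repo HEAD~20 /mnt/somewhere" and the checkout-that-isn’t appears under the mountpoint.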

For fun, and to see how bad of an idea this was, I came up with a very unscientific benchmark: we check out the Linux kernel repository at a randomly selected commit, run Boyter’s scc line-counting tool, and check out master again. We do this both with gitsplorer and with ye olde git checkout. The results speak for themselves:

git checkout: 62 seconds
gitsplorer: 567 seconds

The gitsplorer version is also remarkable for spending all its time using 100% of a CPU, which the git version does not. (It uses around 90% of a CPU while doing the checkouts, then all of my CPUs while counting lines. The Python FUSE filesystem is single-threaded, so beyond Python being slow it must also be a point of congestion for the line counting.) I did some basic profiling of this with the wonderful profiler Austin, and saw that the Python process spends most of its time reading Git blobs. I think, but did not verify, that this is because libgit2 decompresses the contents of the blobs on every such call, while most of the reads we make are in the FUSE getattr call where we are only interested in metadata about the blob. I made no attempts to optimize any of this.

So, friends, if you’ve ever wished git checkout was read-only and 10 times slower than it is, today is your lucky day.

Lyttle Lytton 2020

My Lyttle Lytton entry for 2020:

"Actually, you do like this," maverick CEO Eric Davies, Ph.D., insisted as he pulled my foreskin back and cunnilingussed my pee hole.

I’m not exactly proud of it, but I’m glad it’s no longer in my head.

Not made for this world

It’s hot tonight, and this is going to be mostly the whiskey talking while I wait for it to get cool enough to sleep. I’ve killed the mosquitos I’ve seen, however, so until then I have only Haskell to keep me company.

This morning, tef tweeted about monads, which sent the Haskell pack his way with barks of not getting it. Just now, pinboard was reminded of some guy’s rage against Esperanto, from back in the ’90s when the web was fun and mostly devoted to things like explaining how "The Downward Spiral" is a concept album or destroying Unix instead of each other’s mental health.

For the Haskell pack: I did a PhD in the type of math that necessitates a lot of category theory, and I have looked at your use of category theory, and judged it to be unnecessary and pretentious and mainly focused on making you look smart while being entirely trivial. But this is not that kind of blog post, the kind that gets too tangled up in whether category theory is useful to ever get to the point. (If nothing else, Pijul proves that category theory is useful.) We’re here to discuss how Haskell as a whole is nonsense if you’re not an academic. Our claim is that Haskell is a useless language for writing software that has users.

Our point is simple, and focused on IO. We propose that you can measure how user-facing a program or language is by measuring how much of its time it spends doing, or worrying about, IO. That is, after all, the medium through which anyone who is not a program’s author (of which there may be many) will interact with the program. The time spent doing IO can be on the command line, via a GUI, over a network, or wherever; but to be a serious contender for user-facing programs, a language has to make IO easy.

C is a terrible language for most new things today. Anyone writing new software in C that they expect to be used by anyone other than thoroughly vetted people needs to be able to explain why they’ve chosen C. At the same time, a lot of us are still exposed to C through the BSD or Linux kernels and syscalls, the undying popularity of K&R, random software on the internet, or other vectors. The culture around C invented the modern language textbook, K&R, and the modern user-facing program, "Hello world", both of which spend most or all of their time dealing with IO to talk to you or other users.

I claim that making IO as simple as possible, which C does for all its faults, is analogous to making it as simple as possible for other people to talk to you in a designed language.[1] Esperanto shows you can fail at that goal, if you even had it, for it favors sounds native to European languages above others. Likewise, Haskell shows you can fail at the goal of making IO easy, if you even had it, for it does not.

Haskell is a purely functional, lazily evaluated language, with a type system. Like tef explains, that is great, until you run into IO. Up until that point, you could rearrange computations in any order you liked, if they needed to be done at all. As soon as you need to do IO, though, you need something to happen before another thing, which makes you very unhappy if you’re Haskell. It in fact makes you so unhappy that you’ll drag the entire lost-at-sea community of category theorists into the orbit of your language just so you can have an abstraction for doing IO that fits into your model of the world. This abstraction, monads, then comes with the added benefit of being abstract enough that all of your programmers can spend their time explaining it to each other instead of writing programs that use the abstraction to do IO, and therefore deal with any actual users.

Haskell is where programmers go to not have users.


1. One could say that what I mean is something more like making FFI as easy as possible, but that’s missing the point and would just move this discussion to some other, less inflammatory, level that we’re keen to avoid.