I got picked in Kleroteria a couple of months ago. This is my contribution.
Three years ago, my wife and I moved to Amsterdam. We wanted to have kids, and
thought the Netherlands was a better place to do that than Mexico, where my wife
is from and where we had been living.
We’ve both moved around a lot, and have picked up the languages of the places
we’ve lived. That’s how we speak French and Spanish, along with English.
Learning those was always a necessity; no one in France is going to speak
English voluntarily, and my in-laws don’t speak the best English, so I had to
learn Spanish.
So it was a change to come to the Netherlands. Everyone here speaks perfect
English. They also have no patience for your attempts at Dutch pronunciation or
verb conjugation. The second they hear you’re not from here, they switch. On
paper this is great, but it means that after three years here I still don’t
speak Dutch. I can never get anyone to have a whole conversation with me in it.
A couple of weeks after we came here, we got pregnant. We had a boy, who has
grown up into a little man who loves cars and trains, makes funny faces at the
dinner table, and keeps trying to show our dog his books. ("Pancho! See!", but
Pancho doesn’t care.)
We speak to him each in our own language, and by now he understands what we say.
He’s been going to daycare here since he was five months old, where they speak
Dutch, so he understands that as well. Kids who grow up with more than one
language start speaking later, which can be frustrating for everyone. He
knows what he wants to say, but can’t figure out what words to use, and we have
to keep guessing at sounds while other parents don’t.
He’s finally speaking now. He tells us everything he wants and knows. In Dutch.
Every time. Every word comes out in Dutch. And we still don’t really speak it.
It’s not really any less frustrating for anyone. There’s a lot of guessing at
pronunciation and Google Translate.
But what do you know. I did finally get someone to speak Dutch to me.
It is OK not to do all of the following things for a given bug. Some bugs are
trivial, or nontrivial to reproduce or write tests for. But not doing some of
these things for a given bug should be an explicit decision, and you should be
able to explain why you made it.
Is there a (possibly old, closed) issue for the bug?
Can you reproduce the bug manually?
Can you write a regression test for the bug?
Did you check that your change actually fixes the problem?
Does the fix’s commit message explain the bug, the fix, and point to the issue?
It’s debatable whether checking for old or closed issues is the responsibility
of the developer fixing the bug or a project manager who triages the backlog.
Sometimes the second person doesn’t exist, but the job should still be done.
Remember that a bug report is a report of a symptom. A bug fix is directed at
a cause. A software symptom may have more than one cause, making bug fixes that
say "This fixes issue X" very optimistic about what they claim to achieve.
Exercise moderation and common sense in all things. Both 0% and 100% test
coverage are awkward places to live. Sometimes the information or effort
needed to write a test makes it too expensive to do so. However, when writing a
test is cheap and easy, or can be made so without causing harm, there should be
a good reason not to do it.
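As an illustration of the regression-test item above, here is the shape such a
test can take. Everything in it is hypothetical: the module, the function, and
issue number 123 stand in for whatever your bug report actually names.

```python
# A minimal regression test for a hypothetical bug. The module, function,
# and issue number are stand-ins for whatever your bug report names.
import unittest

from mymodule import parse_duration  # hypothetical function under test


class TestIssue123(unittest.TestCase):
    def test_parse_duration_accepts_zero(self):
        # Issue #123: parse_duration("0s") raised ValueError instead of
        # returning 0. This test fails before the fix and passes after it,
        # which is exactly what makes it a regression test.
        self.assertEqual(parse_duration("0s"), 0)


if __name__ == "__main__":
    unittest.main()
```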
Have you ever found yourself using Git and thinking "This is great, but I wish
these filesystem operations were read-only and ten times slower?". Well, friend,
do I have news for you.
Recently, I wanted to compare the API of a Golang codebase at different times
in its history. To do this, I figured I’d clone the repo, check out commit A
and analyze it, then check out commit B and analyze that, and boo! Hiss! That’s
inelegant and leaves clutter that needs to be cleaned up all around the disk.
There has got to be a better way!
Once I made a Git commit hash
miner because I wanted to race it against a coworker to see who could get a
commit with more leading zeros into a frequently used repository at work. That
had the side effect of teaching me a bit about Git’s internals, like what its
objects are (blobs, trees, commits, tags) and how they fit together. I figured
that if I could convince the Golang AST parser to read the Git database instead
of the filesystem, I could do what I wanted in a much better way.
Alas, doing that would have required monkey-patching the Go standard library,
and I didn’t want to hunt down every system call it ends up making to be sure
I’d caught them all. However, Git is famously a content-addressable filesystem, so what
if we just made a filesystem that points to a given commit in a repo and pointed
the parser at that?
This turns out to be pretty easy to do by combining libgit2 and libfuse. We
use the former to read objects in the Git repository. (The objects are easy to
read by hand, until you have to read packed objects. That’s doable, but a bit
of a distraction in what is already quite the distraction.) We then use the
latter to create a very basic read-only filesystem. In the end, we have a
read-only version of git checkout that writes nothing to disk.
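The core of the idea fits in a short sketch. This is a minimal, illustrative
version using the pygit2 and fusepy bindings; it is not gitsplorer’s actual
code, and all the names are mine. We resolve a commit, then answer getattr,
readdir, and read straight out of its tree:

```python
#!/usr/bin/env python3
# Minimal sketch of a read-only FUSE filesystem over one Git commit,
# using pygit2 (libgit2 bindings) and fusepy (libfuse bindings).
# Illustrative only; this is not gitsplorer's actual code.
import errno
import stat
import sys

import pygit2
from fuse import FUSE, FuseOSError, Operations


class GitCommitFS(Operations):
    def __init__(self, repo_path, committish):
        self.repo = pygit2.Repository(repo_path)
        # Resolve a branch, tag, or hash to that commit's root tree.
        self.root = self.repo.revparse_single(committish).peel(pygit2.Commit).tree

    def _lookup(self, path):
        # Walk one tree entry per path component, e.g. "dir/sub/file".
        obj = self.root
        for part in path.strip("/").split("/"):
            if not part:
                continue
            try:
                obj = obj[part]
            except (KeyError, TypeError):
                raise FuseOSError(errno.ENOENT)
        return obj

    def getattr(self, path, fh=None):
        obj = self._lookup(path)
        if isinstance(obj, pygit2.Tree):
            return {"st_mode": stat.S_IFDIR | 0o555, "st_nlink": 2}
        # Touching blob.size makes libgit2 load the blob, which is the
        # getattr hot spot discussed in the profiling notes below.
        return {"st_mode": stat.S_IFREG | 0o444, "st_nlink": 1,
                "st_size": obj.size}

    def readdir(self, path, fh):
        yield "."
        yield ".."
        for entry in self._lookup(path):
            yield entry.name

    def read(self, path, size, offset, fh):
        return self._lookup(path).data[offset:offset + size]


if __name__ == "__main__":
    # Usage: gitfs.py <repo> <committish> <mountpoint>
    repo_path, committish, mountpoint = sys.argv[1:4]
    FUSE(GitCommitFS(repo_path, committish), mountpoint,
         foreground=True, ro=True)
```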
I put a prototype of this together in Python, because I’m lazy. It’s called
gitsplorer and you should absolutely not use it anywhere near a production
system. It scratches my itch pretty well, though. In addition to my API
comparisons (which I still haven’t got to), I do sometimes want to poke around
the state of a repository at a given commit, and this saves me doing a
stash-checkout dance or reading the git worktree manpage again.
For fun, and to see how bad of an idea this was, I came up with a very
unscientific benchmark: we check out the Linux kernel repository at a randomly
selected commit, run Boyter’s scc line-counting tool, and check out master
again. We do this both with gitsplorer and with ye olde git checkout. The
results speak for themselves:
git checkout: 62 seconds
gitsplorer: 567 seconds
The gitsplorer version is also remarkable for spending all its time using 100%
of a CPU, which the git version does not. (It uses around 90% of a CPU while
doing the checkouts, then all of my CPUs while counting lines. The Python FUSE
filesystem is single-threaded, so beyond Python being slow, it must also be a
point of congestion for the line counting.) I did some basic profiling of this
with the wonderful profiler Austin, and saw that the Python process spends most
of its time reading Git blobs. I think, but did not verify, that this is
because libgit2 decompresses the contents of the blobs on every such call,
while most of the reads we make are in the FUSE getattr call, where we are only
interested in metadata about the blob. I made no attempts to optimize any of
this.
So, friends, if you’ve ever wished git checkout was read-only and ten times
slower than it is, today is your lucky day.
It’s hot tonight, and this is going to be mostly the whiskey talking while I
wait for it to get cool enough to sleep. I’ve killed the mosquitos I’ve seen,
however, so until then I have only Haskell to keep me company.
This morning, tef tweeted about monads, which sent the Haskell pack his way
with barks of not getting it. Just now, pinboard was reminded of some guy’s
rage against Esperanto, from back in the ’90s when the web was fun and mostly
devoted to things like explaining how "The Downward Spiral" is a concept album
or destroying Unix instead of each other’s mental health.
For the Haskell pack: I did a PhD in the kind of math that necessitates a lot
of category theory, and I have looked at your use of category theory, and
judged it to be unnecessary and pretentious, mainly focused on making you look
smart while being entirely trivial. But this is not that kind of blog post; we
won’t get so tangled up in whether category theory is useful that we never get
to the point. (If nothing else, Pijul proves that category theory is useful.)
We’re here to discuss how Haskell as a whole is nonsense if you’re not an
academic. Our claim is that Haskell is a useless language for writing software
that has users.
Our point is simple, and focused on IO. We propose that you can measure how
user-facing a program or language is by measuring how much of its time it
spends doing, or worrying about, IO. That is, after all, the medium through
which anyone who is not a program’s author (of which there may be many) will
interact with the program. The IO can happen on the command line, via a GUI,
over a network, or wherever; but to be a serious contender for user-facing
programs, a language has to make IO easy.
C is a terrible language for most new things today. Anyone writing new software
in C that they expect to be used by anyone other than thoroughly vetted people
needs to be able to explain why they’ve chosen C. At the same time, a lot of us are
still exposed to C through the BSD or Linux kernels and syscalls, the undying
popularity of K&R, random software on the internet, or other vectors. The
culture around C invented the modern language textbook, K&R, and the modern
user-facing program, "Hello world", both of which spend most or all of their
time dealing with IO to talk to you or other users.
I claim that making IO as simple as possible, which C does for all its faults,
is analogous to making it as simple as possible for other people to talk to
you in a designed language. Esperanto shows you can fail
at that goal, if you even had it, for it favors sounds native to European
languages above others. Likewise, Haskell shows you can fail at the goal of
making IO easy, if you even had it, for it does not.
Haskell is a purely functional, lazily evaluated language, with a type system.
As tef explains, that is great, until you run into IO. Up until that point,
you could rearrange computations in any order you liked, if they needed to be
done at all. As soon as you need to do IO, though, you need something to happen
before another thing, which makes you very unhappy if you’re Haskell. It in fact
makes you so unhappy that you’ll drag the entire lost-at-sea community of
category theorists into the orbit of your language just so you can have an
abstraction for doing IO that fits into your model of the world. This
abstraction, monads, then comes with the added benefit of being abstract enough
that all of your programmers can spend their time explaining it to each other
instead of writing programs that use the abstraction to do IO, and therefore
deal with any actual users.
Haskell is where programmers go to not have users.
Here’s something I thought of when I couldn’t sleep last night.
The curvature tensor of a Kähler metric can be viewed as a Hermitian form on
\(\bigwedge^{1,1} T_X^*\) by mapping \(\operatorname{End} T_X \to \bigwedge^{1,1}
T_X^*\) via the metric.
If we’re on a compact Kähler manifold with zero first Chern class, then for each
Kähler class \(\omega\) and \((1,1)\)-classes \(u, v\), we can pick the Ricci-flat
metric in \(\omega\) and the harmonic representatives of \(u, v\). If \(R\) is the
curvature tensor of the metric, viewed as a Hermitian form, we can then set
\[
b(u,v)(\omega) := \int_X R(u, v) \, dV_{\omega}.
\]
Both choices are unique, by the Calabi-Yau theorem and Hodge theory
respectively, so this defines a smooth bilinear form \(b\) on the tangent space
of the Kähler cone of \(X\).
Besides being fun times, can we say anything interesting about \(b\)? For example,
what is its norm with respect to the Riemannian metric on the Kähler cone, or
its trace with respect to that metric? Can we integrate it over some subset of
the cone?
I haven’t made a lot of progress on my projects. I did create a Scaleway VM
and shoved an OAI harvester on there that’s happily downloading the arXiv’s
backlog of metadata. I can also parse the XML it fetches, and have some ideas
about how I’m going to store it.
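For the curious, the harvesting loop is not much more than the standard
OAI-PMH resumption-token dance. A minimal sketch, assuming arXiv’s usual
http://export.arxiv.org/oai2 endpoint and the oai_dc metadata format; the
function name and pause length are illustrative, not my actual harvester:

```python
# Minimal sketch of an OAI-PMH harvesting loop against arXiv's endpoint.
import time
import xml.etree.ElementTree as ET

import requests

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE = "http://export.arxiv.org/oai2"


def harvest(metadata_prefix="oai_dc"):
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(requests.get(BASE, params=params).content)
        for record in root.iter(OAI + "record"):
            yield record
        # An empty or missing resumptionToken means we have everything.
        token = root.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
        time.sleep(5)  # arXiv throttles impolite harvesters
```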
This lack of progress mostly comes from me being nerd-sniped into thinking
about bounded queues under load. My old work project wanted to use a FIFO
queue to hold its requests. That is a bad idea, because FIFO queues perform
very poorly under load, as I went a little overboard in demonstrating.
Funnily enough, that very simple thing is one of my most popular GitHub
projects ever.
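If you want the intuition without reading that demonstration: under overload,
a bounded FIFO queue serves only requests that have already waited roughly the
full depth of the queue, while LIFO still serves some fresh ones (at the cost
of starving the old). Here is a back-of-the-envelope simulation sketch; the
single-server model and all parameters are illustrative, not the original
project’s:

```python
# Compare FIFO vs LIFO wait times for a single overloaded server with a
# bounded queue. Poisson arrivals, exponential service; all parameters
# are illustrative.
import random
from collections import deque


def simulate(discipline, arrival_rate=120.0, service_rate=100.0,
             queue_cap=100, duration=60.0, seed=1):
    rng = random.Random(seed)
    queue = deque()      # arrival times of requests waiting for the server
    waits, dropped = [], 0
    server_free = 0.0    # time at which the server finishes its current job
    next_arrival = rng.expovariate(arrival_rate)
    while next_arrival < duration:
        # Let the server work through the queue until the next arrival.
        while queue and server_free <= next_arrival:
            arrived = queue.popleft() if discipline == "fifo" else queue.pop()
            start = max(server_free, arrived)
            waits.append(start - arrived)
            server_free = start + rng.expovariate(service_rate)
        if len(queue) < queue_cap:
            queue.append(next_arrival)
        else:
            dropped += 1  # queue is full: shed the newest request
        next_arrival += rng.expovariate(arrival_rate)
    return waits, dropped


for disc in ("fifo", "lifo"):
    waits, dropped = simulate(disc)
    waits.sort()
    print(f"{disc}: served {len(waits)}, dropped {dropped}, "
          f"median wait {waits[len(waits) // 2] * 1000:.0f}ms, "
          f"p99 {waits[int(len(waits) * 0.99)] * 1000:.0f}ms")
```

With arrivals outpacing service like this, FIFO’s median wait settles near
queue_cap / service_rate (every served request has waited through the whole
queue), while LIFO’s stays near zero, which is the whole point.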