It is OK not to do all of these things for a given bug. Some bugs are trivial,
or nontrivial to reproduce or write tests for. But you should explicitly decide
not to do some of these things for any given bug, and be able to explain why.
-
Is there a (possibly old, closed) issue for the bug?
-
Can you reproduce the bug manually?
-
Can you write a regression test for the bug?
-
Did you check that your change actually fixes the problem?
-
Does the fix’s commit message explain the bug, the fix, and point to the issue?
It’s debatable whether checking for old or closed issues is the responsibility
of the developer fixing the bug or a project manager who triages the backlog.
Sometimes the second person doesn’t exist, but the job should still be done.
Remember that a bug report is a report of a symptom. A bug fix is directed at
a cause. A software symptom may have more than one cause, making bug fixes that
say "This fixes issue X" very optimistic about what they claim to achieve.
Exercise moderation and common sense in all things. Both 0% and 100% test
coverage are awkward places to live in. Sometimes the information or effort
needed to write a test make it too expensive to do so. However, when writing a
test is cheap and easy or can be made so without causing harm, there should be
a good reason not to do it.
Have you ever found yourself using Git and thinking "This is great, but I wish
these filesystem operations were read-only and ten times slower?". Well, friend,
do I have news for you.
API of a Golang codebase at different times in its history. To do this, I
figured I’d clone the repo, check out commit A and analyze it, then check out
commit B and analyze that, and boo! Hiss! That’s inelegant and leaves clutter
that needs to be cleaned up all around the disk. There has got to be a better
way!
Once I made a Git commit hash
miner because I wanted to race it against a coworker to see who could get a
commit with more leading zeros into a frequently used repository at work. That
had the side effect of teaching me some about Git’s internals, like what its
objects are (blobs, trees, commits, tags) and how they fit together. I figured
that if I could convince the Golang AST parser to read the Git database instead
of the filesystem, I could do what I wanted in a much better way.
Alas, doing that would have required monkey-patching the Go standard library,
and I don’t want to hunt down every system call it ends up making to be sure I
got them all. However, Git is famously a content-addressable filesystem, so what
if we just made a filesystem that points to a given commit in a repo and pointed
the parser at that?
This turns out to be pretty easy to do by combining libgit2
and libfuse
. We
use the former to read objects in the Git repository. (The objects are easy to
read by hand, until you have to read packed objects. That’s doable, but a bit
of a distraction in what is already quite the distraction.) We then use the
latter to create a very basic read-only filesystem. In the end, we have a
read-only version of git checkout
that writes nothing to disk.
I put a prototype of this together in Python, because I’m lazy. It’s called
gitsplorer
and you should absolutely
not use it anywhere near a production system. It scratches my itch pretty well,
though. In addition to my API comparisons (which I still haven’t got to), I do
sometimes want to poke around the state of a repository at a given commit and
this saves me doing a stash-checkout dance or reading the git worktree
manpage
again.
For fun, and to see how bad of an idea this was, I came up with a very
unscientific benchmark: We checkout the Linux kernel repository at a randomly
selected
commit,
run Boyter’s scc line-counting tool, and checkout
master
again. We do this both with gitsplorer
and with ye olde git
checkout
. The results speak for themselves:
git checkout: 62 seconds
gitsplorer: 567 seconds
The gitsplorer
version is also remarkable for spending all its time using 100%
of a CPU, which the git
version does not. (It uses around 90% of a CPU while
doing the checkouts, then all of my CPUs while counting lines. The Python FUSE
filesystem is single-threaded, so beyond Python being slow it must also be a
point of congestion for the line counting.) I did some basic profiling of this
with the wonderful profiler Austin, and saw
that the Python process spends most of its time reading Git blobs. I think, but
did not verify, that this is because libgit2
decompresses the contents of the
blobs on every such call, while most of the reads we make are in the FUSE
getattr
call where we are only interested in metadata about the blob. I made
no attempts to optimize any of this.
So, friends, if you’ve ever wished git checkout
was read-only and 10 times
slower than it is, today is your lucky day.
It’s hot tonight, and this is going to be mostly the whiskey talking while I
wait for it to get cool enough to sleep. I’ve killed the mosquitos I’ve seen,
however, so until then I have only Haskell to keep me company.
This morning, tef tweeted about
monads,
which sent the Haskell pack his way with
barks of not getting it. Just now, pinboard was reminded of some guy’s rage
against
Esperanto,
from back in the 90’s when the web was fun and mostly devoted
to things like explaining how "The Downward Spiral" is a concept album or
destroying Unix instead of each other’s mental health.
For the Haskell pack, I did a PhD in the type of math that necessitates a lot of
category theory, and I have looked at your use of category theory, and judged it
to be unnecessary and pretentious and mainly focused on making you look smart
while being entirely trivial. But this is not that kind of blog post, one that
gets too tangled up in whether category theory is useful to get to the point.
(If nothing else,
Pijul
proves that category theory is useful.) We’re here to discuss how Haskell as a
whole is nonsense if you’re not an academic. Our claim is that Haskell is a
useless language for writing software that has users.
Our point is simple, and focused on IO. We propose that you can measure how
user-facing a program or language is by measuring how much of its time it spends
or worries about doing IO. That is, after all, the medium through which anyone
who is not a program’s author (of which there may be many) will interact with
the program. The time spent doing IO can be on the command line, via a GUI, over
a network, or wherever; but to be a serious contender for user-facing programs,
a language has to make IO be easy.
C is a terrible language for most new things today. Anyone writing new software
in C, that they expect to be used by other than thouroughly vetted people, needs
to be able to explain why they’ve chosen C. At the same time, a lot of us are
still exposed to C through the BSD or Linux kernels and syscalls, the undying
popularity of K&R, random software on the internet, or other vectors. The
culture around C invented the modern language textbook, K&R, and the modern
user-facing program, "Hello world", both of which spend most or all of their
time dealing with IO to talk to you or other users.
I claim making IO as simple as possible, which C does for all its faults, to do
is analogous to trying to making it as simple as possible for other people to
talk to you in designed languages as you can.. Esperanto shows you can fail
at that goal, if you even had it, for it favors sounds native to European
languages above others. Likewise, Haskell shows you can fail at the goal of
making IO easy, if you even had it, for it does not.
Haskell is a purely functional, lazily evaluated language, with a type system.
Like tef explains, that is great, until you run into IO. Up until that point,
you could rearrange computations in any order you liked, if they needed to be
done at all. As soon as you need to do IO, though, you need something to happen
before another thing, which makes you very unhappy if you’re Haskell. It in fact
makes you so unhappy that you’ll drag the entire lost-at-sea community of
category theorists into the orbit of your language just so you can have an
abstraction for doing IO that fits into your model of the world. This
abstraction, monoids, then comes with the added benefit of being abstract enough
that all of your programmers can spend their time explaining it to each other
instead of writing programs that use the abstraction to do IO, and therefore
deal with any actual users.
Haskell is where programmers go to not have users.
Here’s something I thought of when I couldn’t sleep last night.
The curvature tensor of a Kähler metric can be viewed as a Hermitian form on
\(\bigwedge^{1,1} T_X^*\) by mapping \(\operatorname{End} T_X \to \bigwedge^{1,1}
T_X^*\) via the metric.
If we’re on a compact Kähler manifold with zero first Chern class, then for each
Kähler class \(\omega\) and \((1,1)\)-classes \(u, v\), we can pick the Ricci-flat
metric in \(\omega\) and the harmonic representatives of \(u, v\). If \(R\) is the
curvature tensor of the metric, viewed as a Hermitian form, we can then set
\[
b(u,v)(\omega) := \int_X R(u, v) \, dV_{\omega}.
\]
This defines a smooth bilinear form \(b\) on the tangent space of the Kähler cone
of \(X\).
Besides being fun times, can we say anything interesting about \(b\)? For example,
what is its norm with respect to the Riemannian metric on the Kähler cone, or
its trace with respect to that metric? Can we integrate it over some subset of
the cone?
I haven’t made a lot of progress on my projects. I did create a Scaleway VM
and shoved an OAI harvester on there that’s happily downloading the arXiv’s
backlog of metadata. I can also parse the XML it fetches, and have some ideas
about how I’m going to store it.
This lack of progress mostly comes from me being nerd-sniped into thinking
about bounded queues under load. My
old work project
wanted to use a FIFO queue to hold its requests. That is a bad idea, because
FIFO queues perform very poorly under load, as I
went a little overboard
in demonstrating.
Funnily enough, that very simple thing is one of my most popular Github
projects ever.
Suppose we have an HTTP service. The behaviour of service depends on some
configuration that may change at runtime; it may reload a static configuration
file on SIGHUP
, need to react to changes in its service discovery mechanism,
or have A/B test state or features toggled on and off.
In Go, a naive way of handling this is by writing our configuration state to a
struct and updating it in background goroutines:
type State struct {
frobinate bool
}
var state State{}
func handler(w http.ResponseWriter, r *http.Request) {
if s.frobinate {
fmt.Fprintln(w, "Great success")
} else {
fmt.Fprintln(w, "Great non-success")
}
}
This naive approach has at least two problems:
-
The state may change while we’re processing a request, causing us to process
part of the request with one state, and another part with another. This isn’t a
big deal in our example, but becomes more of a problem as the time needed to
handle a request increases.
-
There are no synchronization primitives in play, so updating the state has data
race conditions.
Check out the working example in
this
commit
to see the first problem in action.
One way to resolve these problems is to add a mutex and to pass copies of the
state to the request handlers:
type State struct {
frobinate bool
*sync.Mutex
}
// Copy may be arbitrarily complicated if State contains slices, maps,
// pointers, or other structs.
func (s *State) Copy() State {
s.Lock()
defer s.Unlock()
return *s
}
var state &State{}
func handler(w http.ResponseWriter, r *http.Request) {
s := state.Copy()
if s.frobinate {
fmt.Fprintln(w, "Great success")
} else {
fmt.Fprintln(w, "Great non-success")
}
}
The background goroutine that updates the state then either does so through a
dedicated method that locks/unlocks the mutex, or does the locking itself.
While this works fine, it relies on global state and uses none of Go’s built-in
concurrency features. We were promised a brave new world, and encouraged to
"share memory by communicating". This points the way to another solution to our
two problems that leverages Go’s primitives better:
type State struct {
frobinate bool
}
// Copy may be arbitrarily complicated if State contains slices, maps,
// pointers, or other structs.
func (s State) Copy() State {
return s
}
var stateCh chan State
var toggle chan struct{}
func stateManager() {
state := State{}
for {
select {
case stateCh <- state.Copy():
case <-toggle:
state.frobinate = !state.frobinate
}
}
}
func handler(w http.ResponseWriter, r *http.Request) {
s := <-stateCh
if s.frobinate {
fmt.Fprintln(w, "Great success")
} else {
fmt.Fprintln(w, "Great non-success")
}
}
A complete working example is
here.
Note that the working example doesn’t rely
on any global state to pass state to the handler functions, using closures
instead. Modifying the state is also only possible within the stateManager
function. The working example could also be extended to have middleware copy the
state to each request processing function instead of the ad-hoc way done there.
It is still up to the developers to ensure the Copy
method doesn’t pass mutable
state around, but they no longer need to deal with mutexes and locks themselves.
This also means that any future additions to the program that use the same
pattern won’t need to worry about those locks either.
There are no silver bullets in heavily concurrent systems, but in Go we can
choose to not deal with some of the footguns we would need to handle in C, Java
or other similar languages.