A new, worse, proof that the holomorphic sectional curvature dominates the scalar curvature

If \(R\) is the curvature tensor of a Kähler metric, we define its holomorphic sectional curvature to be \[ H(x) = R(x, \bar x, x, \bar x) / |x|^4 \] for nonzero \(x\). If \( (e_1, \ldots, e_n) \) is a frame of the tangent bundle that is orthonormal at the center of a coordinate system, we also define the scalar curvature of \( R \) at the center to be \[ s = \sum_{j,k=1}^n R(e_j, \bar e_j, e_k, \bar e_k). \] A classical result of Berger is that the sign of the holomorphic sectional curvature determines the sign of the scalar curvature. More precisely he proved that \[ \int_{S^{2n-1}} H(x) \, d\sigma(x) = \frac{1}{\binom{n+1}{2}} s, \] where \( d\sigma \) is the Lebesgue measure on the sphere normalized to have volume one. Berger’s proof is by a straightforward calculation that proceeds by expanding \( H(x) \) in terms of a local frame and then integrating each polynomial factor over the sphere. There is also a slightly less painful proof that uses representation theory to compute the integral.
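For the record, with \( d\sigma \) normalized to have total mass one as above, the two sphere integrals that drive that calculation are \[ \int_{S^{2n-1}} |z_j|^4 \, d\sigma = \frac{2}{n(n+1)} \quad\text{and}\quad \int_{S^{2n-1}} |z_j|^2 |z_k|^2 \, d\sigma = \frac{1}{n(n+1)} \quad (j \not= k), \] and all the other monomials that appear in the expansion integrate to zero.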

We are going to explain a new, worse, proof of the fact that the sign of the holomorphic sectional curvature determines the sign of the scalar curvature. The classical proof of Berger’s result is not very illuminating, as it seems to hinge on the fact that the integrals of \(|z_j|^4\) and \(|z_j|^2 |z_k|^2\) over the unit sphere agree up to a factor of two. The representation-theoretic calculation of the integral explains why this happens; the integral is invariant under such a large group action that the whole thing has to be a multiple of the trace, which evaluates to the scalar curvature. Our new proof offers no explanations, only brute force calculations that work out, somehow. If we conspire to write \( R(x) := R(x, \bar x, x, \bar x) \), and similarly \( \hat R(x) := \hat R(x, \bar x, x, \bar x) \), then we will show that \[ \sum_{j=1}^n \hat R(e_j) + \hat R\biggl( \sum_{j=1}^n e_j \biggr) = 2 s \] for a different curvature tensor \( \hat R \) whose holomorphic sectional curvature has the same sign as that of \( R \). Since \( \hat R(x) \) is positive for all \( x \) whenever \( H \) is (and likewise for negative), we conclude that the sign of \( H \) determines that of \( s \).

We’re going to work at the center of a normal coordinate system, so we will pick an orthonormal basis \((e_1, \ldots, e_n)\) of the tangent space at that point. If \( (x_1, \ldots, x_n) \) are local coordinates in \( \mathbf C^n \) the torus \( T^n := S^1 \times \cdots \times S^1 \) acts on \( \mathbf C^n \) by \[ \psi(x) = ( e^{it_1} x_1, \ldots, e^{it_n} x_n ). \] Given a curvature tensor \( R \) of a Kähler metric, we then define \[ \hat R(x, \bar y, z, \bar w) = \frac{1}{(2\pi)^n} \int_{T^n} \psi^* R(x, \bar y, z, \bar w) \, dt_1 \cdots dt_n. \] We note some properties of \( \hat R \):

  1. \( \hat R_{jklm} = R_{jklm} \) if \( j=k \) and \( l=m \), or \( j=m \) and \( k=l \), and is zero otherwise. This is clear by evaluating the integral, as spelled out after this list.

  2. \( \hat R \) has the symmetries of a Kähler curvature tensor, by 1.

  3. \( \hat{\hat R} = \hat R \), by 1.

  4. If \( R(x) > 0 \) for all \( x \) then \( \hat R(x) > 0 \) for all \( x \), because then the integrand is positive for all \( x \) and \( (t_1, \ldots, t_n) \).

  5. The scalar curvatures of \( R \) and \( \hat R \) are equal, because \( s = \sum_{j,k=1}^n R_{jjkk} = \sum_{j,k=1}^n \hat R_{jjkk} \).
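To spell out the computation behind property 1: since \( \psi \) scales \( e_j \) by \( e^{it_j} \), we have \[ \frac{1}{(2\pi)^n} \int_{T^n} \psi^* R(e_j, \bar e_k, e_l, \bar e_m) \, dt_1 \cdots dt_n = R_{jklm} \cdot \frac{1}{(2\pi)^n} \int_{T^n} e^{i(t_j - t_k + t_l - t_m)} \, dt_1 \cdots dt_n, \] and the remaining integral equals one when the exponent vanishes identically, which happens exactly when \( j=k \) and \( l=m \), or \( j=m \) and \( k=l \), and equals zero otherwise.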

We can explain what \( \hat R \) is: Recall that a Hermitian curvature tensor on \( \mathbf C^n \) can be viewed as a Hermitian form on \( \mathbf C^n \otimes \mathbf C^n \), and that such a tensor has the symmetries of a Kähler curvature tensor if and only if it is the pullback of a Hermitian form on \( \operatorname{Sym}^2 \mathbf C^n \). Then \( \hat R \) is just the projection of \( R \) onto the subspace of Hermitian forms on \( \operatorname{Sym}^2 \mathbf C^n \) that are diagonal in the basis induced by \( (e_j) \). Seen this way, however, it is not so clear that \( \hat R \) also has positive holomorphic sectional curvature if \( R \) does.

We now note that \[ \hat R\biggl( \sum_{j=1}^n e_j \biggr) = \sum_{j=1}^n R_{jjjj} + 2 \sum_{j\not=k} R_{jjkk}, \] where the second sum runs over ordered pairs \( j \not= k \), and so \[ \hat R\biggl( \sum_{j=1}^n e_j \biggr) + \sum_{j=1}^n \hat R(e_j) = 2 \sum_{j=1}^n R_{jjjj} + 2 \sum_{j\not=k} R_{jjkk} = 2 \sum_{j,k=1}^n R_{jjkk} = 2s. \] If the holomorphic sectional curvature of \( R \) is positive, then so is that of \( \hat R \), and thus \( s > 0 \).

Local distributed rate limiting

Suppose our job is to design a rate limiting system. It could be embedded in a service or be a service of its own. From some details about a request we build a source identifier (an IP address is a popular choice, and we may assume that in what follows); our system should then check how many requests have been made from that identifier in some interval of time, and allow or deny the request based on that information.

I’ve seen a handful of such systems implemented at work. More precisely, I’ve read outage reports on a handful of such systems. A common theme in their design seems to be that the system keeps the request count per identifier in some central database, like Cassandra or MySQL. I can only assume this is done to ensure the system is in some way fair or correct, where those terms are taken to mean that as far as we know the rate limits we impose are global across all requests to a given availability zone or data center or unit of compute area.

I am but a simple country developer, and am probably missing something obvious, but it has never been clear to me why we care about fairness or correctness in these systems. Those come at the price of coordination, which is evil and we hate it. But we should also be able to do without it. Each node in our system can keep its own count of requests per identifier, and make its decisions based on that local information only.

There are a couple of reasons we can do this. The first is that requests are presumably load-balanced across our nodes. That load balancing is either done in as uniform a way as possible, or in a "sticky" way where each request identifier gets sent to the same node (or small group of nodes, then again as uniformly as possible). In either case each node sees either zero requests per identifier or a representative sample of them. So having the nodes make local decisions about identifiers will in aggregate reflect the global behavior of the system.

The second is that source behavior is usually binary. A given source is either well-behaved at a given moment, or completely batshit insane. Either behavior is perfectly visible at the local level and needs no coordination to figure out. If we need correctness for later accounting or observability, the nodes can report their local counts to a central authority on their own time.
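To make the idea concrete, here is a minimal sketch of what a node-local limiter can look like in Go. Everything in it (the type, the fixed window, the names) is made up for illustration rather than taken from any particular system:

package main

import (
  "sync"
  "time"
)

// localLimiter keeps a per-identifier request count for the current
// fixed window, using only memory local to this node.
type localLimiter struct {
  mu          sync.Mutex
  limit       int           // max requests per identifier per window
  window      time.Duration // e.g. one minute
  windowStart time.Time
  counts      map[string]int
}

func newLocalLimiter(limit int, window time.Duration) *localLimiter {
  return &localLimiter{
    limit:       limit,
    window:      window,
    windowStart: time.Now(),
    counts:      make(map[string]int),
  }
}

// Allow reports whether a request from the given identifier (say, an IP
// address) should be let through, based purely on this node's counts.
func (l *localLimiter) Allow(id string) bool {
  l.mu.Lock()
  defer l.mu.Unlock()

  // Start a fresh window if the current one has expired.
  if now := time.Now(); now.Sub(l.windowStart) >= l.window {
    l.windowStart = now
    l.counts = make(map[string]int)
  }

  l.counts[id]++
  return l.counts[id] <= l.limit
}

func main() {
  limiter := newLocalLimiter(100, time.Minute)
  if limiter.Allow("192.0.2.1") {
    // Handle the request.
  }
}

A fixed window is the crudest possible policy; swapping in a sliding window or a token bucket changes nothing about the larger point, which is that no other node is ever consulted.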

So the next time you need distributed rate limiting, think locally. Reach for an in-memory SQLite database, or something horrifying like a ring buffer you roll yourself and definitely test no part of before shipping. Have fun with it. But don’t coordinate when you don’t have to.

Estimating the norm of the exterior product

Let \(V\) be a real finite-dimensional vector space, and let \(\bigwedge^\bullet V\) be the exterior algebra of \(V\). An inner product on \(V\) induces inner products on each component \(\bigwedge^k V\) of the exterior algebra. Recall that we have the exterior product \[ \bigwedge{}^{p} V \times \bigwedge{}^{q} V \to \bigwedge{}^{p+q} V, \quad (u, v) \mapsto u \wedge v. \] Since all the spaces here are finite-dimensional, there exist constants \(C = C(p,q,n)\), where \(n = \dim V\), such that \[ |u \wedge v| \leq C |u| \, |v| \] for all \(u \in \bigwedge{}^ p V\) and \(v \in \bigwedge{}^ q V\). Can we estimate these constants?

The answer apparently has applications to physics. I don’t understand the connection to physics or the physical motivation, but in a 1961 paper the physicist Yang Chen-Ning conjectured that for even-dimensional spaces the optimal bound is achieved on powers of the standard symplectic form.

As \[ \frac{\omega^p}{p!} \wedge \frac{\omega^q}{q!} = \binom{p+q}p \frac{\omega^{p+q}}{(p+q)!}, \] where \(\omega = \sum_{i=1}^n dx_{2i-1} \wedge dx_{2i}\) and \(|\omega^p / p!|^2 = \binom np\), the conjecture is that \[ C(2p,2q,2n) = \tbinom{p+q}p\sqrt{\frac{\binom{n}{p+q}}{\binom np \binom nq}}. \] Yang claims to prove this for the case of \(p = 1\) in his 1961 paper, though I don’t understand his proof. A preprint from 2014 claims to prove more cases of the conjecture; again the details of the proof defeat me.
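Note that plugging \(u = \omega^p/p!\) and \(v = \omega^q/q!\) into the identity above gives \[ \frac{|u \wedge v|}{|u| \, |v|} = \binom{p+q}p \frac{|\omega^{p+q}/(p+q)!|}{|\omega^p/p!| \, |\omega^q/q!|} = \binom{p+q}p \sqrt{\frac{\binom{n}{p+q}}{\binom np \binom nq}}, \] so the conjectured value is in any case a lower bound for \(C(2p,2q,2n)\); the content of the conjecture is that no other pair of forms does better.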

There are two ways of getting rather brutal estimates for these constants. Let \((v_1, \ldots, v_n)\) be an orthonormal basis of \(V\). Then \( (v_I) \), where \(v_I = v_{i_1} \wedge \cdots \wedge v_{i_p}\) for \(I = \{ i_1 < \cdots < i_p \} \subset \{1, \ldots, n\}\), is an orthonormal basis of \(\bigwedge^p V\), and similarly for \( (v_J) \) with \(|J| = q\). If \(u\) and \(v\) are elements of these bases, then \(u \wedge v\) is either 0 or has norm 1.

For our first attempt, let \(x = \sum_{|I|=p} x_I v_I\) and \(y = \sum_{|J|=q} y_J v_J\) be elements of \(\bigwedge^p V\) and \(\bigwedge^q V\). Then \[ |x \wedge y| = \biggl| \sum_{|I|=p, |J|=q} x_I y_J v_I \wedge v_J \biggr| \leq \sum_{|I|=p, |J|=q} |x_I y_J| = \sum_{|I|=p} |x_I| \cdot \sum_{|J|=q} |y_J|. \] By taking the inner product of \(x\) with the vector whose entries are the signs of each \(x_I\) and applying Cauchy–Schwarz, we get \[ \sum_{|I|=p} |x_I| \leq \sqrt{\tbinom np} |x|. \] This gives \[ |x \wedge y| \leq \sqrt{\tbinom np \tbinom nq} |x| |y|. \] Our first estimate is thus \(C(p,q,n) \leq \sqrt{\binom np \binom nq}\). This is likely to be a bad estimate, because when applying the triangle inequality above we assigned equal weight to all expressions \(|v_I \wedge v_J|\), even though many of them were zero. We also applied the Cauchy–Schwarz inequality on top of the triangle one, and their equality conditions work against each other: the triangle inequality is sharp when all the summands point in the same direction, while the Cauchy–Schwarz step is sharp when all the \(|x_I|\) are equal. We should therefore expect a suboptimal bound.

For our second attempt, we consider the linear map \[ b : \bigwedge{}^ p V \otimes \bigwedge{}^ q V \to \bigwedge{}^ {p+q} V, \quad u \otimes v \mapsto u \wedge v, \] induced by the exterior product. Its operator norm satisfies \(\|b\|^2 \leq |b|^2\), where \[ \| b \|^2 = \sup_{x,y} \frac{ |b(x \otimes y)|^2 }{ |x|^2 |y|^2 } \] and where \(|b|\) is the Hilbert–Schmidt norm. Since every entry of the matrix of \(b\) in our bases is \(0\) or \(\pm 1\), we can calculate that norm by counting the number of non-zero entries in the matrix for \(b\).

Let then \(I = (i_1, \ldots, i_p)\) be a multiindex. We can choose \(\binom{n-p}{q}\) indices \(J = (j_1, \ldots, j_q)\) such that the corresponding basis elements \(v_I\) and \(v_J\) satisfy \(b(v_I \otimes v_J) = v_I \wedge v_J \not= 0\). As there are \(\binom np\) ways of picking \(I\), we conclude that the Hilbert–Schmidt norm of \(b\) is \[ |b|^2 = \tbinom np \tbinom {n-p}q = \tbinom n{n-p} \tbinom {n-p}q. \] We then get a slightly better estimate \[ C(p,q,n) \leq \sqrt{\tbinom n{n-p} \tbinom {n-p}q} \leq \sqrt{\tbinom np \tbinom nq} \] with equality in the second place if and only if \(p = 0\) or \(q = 0\).

A fun fact is that these estimates are arbitrarily bad. When \(p + q = n\), the Hodge star operator gives an isometry \(\bigwedge^q V \cong \bigwedge^p V\) and the Cauchy–Schwarz inequality gives \[ |u \wedge v| \leq |u| |v|, \] so \(C(p,n-p,n) = 1\), which agrees with Yang’s conjectured value when \(p\) and \(n\) are even. Our best estimate in this case is \[ C(2p,2n-2p,2n) \leq \sqrt{\tbinom {2n}{2p}}, \] which goes to infinity as \(n\) grows.
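For a concrete data point, take \(V\) of dimension 4 and \(p = q = 2\). The Hodge star argument gives \(C(2,2,4) = 1\), which is also Yang’s conjectured value, while the first estimate gives \(\sqrt{\tbinom 42 \tbinom 42} = 6\) and the second gives \(\sqrt{\tbinom 42 \tbinom 22} = \sqrt 6 \approx 2.45\).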

I think this is a very fun instance of a problem that’s easy to state and whose solution is somewhat difficult. The simple attempts at a solution run into basic problems with the optimal conditions for different inequalities on the one hand, and the fact that the things we can easily calculate are not the things we’re interested in on the other hand.

Emacs is my favorite Emacs package

In the category of "people whose work ethic and output I can only envy from afar while failing to do anything of value", Protesilaos Stavrou takes the lead with his video Emacs is my "favourite Emacs package" in which he demonstrates how bananas well he has come to understand Emacs in the single year (!!!) he has used it.

Anscombe's quartet and monitoring

Anscombe’s quartet is a collection of four data sets of points in the plane. When graphed, the data sets are obviously very different, but they have identical sets of descriptive statistics. Newer variations on the same theme include the delightful Datasaurus dozen, which includes animations that deform one data set into another while keeping the descriptive statistics identical throughout.

[Animation: the Datasaurus dozen]

I’ve seen Anscombe’s quartet cited in discussions around monitoring and observability to assert that the average and standard deviation (two of the statistical tools used in the quartet) are not useful for monitoring and alerting; instead one should use percentiles. I agree with the conclusion but think it does not follow from the premise, and would like to offer some thoughts on the theme from the perspective of a reformed geometer.

To do so, I’d like to work out some examples involving the average and standard deviation. The data sets above involve collections of points in the plane, but I’d like us to discuss data modeled on timeseries instead. The difference isn’t important; everything we say applies to both theaters, but the geometry we want to point out is easier to see in the latter.

Setup

We’ll model a portion of a timeseries as a tuple of \(n\) real numbers \((x_1, \ldots, x_n)\). We can think of these as \(n\) measurements recorded at one-second (or minute, hour, …​) intervals. The set of all possible such timeseries is familiar; it is the real vector space \(\mathbb{R}^n\) of dimension \(n\). (Timeseries can be added and subtracted, there is a zero-timeseries that serves as an identity, and we can multiply a timeseries by a real number.)

The main characters of our story are the average and standard deviation. If \(x = (x_1, \ldots, x_n)\) is a timeseries, its average is \[ \mu(x) = (x_1 + \cdots + x_n) / n \] and its standard deviation is defined by the scary formula[1] \[ \sigma(x) = \sqrt{ ( (x_1 - \mu(x) )^2 + \cdots + (x_n - \mu(x) )^2 ) / n }. \]

We can now look at some examples. In every one of these, we’ll fix the average and standard deviation to some values and look at the collection of timeseries that have those values. The first couple of examples are just warm-up.

n = 1

We first look at \(n = 1\), that is, timeseries that just have one value \(x = (x_1)\). For such a timeseries we have \(\mu(x) = x_1\) and \(\sigma(x) = 0\) so each timeseries is uniquely determined by its average and they all have the same standard deviation.

n = 2

The next case is \(n = 2\), of timeseries with two values \(x = (x_1, x_2)\). This is slightly more interesting; we have the average \(\mu(x) = (x_1 + x_2) / 2\) and the standard deviation \[ \begin{align} \sigma(x) &= \sqrt{( (x_1 - \mu(x) )^2 + (x_2 - \mu(x) )^2 )/2} \\ &= \sqrt{((x_1 - x_2)^2 + (x_2 - x_1)^2)/8} = |x_1 - x_2|/2. \end{align} \] The second equality here is by substituting the definition of \(\mu(x)\).

This is more interesting because different timeseries can have different averages and standard deviations. However, I claim that if we fix the values of both the average and the standard deviation, say to \(\mu\) and \(\sigma\), there are at most two timeseries that have those values, and they differ only by swapping their two entries.

To see this, observe that the formula for the average implies that \(x_2 = 2 \mu - x_1\), so if we fix the value of the average the value of \(x_2\) is determined by the choice of \(x_1\). If we substitute this into the formula for the standard deviation we get \[ \sigma(x) = |x_1 - 2 \mu + x_1|/2 = |x_1 - \mu|. \] We can solve this equation by splitting into the cases \(x_1 \geq \mu\) and \(x_1 < \mu\). In the former we get \(x_1 = \mu + \sigma\) and in the latter we get \(x_1 = \mu - \sigma\). In either case the value of \(x_2\) is then determined by that of \(x_1\) and the average, so fixing the average and standard deviation leaves only the two timeseries \((\mu + \sigma, \mu - \sigma)\) and \((\mu - \sigma, \mu + \sigma)\); in particular, the set of such timeseries is finite.

n = 3

We now get into the first case that has a hint of the general picture. Our timeseries have three points, \(x = (x_1, x_2, x_3)\). We could write down equations like in the previous case and calculate, but it’s worth it to take a step back and think about what’s going on.

Let’s say we have fixed the value of the average of our timeseries to a constant \(\mu\). The formula for the standard deviation of our series is then \[ \sigma(x) = \sqrt{((x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2)/3}. \] If we form the point \((\mu, \mu, \mu)\), we see that \[ \sigma(x) = \|(x_1, x_2, x_3) - (\mu, \mu, \mu)\|/\sqrt{3} \] is just a multiple of the Euclidean distance between our timeseries and the fixed point \((\mu, \mu, \mu)\). If we also fix \(\sigma(x) = \sigma\), the set of points that satisfy this equation is \[ S := \{ (x_1, x_2, x_3) \mid \|(x_1, x_2, x_3) - (\mu, \mu, \mu)\| = \sqrt{3} \sigma \}. \] This should look familiar; it is the sphere whose center is \((\mu, \mu, \mu)\) and whose radius is \(\sqrt{3}\sigma\).

Similarly, the set of timeseries whose average is equal to \(\mu\) is \[ P := \{ (x_1, x_2, x_3) \mid (x_1 + x_2 + x_3)/3 = \mu \}. \] For those who know a little linear algebra, we can write the condition as being that \((x_1, x_2, x_3) \cdot (1, 1, 1) = 3 \mu\), that is, that the inner product with \((1, 1, 1)\) must be equal to \(3 \mu\). The set of points that satisfies such a condition is a plane; a flat two-dimensional space.

We’re interested in the set of points where both of these conditions are true, which is the intersection \(S \cap P\). We could work out what that is by calculation, but it’s easiest to visualize it. The intersection of a sphere and a plane in three dimensions is either empty (when they don’t meet), a single point (when the plane is tangent to the sphere), or a circle that lies on the sphere.

In three dimensions, if we pick our values for the average and standard deviations right, there is thus a whole circle of timeseries that have that average and standard deviation. They form a kind of Anscombe set.
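To make this concrete, here is one way to write down that circle explicitly. The vectors \(u = (1,-1,0)/\sqrt{2}\) and \(v = (1,1,-2)/\sqrt{6}\) are orthonormal and orthogonal to \((1,1,1)\), so every timeseries of the form \[ x(\theta) = (\mu, \mu, \mu) + \sqrt{3}\,\sigma\,(\cos\theta \, u + \sin\theta \, v), \qquad 0 \leq \theta < 2\pi, \] has average \(\mu\) and standard deviation \(\sigma\), and these are all of them.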

Higher dimensions

In higher dimensions, basically the same thing happens as in dimension 3. The set of timeseries with a fixed average \(\mu\) is a hyperplane, and the set of timeseries at a fixed distance \(\sqrt{n}\,\sigma\) from the point \((\mu, \ldots, \mu)\) is a sphere. Their intersection, when it is not empty or a single point, is a sphere of dimension \(n-2\). As \(n\) grows, this space grows as well, and in general the space of timeseries that have a fixed average and standard deviation is enormous.

Other descriptive statistics

The final point I’d like to make, and one you’ll have to take my word for to some extent, is that there is nothing special about the average or standard deviation here. Very similar things will happen for every collection of statistical tools we pick.

Every statistical tool can be viewed as a function that takes a timeseries (or perhaps a couple of timeseries, which complicates things a little but not a lot) and returns a real number. These functions are very well behaved (continuous, differentiable, and so on).

Given a collection of such functions \(f_1, \ldots, f_k\) on the space of timeseries of \(n\) points, we can ask what the Anscombe set of fixed values of these functions looks like. That is, if we fix values \(y_1, \ldots, y_k\), can we say anything about the set \(\{x = (x_1, \ldots, x_n) \mid f_1(x) = y_1, \ldots, f_k(x) = y_k \}\)?

With some hand waving, we can actually say this: Fixing the value of a single tool \(f_j\) creates a hypersurface \(X_j = \{x \mid f_j(x) = y_j\}\) inside the space of timeseries, that is, a subspace of dimension one less than the ambient space. All timeseries in \(X_j\) will have the same value with respect to this tool. If the intersection of all of these \(X_1 \cap \cdots \cap X_k\) is not empty, it will generally be a subspace of dimension \(n - k\). If \(n\) is much bigger than \(k\), the dimension of this space, the space of timeseries that all have the same statistical values with respect to our tools, is enormous.

Making the hand waving precise and figuring out when this intersection will be non-empty is the domain of algebraic geometry over the real numbers. Saying anything at all about that is beyond our scope here.

Outro

I hope I have convinced you that the statement "the average and standard deviation are useless for monitoring because of Anscombe’s quartet" is not true. The moral of Anscombe’s quartet, which boils down to intersection theory in algebraic geometry, is that the space of data that fits any given collection of statistical instruments (including any collection of percentiles) is gigantic and we have to look at the data to make sense of it.

In general, percentiles are more valuable for analyzing performance or traffic data than the average and standard deviation. The reason is not Anscombe’s quartet, the robustness of percentiles, or anything to do with normal distributions, but that the average and standard deviation are fundamentally tools for answering the questions "what would this be if it were constant?" and "how far away is this from being constant?".[2] For performance and traffic analysis, we don’t care about either of those questions, but about peaks in demand and how we can handle them. We should pick tools that can help with that, but other tools have their place.


1. Some people divide by \(n-1\) in the formula for the standard deviation. I’m sure they have their reasons, but the factor of \(n\) is more natural for geometers. The reason involves interpreting timeseries as functions and is a bit of a detour from what we really care about.
2. Exactly how involves some linear algebra and this post was already long. If people care, I can write another one explaining how.

How Silicon Valley will solve the trolley problem

The trolley problem asks how to decide between the lives of people in two groups. At the moment, it comes up in our industry in discussions around self-driving cars: Suppose a car gets into a situation where it must risk injuring either its passengers or pedestrians; which ones should it prioritize saving?

Always choosing one group leads to suboptimal outcomes. If we always save the pedestrians, they may perform attacks on car riders by willfully stepping into traffic. If we always save the passengers, we may run over a pedestrian whose life happens to be of greater worth than that of the passengers.

The crux of the trolley problem is how we should make the choice of what lives to save. As with all hard problems without clear metrics for success, it’s best to solve this by having the participants decide this for themselves.

Every time we view a website, our friendly ad networks must decide what ads we should see. This is done by having our user profile put up on auction for prospective advertisers. During a handful of milliseconds, participants may inspect our profile and place a bid for our attention according to what they see.

The same technology could be trivially repurposed for deciding the trolley problem in the context of self-driving cars.

Assume the identity of every participant in the trolley scenario is known. Practically, we know the identity of the passengers; that of the pedestrians could be known if their phones broadcast a special short-range identification signal. An incentive for broadcasting such a signal could be that we would have to assume that a person without one were one of no means.

Given this information, a car about to be involved in a collision could take a few milliseconds to send the identities of the people involved to an auction service. Participants who had the foresight of purchasing accident insurance would have agents bidding on their behalf. The winner of the bid would be safe from the forthcoming accident, and their insurance would pay some sum to the car manufacturer post-hoc as a reward for conforming to the rules.

The greater a person’s individual wealth, the better insurance coverage and latencies they could purchase, and the more accidents they could expect to survive. This aligns nicely with the values of societies like the United States, where the worth of a person’s life is proportional to their wealth.

As this system lets self-driving car manufacturers off the hook for any decisions taken, and would need coordinated ethical action on the part of software engineers to not be implemented, we expect to see this system in action in the world in short order.

Infosec: A board game

I’m pleased to announce the release of my new card-drawing game Infosec.

The rules of the game are simple. It is for two or more players. The player with the fewest friends is the Infosec Expert; the other players are the Coworkers.

To start, the Infosec Expert deals three cards face down from a standard deck of cards. The Coworker on the Infosec Expert’s right hand should draw one card.

"Which card?" the Coworker may ask.

"This is a simple phishing exercise," the Infosec Expert should reply. "Just pick a card."

"But they all look the same," the Coworker may object.

"Draw one. And get it right."

This exchange should go on in increasingly hostile tones until the Coworker agrees to draw a card. It will have been the wrong card. The Infosec Expert should inform the Coworker:

"You got phished. You moron. You fucking idiot. You’re such a goddamn waste of space and time. How could you have gotten this so wrong? Were you even trying? Answer me. What the fuck was that?"

Feel free to ad-lib along these lines, or draw as many cards as you want from the accompanying Admonishment Deck (expansion packs available). Include ad-hominem attacks and use as many personal details as you know. Interrupt the Coworker if they try to reply. Don’t hold back.

Once the Coworker is silent, the Infosec Expert should collect the cards, deal new ones, and proceed to the next Coworker in line.

The game ends with the Infosec Expert’s victory when all the Coworkers have left.

Backing up data like the adult I supposedly am

Like so many things I’m supposed to do but don’t — getting exercise, eating right, sleeping well, standing up for women and minorities in public spaces — backing up my data has always been something I’ve half-assed at best.

I’ve lugged around an external hard drive with a few hundred gigabytes of data for the last 10 years, and made backups to it once every three or four years or so. Every time I’ve tried restoring anything from those backups I’ve regretted it, because of course I just bought the drive, plugged it in and copied stuff to it, so it is a FAT32 drive while I have mostly had EXT4 filesystems, which means all my file permissions get lost during the process.

I’ve written shameful little shell scripts to set file permissions to 0644 and directory permissions to 0755, recursively, many many times.

Part of my problem was that I both know just enough rsync to be dangerous and have a credit card so I can provision cloud VMs, so forever just around the corner was my perfect backup solution that I’d write myself and maintain and actually do instead of dealing with whatever I had going on in my life. I’ve come to accept that this will never happen, or perhaps more definitively, that I’d rather cut myself than write and maintain another piece of ad-hoc software for myself.

Luckily I recently found two things that have solved this whole problem for me: borg and rsync.net.

Borg is backup software. It compresses and deduplicates data at the chunk level, and strongly encourages (but does not force) you to encrypt data before backing it up. It is everything I’d want from my half-assed rsync and shell script abomination.

I read its documentation a couple of times and was impressed. I then set about comparing different VM hosts to see which one would give me the cheapest block storage option, when the result of some random google search led me to rsync.net. They are a company that stores backups, pretty cheaply, and even more cheaply if you use borg to take them. I guess they just really love borg and want us to love it too.

I signed up for their cheapest plan, which starts at 100GB stored for $18 per year. They have no network in- or egress costs, and the storage amount can be adjusted at any time. Once my account had been activated, I did a little password reset dance, and uploaded a public SSH key.

I wanted to back up my $HOME directory, so after installing borg I ran:

export BORG_REMOTE_PATH="borg1"
borg init --encryption repokey-blake2 UID@ch-s011.rsync.net:home

This created a remote borg repository called "home" on rsync.net’s servers. The environment variable is so we use a more recent version of borg on the remote server (version 1.1.11 at the time of writing), as the default version is rather old (version 0.29.0).

When choosing what encryption method to use, one can choose between a "repokey" or a "keyfile". They both create a private key locked with a passphrase; the difference is that with "repokey" the key is stored in the borg repo, while with "keyfile" it is stored outside of it. This boils down to whether we think a passphrase is enough security for our data, or whether we think having a secret keyfile is necessary. I figured my password manager could create a strong enough passphrase for my needs, and I didn’t want to think about losing the keyfile, so I chose "repokey-blake2".

To create my first backup, I ran

borg create --exclude "$HOME/.cache" UID@ch-s011.rsync.net:home::backup-1 "$HOME"

which created the archive "backup-1" in my "home" borg repository. I didn’t change the compression algorithm from the default one.

By default borg compresses data with lz4. It can use other compression methods (xz, zlib, zstd). I compared their compression ratios on some binary files I had and found no difference between them. I think this is because the large binary files I have are mostly audio and video files in lossy formats, which don’t seem to benefit very much from further compression. I have a lot of text files as well, but text takes up so little relative space on today’s hardware that it makes no sense to spend CPU cycles on compressing it better than lz4 does.

This backup command hummed along for a good while, and through a couple of reboot cycles. Doing a second backup right after it finished (or the day after) took a lot less time because of the deduplication:

borg create --exclude "$HOME/.cache" UID@ch-s011.rsync.net:home::backup-2 "$HOME"

Restoring from backup is also easy:

borg extract UID@ch-s011.rsync.net:home::backup-2

I set this up to run as a daily timed systemd service at noon (very easy on NixOS, which every Linux user should be using unless they hate themselves), and will never, ever think about this again. For a handful of bucks a year, that is a good deal.

Review of home manager

A common theme among the people who fall in love with Nix (or its cousin NixOS) is that they want it to manage everything for them. Finding out the limits of that capability and where it breaks down is part of each user’s journey.

An obvious enough limit to Nix’s reach is secret management. It can easily handle public keys, whether SSH or other cryptographic ones, but private keys cannot be managed by its store, as the store is readable by all.

The promise of Nix is that we will never break our system by updating it. We can always roll back to a previous working version, once we have a working version at all. This is in contrast to, say, every other Linux distribution. I have borked Ubuntu, Arch, Fedora and others during system updates. Often the safest way to update the system is to back up /home and reinstall the OS.

From somewhere comes the idea that a user on a Linux system should be able to install their own programs and manage their own system services. (See Chris Wellons for where you can run with this idea if your sysadmin gives you a C compiler.) This seems odd from a historical perspective. In a true multi-user system, the system administrators normally do not want users to be able to install arbitrary software or run services. On a modern "multi"-user system, where there is only a single user, avoiding system packages and services seems like some kind of theater.

Yet we do it anyway. An argument I sometimes make to myself is that this is a cleaner separation between what I need the system to do versus what I do on it. I may want to be able to run traceroute as root, but I don’t care about root being able to run the Go compiler.

NixOS has facilities to enforce this separation. It happily creates users and their home directories, and can fill in some of the bits that go there, like public SSH keys. It can install packages for specific users, and allow them to define systemd services. It will not manage the configuration of specific user packages (like dotfiles) without some coercing. One can presumably create custom derivations of, say, ZSH with all the configuration one wants, but who has the time?

Home manager wants to fill this gap and bring the power of Nix to user environment and configuration management. It lets individual users say what packages they want installed; what services they want run; and what configuration files should go where. On the surface it seems like something I should love, but after using it for a month I wrote it out of my system and now use plain NixOS.

I used Home manager for three things:

  1. Installing packages for my user.

  2. Scheduling services for my user.

  3. Installing configuration files for my software.

The first one was never a big attraction. Home manager lets us define a list of packages to install in home.packages. If we control the system configuration, we can achieve the same by defining those packages in users.users.$username.packages.

If a user controls the system configuration (directly or through a sysadmin), they can also define their own user services via systemd.user. Home manager’s selling point is that it comes with a large list of already defined services that we can enable with a boolean flag, instead of having to write our own service configuration. This is admittedly nice. In the end, I found that learning the idiosyncrasies of each home manager service definition was a worse use of my time than learning how to define NixOS systemd services once and for all. The latter is after all where the former end up.

As a long-time sufferer of dotfile management, I had high hopes for the third point. And indeed, home manager will manage dotfiles just fine. It can do this in two modes: it can generate a config file from various options we fill out if someone has written a home manager module for the program we’re trying to configure, or it can plomp a file on the system verbatim from a source. I used the latter, as I didn’t feel like learning a configuration language to be able to partially configure program dotfiles was a good idea.

This works well, until we want to change anything in a dotfile. This experiment with home manager coincided with a regular low point in my life in which I try to use emacs. This comes with quite a lot of .emacs changes as I use Lisp for the only thing it’s ever been good for: configuring a text editor in the most complicated way imaginable. Now, the dotfiles that home manager (or Nix) puts on our systems are read-only, so every change would involve changing the source file and running home-manager switch. This seems like unnecessarily many steps, especially after I saw this brilliant Hacker news comment, which after a week of use is a much better solution for this problem.

All in all home manager is nice software. I can see it being useful for people who either don’t control the system they run on but want to use Nix in user mode to run their corner of it, or for those Nix users whom gamers (a well-adjusted group of humans if there ever was one) would call "filthy casuals", that is, people who just want things to work and don’t care very much about learning how to write enough Nix to make that happen.

I’m not included in those groups, as I run this system and explicitly want to learn to use Nix in anger so I can try and fail to convince people to run it in production at work. Home manager is fine software and if it makes you happy, then please use it.

Use ad hoc structs for command-line flags in Go

The path of least resistance to commandline flag parsing in Go is to use the flag package from the standard library. A lot of times the result looks like this:

func main() {
  help := flag.Bool("help", false, "HALP")
  frobinate := flag.Int("frobinate", 0, "Amount to frobinate by")
  blargalarg := flag.String("blargalarg", "", "Social media comment")
  // [713 variables later]
  flag.Parse()

  // Much later
  if *blargalarg != "" {
    // Do things. We may or may not remember what this variable is.
  }
}

That is, we have a bunch of variables lying around that we don’t really care about and that take up perfectly good names. If we see one of them later in the program, we don’t have any context on where it comes from, so we have to start jumping around in the source.

In my projects I’ve used a little accounting trick to hold these flags. I find it helps me deal with them. We just define an anonymous struct to hold the flags:

func main() {
  flags := struct {
    help       *bool
    frobinate  *int
    blargalarg *string
    // [713 field definitions]
  }{
    help:       flag.Bool("help", false, "HALP"),
    frobinate:  flag.Int("frobinate", 0, "Amount to frobinate by"),
    blargalarg: flag.String("blargalarg", "", "Social media comment"),
    // [713 field instantiations]
  }
  flag.Parse()

  // Much later
  if *flags.blargalarg != "" {
    // AAAAAH YES IT'S A FLAG
  }
}

If I really need to, I can pull the anonymous struct out into its own global variable or type definition or whatever, and pass it around as arguments to functions that deal with its contents. That is not as handy with a litter of flag variables. But really I just find that defining all these flags clearly in one place makes the program easier to read later once I’ve forgotten what it does.
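For what it’s worth, here is a rough sketch of what that pulling-out can look like. The options type and the parseFlags and run functions are names I’m making up for the illustration:

package main

import "flag"

// options is a hypothetical named version of the ad hoc struct above.
type options struct {
  help       *bool
  frobinate  *int
  blargalarg *string
}

// parseFlags defines the flags in one place and returns them as a value
// we can pass around.
func parseFlags() options {
  opts := options{
    help:       flag.Bool("help", false, "HALP"),
    frobinate:  flag.Int("frobinate", 0, "Amount to frobinate by"),
    blargalarg: flag.String("blargalarg", "", "Social media comment"),
  }
  flag.Parse()
  return opts
}

// run receives the parsed flags explicitly, so it is obvious where they
// come from when we read it later.
func run(opts options) {
  if *opts.blargalarg != "" {
    // Still obviously a flag.
  }
}

func main() {
  run(parseFlags())
}

Nothing changes about the flags themselves; the only difference is that any function that needs them now says so in its signature.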