Affine Transforms

2023/09/28

Categories: GeekStuff

Affine Transforms

(Long time no blog! My blog rendering thing broke, my templates were incompatible with the new one, and I needed to figure out how to use MathJax to try to make this legible. Ugh.)

So, I like fractals, especially iterated function system fractals (think the Koch snowflake). And I’ve been writing code to do these for years. And it’s been horribly, stupidly, slow that whole time because I didn’t understand affine transforms. I’m only interested in 2D, so I’m talking about 2D.

So here’s what I didn’t understand. Let’s start with the basic concept. An affine transform is a transform which preserves the relationships between points, and mostly this means we can do scales, rotations, and translations. (Translate, here, in the sense of “move by a fixed amount.”) The way this is represented is linear algebra and matrix multiplication. We define a matrix in the form

$\begin{bmatrix}a00 & a01 & a02 \\ a10 & a11 & a12 \\ 0 & 0 & 1\end{bmatrix}$

and we extend our vector from $\begin{bmatrix}x & y\end{bmatrix}$ to $\begin{bmatrix}x & y & 1\end{bmatrix}$. We can then multiply these out, ignore the last value in the result, and get our new $x’$ and $y’$.

But we don’t actually need to store the $\begin{bmatrix}0 & 0 & 1\end{bmatrix}$ row, or the extra 1 at the end of our vector. We can just compute things as though they were there. So we actually store an affine transform as six values, like this:

$\begin{bmatrix}a00 & a01 & a02 \\ a10 & a11 & a12\end{bmatrix}$

Similarly, we don’t really have to extend our point, we can just figure out the computations. So, the actual computation looks like this (switching to Go instead of math):

type Point struct { X, Y float64 }
type Affine struct { a00, a01, a02, a10, a11, a12 float64 }

func (m *Affine) Project(p Point) Point {
	return Point{
		X: p.X * m.a00 + p.Y * m.a01 + m.a02,
		Y: p.X * m.a10 + p.Y * m.a11 + m.a12,
	}
}

Arguably, the last expression for X is really 1 * m.a02 but I am okay with simplifying that one without showing my work.

But how do you create these matrixes?

Well, that’s easy. First, you define the operations you want, then you multiply the matrixes together to get a matrix that combines the operation. This works because linear algebra is, to be highly technical, fucking magic. It turns out that the result of multiplying a vector by three matrixes in order is identical to the result of multiplying the matrixes together and then multiplying the vector by the result. So, if you need to do a rotation, a scale, and a translation, you can build matrixes for each of those, multiply them together, and get a result you can use.

Of course, once you know you’re going to do that, you can simplify a bit along the way, but let’s build up to it.

The Three Important Operations

These are the operations that are important to me because I care about them and need to use them. There’s others. I’m ignoring shear, for instance, because we don’t need it here.

Translate

Translate is really easy. Here’s the translate matrix that results in everything being offset by $\begin{bmatrix}x0 & y0\end{bmatrix}$:

$\begin{bmatrix}1 & 0 & x0 \\ 0 & 1 & y0\end{bmatrix}$

If you expand the operations above, you’ll see how this works:

return Point{
	X: p.X * 1 + p.Y * 0 + x0,
	Y: p.X * 0 + p.Y * 1 + y0,
}

which simplifies to {X: p.X + x0, Y: p.Y + y0}.

Scale

Scale is also easy and pretty straightforward.

$\begin{bmatrix}xs & 0 & 0 \\ 0 & ys & 0\end{bmatrix}$

Again, multiplying it out makes it pretty obvious how this works:

return Point{
	X: p.X * xs + p.Y * 0 + 0,
	Y: p.X * 0 + p.Y * ys + 0,
}

which simplifies to {X: p.X * xs, Y: p.Y * ys}.

Rotate

The rotation matrix looks sort of arbitrary at first, although it at least basically makes sense that it’s full of trig functions because rotation is pretty much just circles, and trig functions are what circles do when they’re angry:

$\begin{bmatrix}\cos{\theta} & -\sin{\theta} & 0 \\ \sin{\theta} & \cos{\theta} & 0\end{bmatrix}$

I’ve never seen anyone explain why this works, but we’ll get to an explanation in a bit. It’s sufficient to note that it does work, and allows you to rotate things in a predictable fashion.

Combining these

Okay, so. Let’s say you’re doing a fractal renderer. You have a line segment; it goes from $(x0, y0)$ to $(x1, y1)$. You want a matrix that will map $(0, 0)$ to $(x0, y0)$, and $(1, 0)$ to $(x1, y1)$, and everything else in a corresponding way. How do you do this? Your line segment may be a different length than the unit vector, so you need to compute the distance between your points, and have a scaling matrix. Your line segment may be rotated from the unit vector, so you need to compute its angle and have a rotation matrix. And, of course, you need to offset everything. (We do this last because otherwise the others can mess with it.) So, here’s our three matrixes:

Rotation:

$\begin{bmatrix}\cos{\theta} & -\sin{\theta} & 0 \\ \sin{\theta} & \cos{\theta} & 0\end{bmatrix}$

Scale:

$\begin{bmatrix}l & 0 & 0 \\ 0 & l & 0\end{bmatrix}$

Translate:

$\begin{bmatrix}1 & 0 & x0 \\ 0 & 1 & y0\end{bmatrix}$

But we need to fill in some values. It’s pretty easy to solve the translate matrix; we want the origin to translate to $(x0,y0)$, so that’s what we translate by. For the scale matrix, it’s obvious that we need the same scale for both X and Y, but what is it? Well, it’s the length of the target segment. You know the formula for length, probably, so that’s easy. (We’ll use dx and dy as shorthand for x1-x0 and y1-y0 respectively.)(

$l = \sqrt{dx^2 + dy^2}$

Rotation isn’t too bad, we know what a rotation looks like, but what’s $\theta$? It’s “the angle of the line”. So, imagine we started at the origin and drew a line to $(dx,dy)$. What would be the angle between that line and the X axis? Well, conveniently, there’s a standard math library function for this in basically math library. It’s called “arc tangent of two variables”, or Atan2. It’s usually invoked as Atan2(y, x), not the other way around. (There’s a reason for this, which I’m going to come back to.)

So… We can compute $\theta$:

$\theta = \operatorname{atan2}(dy, dx)$

Here’s some nice simple Go code to compute the affine matrix between P0 and P1:

// repeating these definitions so you don't have to go look
type Point struct { X, Y float64 }
type Affine struct { a00, a01, a02, a10, a11, a12 float64 }

func AffineForDelta(p0, p1 Point) (a Affine) {
	dx := p1.X - p0.X
	dy := p1.Y - p0.Y
	l := Math.Sqrt(dx * dx + dy * dy)
	theta := Math.Atan2(dy, dx)
	sin, cos := Math.Sincos(theta)
	return Affine{
	  a00: cos * l, a01: -sin * l, a02: p0.X,
	  a10: sin * l, a11:  cos * l, a12: p0.Y,
	}
}

I’ve had roughly this code, in one form or another, following me around across C, Java, Lua, Go, and C# code bases, because this is the obvious, simple, way to compute this affine transform.

We can do better

So, if you look closely at this code, you’ll notice a thing (mad props to the friend who pointed this out as an obvious opportunity to remove trig functions): We’re computing the sin and cosine of an angle that we just computed by calling arctangent. This actually just normalizes the values we started with, to satisfy $x^2 + y^2 = 1$. It doesn’t tell us anything else. And you know how else you can normalize a vector? Find out its length, and divide it by that. (This may seem surprising, so let’s try it out. $(3,4)$ has the length $\sqrt(3^2+4^2)$, or 5. So we divide the vector by 5, getting $(3/5,4/5)$. The length of that is $\sqrt((3/5)^2+(4/5)^2)$, which is $\sqrt(9/25+16/25)$, which is $\sqrt(25/25)$ and I think we can see that this will get us to 1. Neat, huh?)

So, here’s an improvement we can make:

func AffineForDelta(p0, p1 Point) (a Affine) {
	dx := p1.X - p0.X
	dy := p1.Y - p0.Y
	l := Math.Sqrt(dx * dx + dy * dy)
	sin, cos := dy / l, dx / l
	return Affine{
	  a00: cos * l, a01: -sin * l, a02: p0.X,
	  a10: sin * l, a11:  cos * l, a12: p0.Y,
	}
}

Look, we got rid of two trig functions! And here you see why Atan2 takes $y$ first and $x$ second. Atan2 is the logical inverse of Sincos. Sincos gives you sine and cosine of an angle, Atan2 gives you angle of a sine and cosine. For a point on the unit circle, $y$ is $\sin\theta$, and $x$ is $\cos\theta$. So, since we usually order sine before cosine, Atan2 takes $y$ before $x$.

But you know what, there’s another thing we can do. Let’s expand terms a bit. If you have cos = dx/l, and you have a.a00 = cos * l, that comes out to a.a00 = dx / l * l. Or, simplifying terms a bit, a.a00 = dx.

Wait a second.

The operations we’re doing to compute the rotation matrix, and the operations we’re doing to compute the scale matrix, are in fact canceling each other out. We are using the length to do two things: First, to divide something by it, and second, to multiply something by it. We don’t need the length at all!

func AffineForDelta(p0, p1 Point) (a Affine) {
	dx := p1.X - p0.X
	dy := p1.Y - p0.Y
	return Affine{
		a00: dx, a01: -dy, a02: p0.X,
		a10: dy, a11:  dx, a12: p0.Y,
	}
}

So that seems… Simpler. So why does it work? What is actually happening in this matrix?

Thinking about how the affine matrix works

So let’s look again at what happens when we project by a matrix:

return Point{
	X: p.X * m.a00 + p.Y * m.a01 + m.a02,
	Y: p.X * m.a10 + p.Y * m.a11 + m.a12,
}

The members a00 and a10 represent the effect of $x$ on $x’$ and $y’$ respectively; the members a01 and a11 represent the effect of $y$ on $x’$ and $y’$. You can also think of the columns of the affine matrix (not including the $\begin{bmatrix}0 & 0 & 1\end{bmatrix}$) as vectors in the output space. The first represents “what should I add to my output per 1 unit of $x$ in the input”, the second “what should I add to my output per 1 unit of $y$ in the input”, and the third “what should I add to my output unconditionally”? Another way of looking at it is that a00 and a11 represent the effect that the X and Y coordinates have on themselves, while a01 and a10 represent the effect that X and Y coordinates have on each other. (And then a02 and a12 are “what effect is had on $x’$ and $y’$ by nothing-in-particular”.)

And now we have a way to talk about what the rotation matrix means, and why $\sin$ and $\cos$ are used to make it. Think about rotating a plane a little bit counterclockwise. As you rotate it, right becomes up, up becomes left, left becomes down, and down becomes right. So, the further right something used to be, the more up it is now; the further up something used to be, the more left it is now. That means that $y’$ is increasing with $x$, and $x’$ is decreasing with $y$. And if you’ve only rotated a little bit, $x’$ is also still increasing with $x$, and $y’$ is still increasing with $y$. And the more you rotate, the higher the impact of $x$ on $y’$, and the lower the impact of $x$ on $x’$, until you get to a full quarter-turn rotation, and then it starts going the other way. So we definitely want the terms to have signs such that the diagonal starts out positive, and the contribution of $y$ to $x’$ starts out negative, while the contribution of $x$ to $y$ starts out positive. So that hints at the signs we want, but how do we decide on the magnitudes? How do we pick two magnitudes, one for how much each value affects itself, and one for how much each value affects the other, such that the overall outcome preserves length?

The answer here is we want to know the determinant of the matrix, which turns out to be roughly equivalent to “how much is this scaled by”. (It’s really how much area is scaled by, but for the case where that answer is 1, that implies also that length is being scaled by 1.) Consider a 2x2 matrix, in the generic form:

$\begin{bmatrix}a & b \\ c & d\end{bmatrix}$

The determinant of this matrix is $ad - bc$. For an operation that isn’t supposed to change lengths or areas, we want that to consistently come out to 1. As it happens, $b$ and $c$ are both $\sin\theta$, but one is negative, so $bc$ is $-\sin^2\theta$, while $ad$ is $\cos^2\theta$. And subtracting negatives gets you positives, so $ad - bc$ is $\cos^2\theta + \sin^2\theta$. Which is… 1. Always. Sine and cosine represent the relation between $x$ and $y$ that holds true throughout a unit circle, which makes them a good candidate for things to use to apply to a point and get another point on the same circle around the origin.

So in our little partial matrix:

$\begin{bmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{bmatrix}$

The two $\cos\theta$ terms have the same sign because the contribution of $x$ to $x’$, and $y$ to $y’$, are pointed in the same direction; they represent how similar the result is to the original state. The $\sin\theta$ terms, by contrast, have to have opposite signs (but the same magnitude). The more $x$ affects $y’$, the more $y$ affects $x’$, but if it’s not in the opposite direction, our “rotation” will be changing lengths.

It all works out very neatly!

This is why “scale” shows up as a00 and a11 in a simple scale matrix. And, if you multiply it out, that means that if you multiply another matrix by a matrix that has the same scale in those two positions, this multiplies all four of these values by that scale. (a00 and a10 are multiplied by the scaling matrix’s a00, while a01 and a11 are multiplied by the scaling matrix’s a11. If a00 and a11 in the scaling matrix are the same, it doesn’t matter.)

So how DOES that affine matrix work?

So, when we multiply everything out, we end up with this surprisingly simple matrix:

$\begin{bmatrix}dx & -dy & x0 \\ dy & dx & y0\end{bmatrix}$

You can compute $\sin\theta$ and $\cos\theta$ for the angle denoted by $(dx, dy)$ by taking dy (and dx) divided by length, which gives you a rotation matrix. You can multiply an affine matrix by a scale along the diagonal, and get a scaled rotation matrix. Since the length operations cancel out, we end up with this nice simple form. The translation happens almost as an afterthought. You’ll also perhaps notice that neither rotation nor scaling affects the translation component of the matrix. So the translation just ignores those. It doesn’t matter what the rotation or size of your line segment is; when you scale and rotate $(0, 0)$, you just always get $(0, 0)$, because you’re multiplying things by 0. We then add the translation components and end up at the start of our line segment.

So here’s some examples of how that works in practice:

If your target segment is a horizontal line, you’ll end up with just the simple scaling+translation matrix, multiplying everything (both $x$ and $y$) by the length of that line, which will just be dx.

If your target segment is a vertical line, the $\cos\theta$ terms turn into zeros, while the $\sin\theta$ terms turn into ones, which multiply by the length, but since the line segment is vertical, the length is just dy. So this is just computing a vector normal to the original vector, with length dy. That gives us $x’ = -y * l$ and $y = x * l$.

If your target segment is a horizontal line backwards ($x1 < x0$), the $\cos\theta$ terms are negative, so, everything is flipped; $x’ = -x * l$, and $y’ = -y * l$. Which… makes sense. That’s a 180-degree, or pi radian, rotation.

And between those, you get hybrids where you get progressively more, or less, of each of those components, in ways that happen to always turn out to be keeping the resulting points on a circle of the same length as the vector $(dx,dy)$, around $(x0,y0)$.

It’s all very pretty.