Steinar H. Gunderson

Thu, 14 Mar 2013 - Introduction to gamma

Stuck in a suburb of Auckland for the night, mostly due to Air New Zealand. *sigh* Well, OK, maybe I can at least write that blog entry I've been meaning to write for a while...

When I wrote about color a month ago, my post included, in a small parenthesis, the following: “Let me ignore the distinction between Y and Y' for now.” Such a small sentence, and so much it hides :-) Let's take a look.

First, let's remember that Y measures the overall brightness, or luminance. Let's ignore the fact that there are multiple frequencies in play (again, sidestepping “what is white?”), and let's just think of them as a bunch of equal photons. If so, there's a very natural way to measure the luminance of a pixel; conceptually, just look at the number of photons emitted per second, and normalize against some reference value.

However, this is not usually the way we choose to store these values. First of all, note that there's typically not infinite precision when storing pixel data; although we could probably allow ourselves to store full floating-point these days (and we sometimes do, although it's not very common), back in the day, when all of these conventions were effectively decided, we certainly could not. You had a fixed number of bits to represent the different gray tones, and even today's eight bits (giving 256 distinct levels, bordering on the limits of what the human eye can distinguish) was a far-fetched luxury.

So, can we quantize linearly to 256 levels and just be done with it? The answer is no, and there are two good reasons why not. The first has to do, as so many things do, with how our visual system works. Let's take a look at a chart that I shamelessly stole from Anti-Grain Geometry:

To quote AGG: “On the right there are two pixels and we can credibly say that they emit two times more photons pre (sic) second than the pixel on the left.” Yet, it doesn't really appear twice as bright! (What does “twice as bright” really mean, by the way? I don't know, but there's some sort of intuitive notion of it. In any case, we could rephrase the question in terms of being able to distinguish between different levels, but it just complicates things.)

So, the eye's response to luminance is not linear, but more like the square root (actually, more like x^(1/2.2) or x^(1/2.4)). Thus, if we want to quantize luminance into N (for instance 256) distinct levels, we'd better not space them out linearly; let's instead compute x^(1/2.2) (or something similar) and then quantize linearly. This is equivalent to a non-uniform quantizer; we say that we have encoded the signal with gamma 2.2. (In reality, we don't use exactly this curve, but it's close, and the reasons are more electrical than perceptual in nature.) Also, to distinguish this gamma-compressed representation of the luminance from the actual (linear) luminance Y, we add a little prime to the symbol and say that Y' is the luma.
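In code, the simple version of that encoding looks something like this (a sketch, assuming a plain 2.2 exponent and 8-bit quantization as in the paragraph above; the encode_luma/decode_luma helpers are made up for illustration, not taken from any real standard or library):

    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    // Encode linear luminance (0..1) as an 8-bit luma value with a plain 2.2 gamma curve.
    uint8_t encode_luma(double linear_y) {
        double y_prime = std::pow(linear_y, 1.0 / 2.2);  // perceptual compression
        return static_cast<uint8_t>(std::lround(y_prime * 255.0));
    }

    // Decode an 8-bit luma value back to linear luminance.
    double decode_luma(uint8_t luma) {
        return std::pow(luma / 255.0, 2.2);
    }

    int main() {
        // The levels end up denser in the dark range, where the eye can tell
        // small differences apart, and sparser in the bright range.
        std::printf("linear 0.01 -> luma %d\n", encode_luma(0.01));  // ~31
        std::printf("linear 0.50 -> luma %d\n", encode_luma(0.50));  // ~186
    }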

The other reason is a very interesting coincidence. A CRT monitor takes in an input voltage and outputs (through some electronics controlling an electron gun, lighting up phosphor) luminance. However, the output luminance does not depend linearly on the input voltage; it's more like the square! (This has nothing to do with the phosphor, by the way; it's the electrical circuits behind it. It's partially by coincidence and partially by engineering.) In other words, a CRT doesn't even need to undo the gamma-compressed quantization; it can just take the gamma-compressed signal in as-is, push it through the circuit, and get the intended (linear) luminance back out.
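(A quick sanity check with the simplified numbers from above: if the stored value is v = Y^(1/2.2), a display response of roughly v^2.2 gives back (Y^(1/2.2))^2.2 = Y, the original linear luminance.)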

Of course, LCDs don't work that way anymore, but by the time they became commonplace, the convention was already firmly in place, and again, the perceptual reasons still apply.

Now, what does this mean for pixel processing, and Movit in particular? Noting that many of the filters we typically apply to our videos (say, blur) are physical processes that work on light, and that light behaves quite linearly, it's quite obvious that we want to process luminance, not some arbitrarily compressed version of it. But this is not what most software does. Most software just takes the gamma-encoded RGB values (you encode the three channels separately) and does mathematics on them as if they were linear values, which ends up being subtly wrong in some cases and massively wrong in others. There's an article by Eric Brasseur that has tons of detail about this if you care, but in general, I can say that correct processing is more the exception than the norm.
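To see how wrong it can get, here's a small sketch (not actual Movit code; again assuming the simplified 2.2 exponent) that averages a black pixel and a white pixel both ways:

    #include <cmath>
    #include <cstdio>

    double to_linear(double encoded) { return std::pow(encoded, 2.2); }
    double to_encoded(double linear) { return std::pow(linear, 1.0 / 2.2); }

    int main() {
        double black = 0.0, white = 1.0;  // gamma-encoded pixel values

        // Wrong: average the encoded values directly, as most software does.
        double naive = (black + white) / 2.0;  // 0.5
        std::printf("naive 50%% gray displays at %.2f of full luminance\n",
                    to_linear(naive));  // ~0.22 -- much darker than intended

        // Right: decode to linear light, average, then re-encode for storage.
        double correct = to_encoded((to_linear(black) + to_linear(white)) / 2.0);
        std::printf("correct 50%% gray is stored as %.2f\n", correct);  // ~0.73
    }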

So, what does Movit do? The answer is quite obvious: Convert to linear values on the input side (by applying the right gamma curve; something like x^2.2 for each color channel), do the processing, and then compress back again afterwards. (Movit works in 16-bit and 32-bit floating point internally, by virtue of that being supported and fast on modern GPUs, so we don't get the quantization problems you'd have in 8-bit fixed point.) Actually, it's a bit more complex than that, since some filters don't really care (e.g., if you just want to flip an image vertically, who cares about gamma), but the general rule is:

If you want to do more with pixels than moving them around (especially combining two or more, or doing arithmetic on them), you want to work in linear gamma.
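Spelled out in code, the whole dance looks roughly like this (a simplified CPU-side, single-channel sketch with a plain 2.2 curve and a trivial box blur standing in for the real filters; Movit itself does the equivalent in floating point on the GPU):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    double decode_gamma(double encoded) { return std::pow(encoded, 2.2); }
    double encode_gamma(double linear)  { return std::pow(linear, 1.0 / 2.2); }

    // Run a linear-light operation (here a three-tap box blur) over one channel
    // of gamma-encoded pixels.
    std::vector<double> process(const std::vector<double>& encoded_pixels) {
        // 1. Convert to linear light.
        std::vector<double> linear(encoded_pixels.size());
        for (std::size_t i = 0; i < linear.size(); ++i)
            linear[i] = decode_gamma(encoded_pixels[i]);

        // 2. Do the actual filtering on the linear values.
        std::vector<double> blurred(linear.size());
        for (std::size_t i = 0; i < linear.size(); ++i) {
            double prev = linear[i == 0 ? i : i - 1];
            double next = linear[i + 1 == linear.size() ? i : i + 1];
            blurred[i] = (prev + linear[i] + next) / 3.0;
        }

        // 3. Compress back to the gamma-encoded representation for output.
        for (double& v : blurred)
            v = encode_gamma(v);
        return blurred;
    }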

There, I said it. And now to try to get dinner before getting up at 5am tomorrow (which is 2am on my internal clock, since I just arrived from Tokyo). Gah.
