
Today's topics are a preparation for Manuel Menezes de Oliveira Neto's course/talks next week.

The GPU graphics pipeline

Complying with Microsoft DirectX 10, the current generation of graphics cards provides a large number of identical computational units. Each unit (unified shader) is dynamically configured to serve in one of three different ways, according to Shader Model 4.0: as a vertex shader, a geometry shader, or a pixel shader.

Data compression by choosing the right basis

In linear algebra, a set B of vectors is called a basis of a linear space if every vector of the space can be written as a linear combination of the vectors in B (“B spans the linear space”) and if no vector of B can be written as a linear combination of its other elements (“B is linearly independent”). As it turns out, all bases of a given linear space have the same number of elements. This number is called the dimension of the vector space.

Some bases are better suited for computation than others. For instance, we may describe a point in 3D through x = x ex + y ey + z ez, where ex, ey, ez all have length 1 and are perpendicular to each other. This is called an orthonormal basis. It makes finding the components x, y, z easy: They are the scalar products of x with the corresponding basis vectors.
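A minimal numerical sketch of this idea (the rotated basis is an arbitrary example chosen for illustration):

```python
import numpy as np

# an orthonormal basis of 3D: the standard axes rotated by 45 degrees about z
ex = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
ey = np.array([-1.0, 1.0, 0.0]) / np.sqrt(2)
ez = np.array([0.0, 0.0, 1.0])

v = np.array([3.0, 4.0, 5.0])
# the components are just scalar products with the basis vectors
coords = np.array([v @ ex, v @ ey, v @ ez])
rebuilt = coords[0] * ex + coords[1] * ey + coords[2] * ez
print(coords, np.allclose(rebuilt, v))  # reconstruction is exact
```

Because the basis is orthonormal, no linear system has to be solved; three dot products suffice.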

Precisely the same ideas can be applied to spaces with more dimensions. Particularly relevant in practice is the space of grayscale images of 8x8 pixels. This space has 64 dimensions. (What is the best basis to see this?) JPEG, MPEG-1 Video, MPEG-2 Video, and MPEG-4 Simple Profile (but not MPEG-4 AVC = H.264) break the grayscale part of every image down into such blocks and represent each of these blocks in a different basis, a collection of 64 more or less complex wave patterns. Going from the obvious basis to this basis is like using a tilted coordinate frame in 3D: It's just a rotation. The transformation that does the same thing for the 8x8 patterns is the 2D Discrete Cosine Transform (DCT). It behaves like a rotation in 64 dimensions. It takes the grayscale values of the 64 pixels and outputs the 64 components (“coefficients”), that is, the positive, negative, or zero contribution of every pattern to the image. To go back from these contributions to the original image, you sum the correspondingly weighted patterns; this is the inverse 2D Discrete Cosine Transform, which can be likened to a reverse rotation.
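The rotation-like behavior can be checked directly: the orthonormal DCT matrix below satisfies C Cᵀ = I, so the transform is invertible by its transpose, just like a rotation. (A sketch from first principles; real codecs use fast, fixed-point variants.)

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II matrix: row k samples the k-th cosine pattern.
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    C = np.sqrt(2 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1 / n)  # the constant (DC) pattern
    return C

def dct2(block):
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T    # forward 2D DCT: 64 pixels -> 64 coefficients

def idct2(coeffs):
    C = dct_matrix(coeffs.shape[0])
    return C.T @ coeffs @ C   # inverse: the transpose undoes the "rotation"

block = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)
restored = idct2(dct2(block))
print(np.allclose(block, restored))  # the round trip loses nothing
```

Without quantization (next paragraph), the transform alone is lossless up to floating-point precision.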

Simply going from 64 pixels to 64 components would not save any memory. The trick is to notice that the basis patterns are adapted to typical image content and to our perception: The complex ones do not occur in large amounts; and if their amount is not perfectly right, we don't notice. Thus, one can store the contributions of the complex patterns at lower precision; most of them may even be set to zero. This hopefully unnoticeable loss in precision cuts down the required memory dramatically.

Finding the right basis is the key!

Breaking the image into small blocks leads to ugly artifacts, so why not construct a basis for the image as a whole? The problem is to do this in a way such that a) it helps to compress and b) the transformation can be computed quickly. For a long time, wavelets (as in JPEG 2000) were thought to be the answer to this question. Here, the basis consists of translated (in discrete steps) and scaled (mostly by powers of two) copies of a mother wavelet. The contributions (“coefficients”) can be determined by filtering. As it turned out, however, using the full image made motion compensation, the vital compression technique for video, very hard. MPEG-4 AVC alias H.264 uses blocks of 8x8 or even 4x4 pixels with very simple rotation-like integer transforms, among them the Hadamard transform.
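To see how simple such a transform can be, here is a sketch of a 2D 4x4 Hadamard transform; its matrix contains only ±1, so computing it needs nothing but additions and subtractions:

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H4 = np.kron(H2, H2)                 # 4x4 Hadamard matrix, entries are only +-1

block = np.arange(16.0).reshape(4, 4)  # a toy 4x4 block of "pixel" values
coeffs = H4 @ block @ H4.T             # forward 2D Hadamard transform

# H4 @ H4.T = 4 * I, so the inverse is the same matrix up to a scale factor
restored = H4.T @ coeffs @ H4 / 16
print(np.allclose(restored, block))    # like the DCT, it is invertible
```

Up to scaling, the matrix is orthogonal, so the transform again behaves like a rotation.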

One can also deduce the basis to use from a collection of typical data. Eigenfaces are found by looking at the most prominent modes of variation in pictures of faces. To faithfully encode a face's image, you only need to store the contributions (“coefficients”) of the most prominent modes.

The basis that's used most widely comprises sinusoidal functions. One may for instance look at the linear space formed by all functions on the real line that have a period of 2π (that is: f(x) = f(x + 2π) for all x). Fourier's basis for this space is 1, sin(x), cos(x), sin(2x), cos(2x), sin(3x), etc. (which is: DC, fundamental sine, fundamental cosine, first overtone sine, etc.). With complex numbers, this gets nicer: all functions exp(ikx) for k = ..., -2, -1, 0, 1, 2, ... In this space, the scalar product (now of two functions) is the integral from 0 to 2π of the functions' product. (For technical reasons, the first function in the product is turned into its complex conjugate.)
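The coefficients can be computed numerically from exactly this scalar product. A sketch (the integral is approximated by a mean over equally spaced samples, which includes a 1/(2π) normalization):

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
f = 2.0 * np.sin(x) + 0.5 * np.cos(3 * x)   # a 2*pi-periodic test function

def coeff(k):
    # scalar product <f, exp(ikx)>: integrate f times the conjugate basis
    # function over one period; the mean supplies the 1/(2*pi) factor
    return np.mean(f * np.exp(-1j * k * x))

# sin(x) = (e^{ix} - e^{-ix}) / (2i), so coeff(1) is 2/(2i) = -i;
# cos(3x) = (e^{3ix} + e^{-3ix}) / 2, so coeff(3) is 0.5/2 = 0.25
print(coeff(1), coeff(3))
```

Each coefficient is obtained independently by one scalar product, just as in the 3D orthonormal case.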

Sine-like bases can also be constructed for other linear spaces. These bases consist of the eigenfunctions of an appropriately defined Laplace operator, which more or less measures curvature; in flat space it equals the sum of the second derivatives. It may also be defined on polygonal objects and graphs (demo: graph layout) or on regions of the plane, such as a drumhead, in which case the eigenvalues (the mathematical “spectrum”) form the acoustic spectrum. On a curved surface, the eigenfunctions of the Laplace operator form patterns that may be used for quadrangulation.
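The graph-layout demo can be sketched in a few lines: build the graph Laplacian L = D − A of a cycle and place each node at the coordinates given by the two lowest non-trivial eigenvectors. For a cycle these eigenvectors are sampled sine and cosine waves, so the nodes land on a circle.

```python
import numpy as np

# graph Laplacian of a cycle with 8 nodes
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = np.diag(A.sum(axis=1)) - A          # L = D - A

w, v = np.linalg.eigh(L)                # eigenvalues ascending; w[0] = 0 (constant)
xy = v[:, 1:3]                          # spectral layout: two lowest "sine" modes
radii = np.hypot(xy[:, 0], xy[:, 1])
print(np.round(radii, 3))               # all equal: the nodes form a circle
```

The same recipe (eigenvectors of a Laplacian) underlies spectral layout routines in graph libraries.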

In lighting computations, one often deals with directions, which can be represented as points on a sphere. To compress functions that are defined on the sphere, it is helpful to look at the eigenfunctions of the Laplace operator on the sphere. These are the “spherical harmonics.” Their role in compressing material characteristics or illumination data is the same as the role of the DCT patterns in JPEG picture compression.
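As a sketch, one can project a typical lighting lobe (the clamped cosine max(0, z)) onto the real spherical harmonics of bands 0 and 1 by Monte Carlo integration over the sphere; the function and sample count are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
z = rng.uniform(-1, 1, N)                # uniform random directions on the sphere
phi = rng.uniform(0, 2 * np.pi, N)
r = np.sqrt(1 - z ** 2)
x, y = r * np.cos(phi), r * np.sin(phi)

f = np.maximum(0.0, z)                   # clamped cosine, a typical lighting lobe

Y00 = np.full(N, 0.5 / np.sqrt(np.pi))   # real spherical harmonics, bands 0 and 1
Y1 = np.sqrt(3 / (4 * np.pi)) * np.stack([x, y, z])

area = 4 * np.pi                         # Monte Carlo estimate of <f, Y>
c00 = area * np.mean(f * Y00)
c1 = area * (Y1 * f).mean(axis=1)
print(c00, c1)   # close to sqrt(pi)/2 and [0, 0, sqrt(pi/3)]
```

Just four numbers already describe the lobe well; adding band 2 (five more coefficients) is what many real-time lighting systems use.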