Computing Linear Perspective Projection

Coordinate system of Direct3D: x points to the right, y points upward, z points into the screen; the origin is at the center of the screen. This is a left-handed system.

Setup for deriving the projection equations: the viewer is at the origin, and the image plane is perpendicular to the z axis at distance d.
To which point on the image plane is the point (x, y, z) projected?
By the intercept theorem [Strahlensatz], it is projected to (xd/z, yd/z, d).
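
The formula can be checked numerically. The following is a minimal sketch in plain Python (not part of the Direct3D API); the function name project_onto_image_plane is made up for this illustration.

```python
def project_onto_image_plane(x, y, z, d):
    """Project the point (x, y, z) onto the image plane z = d.

    The viewer sits at the origin; by the intercept theorem the
    projected point is (x*d/z, y*d/z, d).
    """
    if z == 0:
        # A point in the viewer's own plane has no projection.
        raise ValueError("z must be nonzero")
    return (x * d / z, y * d / z, d)


# A point four times as far away as the image plane shrinks by a factor of 4.
print(project_onto_image_plane(2.0, 4.0, 8.0, 2.0))  # (0.5, 1.0, 2.0)
```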

Perspective projection thus requires a division by z. A linear or affine mapping cannot divide by a coordinate, so this projection cannot be implemented with matrices in the usual way.

Homogeneous Coordinates

Using vectors and matrices nearly as usual would be nice. Mathematicians found a clever workaround to implement perspective projection using matrices and vectors: homogeneous coordinates. (Note that (x, y, z)^T denotes the transpose, i.e., the row vector (x, y, z) written as a column vector. This notation saves a lot of space.)

Add a fourth dimension w to (x, y, z)^T. If (x, y, z)^T represents a position, convert it to (x, y, z, 1)^T. If (x, y, z)^T represents a (genuine) vector (i.e., an object that can freely be subjected to parallel translation), convert it to (x, y, z, 0)^T.
Multiply those 4D vectors by 4x4 matrices. The result will be some (x', y', z', w')^T.
To convert a position (x', y', z', w')^T to regular 3D coordinates, divide by w', which yields (x'/w', y'/w', z'/w')^T.
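
As an illustration, the projection onto the plane z = d can be written as a 4x4 matrix whose last row copies z/d into w'. The sketch below uses plain Python lists and column vectors as in the text above; the helper names (mat_vec, to_3d, perspective_matrix) are made up for this example and are not Direct3D functions.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a column vector of length 4."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


def to_3d(h):
    """Convert homogeneous coordinates (x', y', z', w') to 3D by dividing by w'."""
    x, y, z, w = h
    return (x / w, y / w, z / w)


def perspective_matrix(d):
    """Matrix that maps (x, y, z, 1)^T to (x, y, z, z/d)^T."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0 / d, 0.0],
    ]


position = [2.0, 4.0, 8.0, 1.0]                # a position, so w = 1
h = mat_vec(perspective_matrix(2.0), position)  # (2, 4, 8, 4)
print(to_3d(h))                                 # (0.5, 1.0, 2.0) = (xd/z, yd/z, d)
```

The matrix itself performs no division; the division by z happens only in the final conversion back to 3D coordinates.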

A large set of transformations can be implemented in this way, including all rotations, scalings, perspective projections, translations (!), and their compositions. For instance, a translation can be achieved through the matrix
(1 0 0 a)
(0 1 0 b)
(0 0 1 c)
(0 0 0 1).
Note that a (genuine) vector will not be affected by this matrix because its fourth component is zero.
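
A small numerical check of this behavior, again as a plain Python sketch with column vectors; the helper names translation and mat_vec are illustrative only.

```python
def translation(a, b, c):
    """Translation matrix for column vectors: the offsets sit in the last column."""
    return [
        [1, 0, 0, a],
        [0, 1, 0, b],
        [0, 0, 1, c],
        [0, 0, 0, 1],
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a column vector of length 4."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


t = translation(5, 6, 7)
position = [1, 2, 3, 1]    # w = 1: a point in space
direction = [1, 2, 3, 0]   # w = 0: a genuine vector

print(mat_vec(t, position))   # [6, 8, 10, 1], the point is shifted
print(mat_vec(t, direction))  # [1, 2, 3, 0], the vector is unchanged
```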

Transformation Matrices

Some technical preparations (to be explained later):
// create the z buffer
presentParams.AutoDepthStencilFormat = DepthFormat.D24X8;
presentParams.EnableAutoDepthStencil = true;
// generate a mesh (introduce Mesh theTeapot as a member variable and include a reference to Direct3DX)
theTeapot = Mesh.Teapot(device);
// clear frame and depth buffer
device.Clear(ClearFlags.Target | ClearFlags.ZBuffer, Color.Blue, 1.0f, 0);
// show the mesh
device.RenderState.Lighting = false;
theTeapot.DrawSubset(0);

Typically, we use several coordinate frames [Koordinatensysteme] to describe a 3D world. Direct3D offers four of them:

Object space: adjusted to the current 3D object
World space: adjusted to the whole world
Camera space: The camera sits at the origin. We look along its z axis.
Normalized screen space: x points to the right (left window boundary = -1, right = 1), y points upward (lower boundary = -1, upper = 1), z points into the screen; the origin is at the center of the window.

We do not specify these frames directly, but specify matrices to let Direct3D do the conversions:

World matrix: object space -> world space, i.e., where to put the object
View matrix: world space -> camera space, i.e., where to put the camera
Projection matrix: camera space -> normalized screen space, i.e., which lens [Kameraobjektiv] to use

There are some pre-built matrices to do the job:
device.Transform.World = Matrix.RotationZ(0.6f)*Matrix.Translation(0.0f, 0.0f, 1.5f);
device.Transform.View = Matrix.LookAtLH(new Vector3(0.0f, 0.0f, -3.0f), new Vector3(), new Vector3(0.0f, 1.0f, 0.0f));
device.Transform.Projection = Matrix.PerspectiveFovLH((float)Math.PI/3.0f, ClientSize.Width/(float)ClientSize.Height, 0.1f, 50.0f);
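
To get a feeling for what the projection matrix does, here is a sketch in plain Python of the matrix layout documented for the native D3DX function D3DXMatrixPerspectiveFovLH. It is an assumption of this example that Matrix.PerspectiveFovLH builds the same matrix. The 4:3 aspect ratio and the helper names are likewise chosen only for illustration. Vectors are rows here, as in DirectX.

```python
import math


def perspective_fov_lh(fov_y, aspect, z_near, z_far):
    """Left-handed perspective matrix in the row-vector convention (assumed layout)."""
    y_scale = 1.0 / math.tan(fov_y / 2.0)
    x_scale = y_scale / aspect
    q = z_far / (z_far - z_near)
    return [
        [x_scale, 0.0, 0.0, 0.0],
        [0.0, y_scale, 0.0, 0.0],
        [0.0, 0.0, q, 1.0],            # the 1 copies z into w'
        [0.0, 0.0, -z_near * q, 0.0],
    ]


def row_vec_mat(v, m):
    """Multiply a row vector of length 4 by a 4x4 matrix (list of rows)."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def to_3d(h):
    """Divide by w' to return to 3D coordinates."""
    x, y, z, w = h
    return (x / w, y / w, z / w)


proj = perspective_fov_lh(math.pi / 3.0, 4.0 / 3.0, 0.1, 50.0)

# Points on the near and far planes end up at depth 0 and (approximately) 1.
near_point = to_3d(row_vec_mat([0.0, 0.0, 0.1, 1.0], proj))
far_point = to_3d(row_vec_mat([0.0, 0.0, 50.0, 1.0], proj))
print(near_point[2], far_point[2])
```

With this layout, w' equals the camera-space z, so the division by w' is exactly the division by z derived at the beginning.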

We can multiply matrices using *, which is appropriately overloaded in Managed DirectX. (Operator overloading is a handy feature of C++ and C# in contrast to Java.) The order [Reihenfolge] is important most of the time: For matrices, A*B and B*A are different in general. Contrary to mathematical tradition (as above), DirectX employs row vectors, not column vectors. Thus, products are formed with the vector on the left-hand side: v*A*B*C. This means that in the matrix product A*B*C first A is applied, then B, then C. Consequently, the DirectX matrices are the transposes of the column-vector forms; for example, the translation components a, b, c appear in the bottom row rather than in the last column.
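
The effect of the order can be seen with a scaling and a translation. The sketch below is plain Python in the row-vector convention and does not use the Managed DirectX Matrix class; all helper names are made up for this example.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def row_vec_mat(v, m):
    """Multiply a row vector of length 4 by a 4x4 matrix."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def scaling(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


def translation_row(a, b, c):
    """Translation for row vectors: the offsets sit in the bottom row."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [a, b, c, 1]]


A = scaling(2)
B = translation_row(1, 0, 0)
v = [1, 0, 0, 1]

print(row_vec_mat(v, mat_mul(A, B)))  # [3, 0, 0, 1]: scale first, then translate
print(row_vec_mat(v, mat_mul(B, A)))  # [4, 0, 0, 1]: translate first, then scale
```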

The basic functionality of OpenGL concerning matrices is very similar to that of Direct3D. Here are the main differences:

The World and the View matrix are combined into the MODELVIEW matrix in OpenGL.
During the first stages of the OpenGL graphics pipeline, z points out of the screen, thus forming a right-handed system. In later stages of the OpenGL graphics pipeline, however, z points into the screen.
In OpenGL, vectors are treated as columns, so that we form products like A*B*C*v. Thus, in a matrix product A*B*C the matrix C is applied first, which is somewhat counterintuitive.
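
The two conventions describe the same transformation: transposing every matrix and reversing the order of the product turns v*A*B (row vectors) into B^T*A^T*v^T (column vectors). A minimal check in plain Python, with helper names invented for this sketch:

```python
def transpose(m):
    """Transpose of a 4x4 matrix given as a list of rows."""
    return [[m[j][i] for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def row_vec_mat(v, m):
    """Row-vector convention: v * M."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def mat_col_vec(m, v):
    """Column-vector convention: M * v."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


# Row-vector forms: A scales by 2, B translates by (1, 0, 0).
A = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]
B = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]
v = [1, 0, 0, 1]

row_result = row_vec_mat(v, mat_mul(A, B))                        # v * A * B
col_result = mat_col_vec(mat_mul(transpose(B), transpose(A)), v)  # B^T * A^T * v

print(row_result, col_result)  # both are [3, 0, 0, 1]
```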