**Deriving the Vulkan Perspective Projection Matrix** Introduction --- When I first switched from OpenGL to Vulkan, I used the common hack of using the negative viewport and offset to account for Vulkan's Y down. When I went to add in reverse Z functionality to my hobby renderer, it started getting messy, so I just decided to re-derive everything from scratch. Hopefully you'll find this useful. Perspective Projection Matrix --- Let's start with some notation. $(X_e, Y_e, Z_z)$ will be eye coordinates, $(X_c, Y_c, Z_c)$ will be clip coordinates, and $(X_p, Y_p, Z_p)$ will be projected coordinates (coordinates on the near plane). We want to work out the 4x4 matrix that takes points in eye space to points in clip space: $$ \newcommand{\tallest}{\left(-\frac{1}{R_1} - \frac{1}{R_2} \right)} \begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix} \begin{bmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{bmatrix} = \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ W_c \end{bmatrix} $$ The Intercept Theorem tells us that the ratio between a coordinate in projected space and eye space is the same for each coordinate: $$ \frac{Y_p}{Y_e} = \frac{Z_p}{Z_e} $$ But we know that $Z_p$ is the negated near plane, since $n$ is the near plane *distance*, and we're looking down the negative $Z$ axis. Therefore $$ Y_p = \frac{-n}{Z_e} Y_e = \frac{1}{-Z_e} n Y_e $$ and likewise for $X$: $$ X_p = \frac{-n}{Z_e} X_e = \frac{1}{-Z_e} n X_e $$ The perspective divide that happens in hardware can be used to work around the fact that when multiplying a vector by a matrix, there's no way to divide by a component. The hardware will divide $X_c, Y_c, Z_c$, and $W_c$ by $W_c$. Ok, so we know then what the clip space w component $W_c$ should be then, it's simply $-Z_e$. Why? Well note that dividing by $-Z_e$ is exactly what we want to achieve to get from eye coordinates to projected coordinates. So now we can fill out the final row to accomplish this divide: $$ \newcommand{\tallest}{\left(-\frac{1}{R_1} - \frac{1}{R_2} \right)} \begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{bmatrix} = \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ W_c \end{bmatrix} $$ Normalized Device Coordinates (NDC) are clip space coordinates divided by their $w$ component. This is why they're called "normalized" (similar to normalizing a vector by dividing all the coordinates by length). $$ \newcommand{\tallest}{\left(-\frac{1}{R_1} - \frac{1}{R_2} \right)} \begin{bmatrix} X_n \\ Y_n \\ Z_n \\ W_n \end{bmatrix} = \begin{bmatrix} X_c/W_c \\ Y_c/W_c \\ Z_c/W_c \\ 1 \end{bmatrix} $$ The near plane of the viewing frustum is defined by the 4 points (starting from upper left and going clockwise): $$ (l, t), (r, t), (r, b), (l, b) $$ and since Vulkan is Y down, we want an NDC mapping like so: $$ (l, t) => (-1, -1) $$ $$ (r, t) => (1, -1) $$ $$ (r, b) => (1, 1) $$ $$ (l, b) => (-1, 1) $$ Setting up a linear system for this and solving, we get $$ X_n = \frac{2}{r-l} X_p - \frac{r + l}{r - l} $$ $$ Y_n = \frac{2}{b-t} Y_p - \frac{b + t}{b - t} $$ Substituting formulas for $X_p$ and $Y_p$, we have $$ X_n = \frac{2}{r-l} \left( \frac{n Y_e}{-Z_e} \right) - \frac{r + l}{r - l} $$ and factoring out $1/-Z_e$, we get $$ X_n = \frac{1}{-Z_e} \left( \frac{2n}{r-l} \right) X_e + \frac{r + l}{r - l} Z_e $$ and same for $Y_n$ $$ Y_n = \frac{1}{-Z_e} \left( \frac{2n}{b-t} \right) Y_e + \frac{b + t}{b - t} Z_e $$ But recall $X_c = X_n W_c$ and $Y_c = Y_n W_c$, so now we're able to fill out the first 2 rows of our eye-to-clip matrix: $$ \newcommand{\tallest}{\frac{2n}{r-l}} \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{b-t} & \frac{b+t}{b-t} & 0 \\ \cdot & \cdot & \cdot & \cdot \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} \vphantom { \tallest } X_e \\ \vphantom { \tallest } Y_e \\ Z_e \\ 1 \end{bmatrix} = \begin{bmatrix} \vphantom { \tallest } X_c \\ \vphantom { \tallest } Y_c \\ Z_c \\ W_c \end{bmatrix} $$ Ok, next, on to z. We know we want to encode z to go from 1 to 0 since we're doing reverse z. Hopefully it's clear we don't really need to use x and y values, so our third row will look something like $(0, 0, A, B)$. We just need to figure out $A$ and $B$. We know $$ Z_n = \frac{Z_c}{W_c} = \frac{A Z_e + B W_e}{-Z_e} $$ But eye space always has a w component of 1, so this simplifies to $$ Z_n = \frac{A Z_e + B}{-Z_e} $$ So when $Z_n = 1$ we know that $Z_e$ must have been $-n$ (recall $n$ is positive). Basically, when eye z is on the near plane, $Z_n = 1$. Likewise, when $Z_n = 0$, $Z_e$ must be -far, so $$ \frac{A(-n)+B}{-(-n)} = 1 $$ $$ \frac{A(-f)+B}{-(-f)} = 0 $$ so therefore $$ A(-n) + B = n $$ $$ A(-f) + B = 0 $$ and we have $$ A = \frac{n}{f-n} $$ $$ B = \frac{nf}{f-n} $$ Finally, we have $$ Z_n = \frac{1}{-Z_e} \left( \frac{n Z_e}{f-n} + \frac{nf}{f-n} \right) $$ As before, we can see that the expression in the parentheses is equal to $Z_c$, so with that, we can finally build the complete matrix $$ \newcommand{\tallest}{\frac{2n}{r-l}} \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{b-t} & \frac{b+t}{b-t} & 0 \\ 0 & 0 & \frac{n}{f-n} & \frac{nf}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} \vphantom { \tallest } X_e \\ \vphantom { \tallest } Y_e \\ Z_e \\ 1 \end{bmatrix} = \begin{bmatrix} \vphantom { \tallest } X_c \\ \vphantom { \tallest } Y_c \\ Z_c \\ W_c \end{bmatrix} $$