In this lecture, we're going to study the nuts and bolts of ray
tracing. What are the steps you need to actually write a ray tracer as
in your homework assignment? We start with discussing camera ray
casting, and in subsequent segments, we'll talk about ray-object
intersections, ray tracing transformed objects, lighting calculations,
and recursive ray tracing.
Notice the outline of the code that I've shown here. What we're
interested in, in this lecture, is this part: how do I shoot a ray so
that it goes through a given pixel (i, j) in the image plane? Let's go
back to our diagrams of ray casting. Here we have the virtual
viewpoint of the camera. It shoots a ray through a pixel; if it misses
all the objects, the pixel is colored black. The ray through the next
pixel hits an object, and you shade it using the colors, the lights,
and the material properties.
Then, you shoot a ray that goes through multiple intersections, and as
in OpenGL, you pick the closest intersection and you shade that point.
What this segment is about is choosing the ray direction for each
pixel. So, the goal is to find the direction given a pixel, i and
j. There are, of course, multiple ways in which you can approach this
problem. You can consider the objects to be in world coordinates and
then you find the direction of each ray.
In fact, this is what we do. Alternatively, you can transform the
objects into the camera's coordinate frame as in OpenGL, essentially
applying the gluLookAt transform. Then, the camera is in a canonical
frame and it's clear what the rays do. However, we'll consider the
camera at some location in space and we'll consider everything in
world coordinates. A ray, as you know, has an origin which is the
camera center, and the direction.
The goal is really to find what this direction should be, given the
camera parameters and the pixel locations i and j. The camera
parameters are as in gluLookAt: the LookFrom, which is the camera
location; the LookAt, which is where the camera is looking; the up
direction; and the field of view.
So, once again, the parameterization is very similar to gluLookAt, and
the diagram is exactly the same as in the earlier lecture on deriving
gluLookAt. You have an eye here, you are looking at some object here,
and this is the up direction. So, the steps we take are very similar
to what we did in the derivation of gluLookAt if you remember.
First, we construct a coordinate frame.
The slide here comes from a very early lecture in this course where
given two vectors a and b that may not be normalized and may not be
parallel to each other, you want to create this u, v, and w coordinate
frame. First, you associate w with a, but you normalize it. Then there
is this neat trick to get the u vector, which is the cross product of
b and w. That ensures that the component of b along w is not
considered; only the component orthogonal to w is. And then you
say v is equal to w cross u.
Now, how do you determine a and b? Well, the vector a is simply given
by (eye - center), at least following the OpenGL convention where the
camera looks down the -z axis and it's at the origin. The vector b is
simply the up vector. So, given eye and center, we can find the vector
a, find the vector b, and then apply this construction to find u,
v, and w from a and b.
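The frame construction above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not code from the assignment framework; the helper names `normalize` and `camera_frame` are mine.

```python
import numpy as np

def normalize(x):
    """Return x scaled to unit length."""
    return x / np.linalg.norm(x)

def camera_frame(eye, center, up):
    """Build the camera coordinate frame (u, v, w) from
    gluLookAt-style parameters.

    a = eye - center, so w points from the look-at point back
    toward the eye (the camera looks down -w)."""
    a = eye - center
    b = up
    w = normalize(a)               # w: normalized a
    u = normalize(np.cross(b, w))  # u: b x w, normalized (drops b's component along w)
    v = np.cross(w, u)             # v: w x u, already unit length
    return u, v, w
```

For example, a camera at (0, 0, 5) looking at the origin with up = (0, 1, 0) gives the canonical axes u = (1, 0, 0), v = (0, 1, 0), w = (0, 0, 1).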
So, let's now talk about the canonical viewing geometry. We have the
vectors u, v, and w but we haven't said how to find the direction for
a given camera pixel i and j.
First, let's consider where the virtual image plane is. For now, it
is located one unit away from the camera.
Remember that the camera in OpenGL is at the origin looking down the
-z axis. In our case, we are considering world coordinates, so we are
still in world coordinates and the camera looks down the -w axis. For
that reason, the world, or rather the image plane through which the
world will be observed, is at -w, one unit in the -w direction.
Now, consider what happens to the ray through a given pixel, and we
have already seen the w coordinate is just -1 along the w axis. But
this ray also moves a certain direction alpha along the u vector which
corresponds to horizontal pixels, and beta along the v vector which
corresponds to vertical pixels.
So, all that remains is to derive what alpha and beta are. Let's first
consider the derivation of alpha. So, this is the horizontal
direction, and we'll call the pixel coordinate j.
Furthermore, we label j as going from zero to the full width, let's
call it w, and therefore we consider values of j to be centered
around w / 2.
What we're interested in, then, is the value j - (w / 2), normalized
by w / 2.
That's the normalized amount you're moving along the x direction, and
it goes between -1 and +1 across the image.
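As a quick sanity check, that normalization step can be written as a one-line helper (the function name is mine, used only for illustration):

```python
def normalized_x(j, width):
    """Map pixel column j in [0, width] to a normalized
    coordinate in [-1, +1]."""
    return (j - width / 2) / (width / 2)
```

For a 640-pixel-wide image, j = 0 maps to -1, j = 320 maps to 0, and j = 640 maps to +1.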
Okay. So, what happens if this value is equal to +1? Well, if this
value is equal to +1, then you've gone the full extent here, and the
angle corresponds to half the field of view in the x direction.
To see why, consider the triangle formed here. The full angle across
the image is the field of view, so from the center of the image to
its edge is half the field of view.
And since the image plane is one unit away, the remaining dimension,
the half-width of the image plane, corresponds to the tangent of that
half angle. So, the normalized value will be multiplied by the
tangent of half the field of view in the x direction.
And indeed that is what I've shown on this slide, that the alpha
coordinate is tan(fovx/2) * (j - (width / 2)) / (width / 2).
Beta for the y direction is exactly the same. It's tan(fovy / 2)
times, and now there's just a sign change. So, you say,
(height / 2) - i instead of i - (height / 2). And there's a very
simple reason for that. If you consider the image, we typically have
(0,0) starting in the upper left. So, while j goes from left to right,
as you would expect, i actually goes from top to bottom. So, at
i = 0, the expression (height / 2 - i) / (height / 2) evaluates to
+1, correctly. Whereas at i = height, it will be
-(height / 2) / (height / 2) = -1, which is
what's desired. And so, simply because of the standard pixel naming
conventions, we have to invert what happens along the y direction or
along the i pixel. Finally, given alpha and beta, the direction is
easy.
So, you just have (alpha * u) + (beta * v) - w, and now you have to
normalize it by dividing by the norm. This is what the
final ray looks like. It starts at the eye and so, this is the origin.
And this value here is the normalized direction, so let me write it
as direction. The eye is a vector, the direction is a vector, and the
direction is multiplied by some distance along the ray. So, this t
corresponds to the distance along the ray. That is
what the origin and direction for the camera eye rays correspond to.
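Putting the pieces together, the whole pixel-to-ray computation can be sketched as follows. This is a minimal sketch in Python with NumPy, using the exact formulas from this segment; the function and parameter names are mine, and fovx and fovy are assumed to be in radians.

```python
import math
import numpy as np

def normalize(x):
    """Return x scaled to unit length."""
    return x / np.linalg.norm(x)

def ray_through_pixel(eye, u, v, w, fovx, fovy, i, j, width, height):
    """Return (origin, direction) of the camera ray through pixel (i, j).

    u, v, w form the camera frame; the image plane sits one unit
    along -w. i grows top to bottom, j grows left to right."""
    # alpha: horizontal offset on the image plane
    alpha = math.tan(fovx / 2) * (j - width / 2) / (width / 2)
    # beta: vertical offset; (height/2 - i) flips the sign because
    # i increases downward in standard pixel conventions
    beta = math.tan(fovy / 2) * (height / 2 - i) / (height / 2)
    # Ray direction: alpha*u + beta*v - w, normalized.
    direction = normalize(alpha * u + beta * v - w)
    return eye, direction  # ray(t) = eye + t * direction
```

With the canonical frame u = (1, 0, 0), v = (0, 1, 0), w = (0, 0, 1), the ray through the center of the image points straight down -w, i.e. (0, 0, -1), as expected.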