Final Project!

By Adam Chang

Overview

For the CS194-26 final project, we were tasked with choosing two of the suggested projects. The ones I chose were:

  1. Augmented Reality
  2. Light Field Camera

I implemented a bells & whistles extension for the second project, Light Field Camera.

Augmented Reality

The first project, Augmented Reality, incorporated several ideas from throughout the class, especially the pinhole camera model. I marked a 3D object with an array of points and measured their positions in 3D space with respect to a chosen coordinate axis. Next, I captured a video of the object, manually selected the marked points in UV (image) space in the first frame, and used an optical flow tracker to propagate these correspondences across the entire video. Finally, using the 2D and 3D correspondences in each frame, I computed the projection matrix from 3D to 2D space, which allowed me to render a cube anywhere in 3D space with respect to the coordinate axis I chose earlier.
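
The pipeline boils down to two steps: propagating the clicked points frame to frame with optical flow, and fitting a per-frame projection matrix by least squares. Here is a minimal sketch of both, assuming numpy and OpenCV (the helper names and window size are illustrative, not my exact code):

    import cv2
    import numpy as np

    def track_keypoints(prev_frame, next_frame, pts):
        # Propagate 2D keypoints one frame forward with pyramidal Lucas-Kanade.
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
            cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY),
            cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY),
            pts.reshape(-1, 1, 2).astype(np.float32), None,
            winSize=(21, 21), maxLevel=3)
        return next_pts.reshape(-1, 2), status.ravel() == 1

    def fit_projection(pts3d, pts2d):
        # Direct linear transform: each correspondence contributes two rows,
        # and the flattened 3x4 matrix P is the null vector of the system.
        rows = []
        for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
            rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
            rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
        _, _, Vt = np.linalg.svd(np.asarray(rows))
        return Vt[-1].reshape(3, 4)

    def project(P, pts3d):
        # Project homogeneous 3D points and dehomogenize to pixel coordinates.
        h = np.hstack([pts3d, np.ones((len(pts3d), 1))]) @ P.T
        return h[:, :2] / h[:, 2:3]

The cube is then drawn by projecting its eight corners with each frame's P and connecting the edges in image space.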

The first gif here illustrates the keypoints that I selected and then propagated using optical flow tracking.

The keypoints, visualized

The second gif here illustrates the rendered cube.

The cube, rendered

We can see that the keypoint tracking begins to break down as the camera moves more. One way to improve the algorithm would be to detect sparse keypoints with descriptors and use Perspective-n-Point (PnP) to recover the frame-to-frame rotation and translation, instead of running optical flow on the grid corners.
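
For reference, the PnP alternative might look something like the sketch below, assuming the camera intrinsics K are known; the helper name is hypothetical, and cv2.solvePnP does the heavy lifting:

    import cv2
    import numpy as np

    def pose_from_pnp(pts3d, pts2d, K):
        # Recover this frame's rotation and translation from 3D<->2D matches.
        ok, rvec, tvec = cv2.solvePnP(pts3d.astype(np.float32),
                                      pts2d.astype(np.float32),
                                      K, None)  # None: ignore lens distortion
        R, _ = cv2.Rodrigues(rvec)  # axis-angle vector -> 3x3 rotation matrix
        return K @ np.hstack([R, tvec])  # 3x4 projection matrix for the frame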

Light Field Camera

The second project, Light Field Camera, builds on the latter portion of the class relating to the plenoptic function. We learned how a light field camera captures more information about the actual rays of light in a scene, which enables interesting functionality such as refocusing in post. In this project, we work with an approximation of a light field camera: many pictures of the same scene taken with a regular camera positioned over a grid. We find that by appropriately shifting these images, we can refocus on different portions of the scene and also imitate a single camera with different apertures, producing different depth-of-field effects.

Refocusing

We find that by shifting each image toward the center of the grid by a scaled offset, we can refocus the image to different depths. The intuition is that objects farther from the cameras have less disparity between images, so smaller shifts bring those objects into alignment, while objects closer to the camera have more disparity, so larger shifts are needed to align them. I performed the shifts by calculating the X and Y pixel offsets between every image and the center of the camera grid, and shifted each image by w * offset.
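
Concretely, the shift-and-average might look like the following sketch, assuming integer pixel shifts via np.roll (sub-pixel interpolation would be a refinement) and an illustrative array layout:

    import numpy as np

    def refocus(images, positions, w):
        # images: list of HxWx3 arrays, one per camera in the grid.
        # positions: (N, 2) array of (x, y) grid coordinates per camera.
        # w scales each image's shift toward the grid center.
        center = positions.mean(axis=0)
        out = np.zeros_like(images[0], dtype=np.float64)
        for img, (px, py) in zip(images, positions):
            dx, dy = w * (center - (px, py))  # shift toward the grid center
            out += np.roll(img, (int(round(dy)), int(round(dx))), axis=(0, 1))
        return out / len(images)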

Below are the results. We can see that as we increase w, and with it the shift, the focus moves closer to the camera, as expected.

Shifted w=0.5
Shifted w=0.4
Shifted w=0.3
Shifted w=0.2
Shifted w=0 (NO SHIFT)
Shifted w=-0.1
Shifted w=-0.2
Shifted w=-0.3
Shifted w=-0.4
Shifted w=-0.5

Changing Aperture

Another interesting function of the light field camera is the ability to mimic cameras of different apertures. Starting with the center of the camera grid, averaging images in a wider radius results in an image reminiscent of a camera with a larger aperture. This means the depth of field decreases.
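
A minimal sketch of this, assuming the sub-views are keyed by their grid position (the dict layout and names are illustrative):

    import numpy as np

    def vary_aperture(images, center, radius):
        # images: dict mapping (row, col) grid position -> HxWx3 image.
        # A Chebyshev radius r selects a (2r+1) x (2r+1) block of views,
        # i.e. (2r+1)**2 images, matching the counts listed below.
        cr, cc = center
        block = [img for (r, c), img in images.items()
                 if max(abs(r - cr), abs(c - cc)) <= radius]
        return np.mean(block, axis=0)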

Radius 1 (9 images averaged)
Radius 2 (25 images averaged)
Radius 3 (49 images averaged)
Radius 4 (81 images averaged)
Radius 5 (121 images averaged)
Radius 6 (169 images averaged)
Radius 7 (225 images averaged)
Radius 8 (289 images averaged)

Bells & Whistles: Dynamic Refocusing

For bells & whistles, I implemented dynamic refocusing using the image alignment techniques we learned back in Project 1. I prompt the user to select a point, extract a patch around that point, and align the other images to it by searching over different shifts and scoring each with SSD (sum of squared differences). The patch is 60x60 pixels: small enough to be local to a single depth, but not so small that the alignment fails.
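
A sketch of the alignment loop, assuming integer shifts and an exhaustive search window (the 20-pixel search radius and helper names are illustrative):

    import numpy as np

    def ssd(a, b):
        # Sum of squared differences between two equally sized patches.
        return np.sum((a.astype(np.float64) - b.astype(np.float64)) ** 2)

    def best_shift(ref, img, top, left, search=20):
        # Exhaustively find the integer (dy, dx) minimizing SSD against ref.
        ph, pw = ref.shape[:2]
        best_score, best_dyx = np.inf, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                if top + dy < 0 or left + dx < 0:
                    continue  # window fell off the top/left border
                patch = img[top + dy : top + dy + ph, left + dx : left + dx + pw]
                if patch.shape[:2] != (ph, pw):
                    continue  # window fell off the bottom/right border
                score = ssd(ref, patch)
                if score < best_score:
                    best_score, best_dyx = score, (dy, dx)
        return best_dyx

    def dynamic_refocus(images, center_idx, y, x, size=60, search=20):
        # Refocus so the size x size patch around the clicked (y, x) is sharp.
        top, left = y - size // 2, x - size // 2
        ref = images[center_idx][top : top + size, left : left + size]
        out = np.zeros_like(images[0], dtype=np.float64)
        for img in images:
            dy, dx = best_shift(ref, img, top, left, search)
            out += np.roll(img, (-dy, -dx), axis=(0, 1))  # undo the disparity
        return out / len(images)

Below, each pair shows the selected focus region and the corresponding recovered focus.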

Selected Focus Region
Recovered focus
Selected Focus Region
Recovered focus
Selected Focus Region
Recovered focus