Final Project 1: Lightfield Camera

Sydney Karimi

Part 1: Depth Refocusing

To refocus the image, I iterated through the grid of images and shifted all of them toward the middle image at grid position (8, 8), since the cameras form a 17x17 grid. The file names give us the absolute position of each camera, but to perform the shifts we have to choose an image to shift toward. For each image, we take its absolute position, subtract it from the absolute position of the center view, and scale the difference by c, a parameter we adjust to change the point of focus. The resulting dx and dy are the shifts we apply with np.roll. After shifting every image, we average all of them, which gives the final view refocused at the chosen depth.
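
Below is a minimal sketch of that pipeline, assuming the images and their absolute camera positions have already been parsed from the file names into dictionaries keyed by grid index (the refocus name and its arguments are hypothetical, not the exact project code):

```python
import numpy as np

def refocus(images, positions, c, center=(8, 8)):
    """Shift every sub-aperture image toward the center view and average.

    images    -- dict mapping (row, col) grid index -> HxWx3 float array
    positions -- dict mapping (row, col) -> (u, v) absolute camera position
                 parsed from the file names
    c         -- scale factor that controls which depth plane ends up in focus
    """
    u0, v0 = positions[center]
    acc = np.zeros_like(next(iter(images.values())), dtype=np.float64)
    for key, img in images.items():
        u, v = positions[key]
        # displacement of the center view relative to this camera, scaled by c
        dy = int(round(c * (v0 - v)))
        dx = int(round(c * (u0 - u)))
        acc += np.roll(img, shift=(dy, dx), axis=(0, 1))
    return acc / len(images)
```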

Chess c=0 (the original averaged image)
Chess c=-0.2
Chess c=0.1
Chess c=0.5
Chess c=0.7
Chess
Truck
Flowers
Sphere

Part 2: Aperture Adjustment

To perform aperture adjustment, we use the shift function from the previous section to refocus all of the images at a certain depth (determined by c). However, instead of averaging all 289 images, we average only a subset around the center: the middle block of views whose grid indices are within a chosen radius of (8, 8). From the results, we can see that a small radius simulates a small aperture; with only the center view we essentially have a pinhole camera, so everything is in focus. As the radius increases, so does the simulated aperture, and averaging more views creates a depth-of-field effect that blurs everything outside the point of focus. Once the radius reaches 8, the block covers the entire 17x17 grid and we recover the image from the previous section.
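
A sketch of the aperture simulation, building on the hypothetical refocus() helper above; the only new step is selecting the block of views within radius of the center before averaging:

```python
def adjust_aperture(images, positions, c, radius, center=(8, 8)):
    """Average only the sub-aperture views within `radius` of the center.

    radius = 0 keeps just the center view (a pinhole), while radius = 8
    covers the full 17x17 grid and reproduces the result from Part 1.
    """
    r0, c0 = center
    block = {key: img for key, img in images.items()
             if abs(key[0] - r0) <= radius and abs(key[1] - c0) <= radius}
    block_positions = {key: positions[key] for key in block}
    return refocus(block, block_positions, c, center=center)
```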

Truck c=-0.35, radius = 1
Truck c=-0.35, radius = 4
Truck c=-0.35, radius = 7
Truck c=-0.35, radius = 9
Chess
Truck
Flowers
Sphere

Summary

It was really cool to use the light field image data to simulate some of these effects. I took Professor Ren's 184 class last spring, and while we touched on light field cameras there, this project really solidified the ideas he talked about in class. Initially, I was confused about how to use the image file names as data for the depth refocusing, which made it difficult to start, but once I was able to extract and use that information, the project was more fun than complicated. I enjoyed learning about this technology and how it ties into the plenoptic function that we talked about in class. It was pretty cool that we could take advantage of these minute differences between images and turn them into camera effects that are usually implemented in a camera's hardware.

Final Project 2: Neural Style Transfer

To implement this, I created a StyleLoss module and a ContentLoss module that calculate the MSE loss for the style and content distances. While they are similar, the style loss has an additional step: it compares normalized Gram matrices (the result of multiplying a feature matrix by its transpose) rather than the raw features. The only preprocessing we have to do on the images is resizing them to the same dimensions and converting them to tensors.
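
The two loss modules can be written as small pass-through layers that record their loss on each forward pass, in the style of the standard PyTorch approach; the gram_matrix, ContentLoss, and StyleLoss definitions below are a sketch of that idea rather than my exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat):
    # feat: (batch, channels, height, width) feature activations
    b, ch, h, w = feat.size()
    flat = feat.view(b * ch, h * w)
    G = flat @ flat.t()               # channel-by-channel correlations
    return G / (b * ch * h * w)       # normalize by the number of elements

class ContentLoss(nn.Module):
    """Records the MSE between the current features and a fixed content target."""
    def __init__(self, target):
        super().__init__()
        self.target = target.detach()

    def forward(self, x):
        self.loss = F.mse_loss(x, self.target)
        return x                      # pass features through unchanged

class StyleLoss(nn.Module):
    """Same idea, but compares normalized Gram matrices instead of raw features."""
    def __init__(self, target_feat):
        super().__init__()
        self.target = gram_matrix(target_feat).detach()

    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x
```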

Then, we import the VGG19 model with pretrained weights, as the paper describes; it is a CNN like the ones we have used previously in class, and we take layers from it. We also import the normalization mean and standard deviation in order to normalize the image before putting it through the network. We build a sequential module that interleaves the pretrained layers with the ContentLoss and StyleLoss modules and reads the losses from them. We also select which content and style layers to use, picking Conv1-5 as the style layers and Conv4 as the content layer, since those features seemed to best encode the style and content of the given images. These can be adjusted to give different results; when I tried using only Conv4 and Conv5 for style, the results were not as good.
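
A sketch of how that sequential module can be assembled, interleaving the pretrained VGG19 layers with the loss modules from the previous snippet (build_model, Normalization, and the layer-naming scheme are assumptions of this sketch):

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet statistics used by the pretrained VGG19 weights
imagenet_mean = torch.tensor([0.485, 0.456, 0.406])
imagenet_std = torch.tensor([0.229, 0.224, 0.225])

class Normalization(nn.Module):
    """Normalizes the input image before it enters the VGG19 features."""
    def __init__(self, mean, std):
        super().__init__()
        self.mean = mean.view(-1, 1, 1)
        self.std = std.view(-1, 1, 1)

    def forward(self, img):
        return (img - self.mean) / self.std

def build_model(cnn, style_img, content_img,
                content_layers=('conv_4',),
                style_layers=('conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5')):
    """Interleave VGG19 layers with the ContentLoss/StyleLoss modules defined above."""
    model = nn.Sequential(Normalization(imagenet_mean, imagenet_std))
    content_losses, style_losses = [], []
    i = 0
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = f'conv_{i}'
        elif isinstance(layer, nn.ReLU):
            name = f'relu_{i}'
            layer = nn.ReLU(inplace=False)  # in-place ReLU would corrupt the recorded losses
        elif isinstance(layer, nn.MaxPool2d):
            name = f'pool_{i}'
        else:
            name = f'other_{i}'
        model.add_module(name, layer)
        if name in content_layers:
            cl = ContentLoss(model(content_img).detach())
            model.add_module(f'content_loss_{i}', cl)
            content_losses.append(cl)
        if name in style_layers:
            sl = StyleLoss(model(style_img).detach())
            model.add_module(f'style_loss_{i}', sl)
            style_losses.append(sl)
    return model, style_losses, content_losses

# e.g. cnn = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
```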

We use the LBFGS optimizer to perform gradient descent on the output image, updating it on every step using the style and content losses weighted by styleWeight=1000000 and contentWeight=1. Finally, after running our network for 500 epochs, we get an output image with the transferred style.
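
A sketch of that optimization loop, assuming the model and loss lists returned by the build_model sketch above; LBFGS optimizes the pixels of the output image directly, and each optimizer.step is treated as one epoch here:

```python
import torch

style_weight, content_weight = 1_000_000, 1

# Start from the content image and let LBFGS update its pixels
input_img = content_img.clone().requires_grad_(True)
optimizer = torch.optim.LBFGS([input_img])

for epoch in range(500):
    def closure():
        with torch.no_grad():
            input_img.clamp_(0, 1)          # keep pixel values valid
        optimizer.zero_grad()
        model(input_img)                    # forward pass records the losses
        style_score = sum(sl.loss for sl in style_losses)
        content_score = sum(cl.loss for cl in content_losses)
        loss = style_weight * style_score + content_weight * content_score
        loss.backward()
        return loss
    optimizer.step(closure)

with torch.no_grad():
    input_img.clamp_(0, 1)                  # final output image
```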

Models

Style Images

Content Images

Anime Landscape Outputs

Paris Painting Outputs

Starry Night Outputs

Wave Outputs

As you can see, the model does best transferring style onto the landscape photo, and not as well on the simple cartoon. I think this was a fun project to implement, and it was great to see how we can use neural networks in a more complex setting to produce beautiful results.