CS194-26: Intro to Computer Vision and Computational Photography, Fall 2021

Final Projects: Neural Style Transfer and Lightfield Camera

Instructors: Alexei (Alyosha) Efros, Angjoo Kanazawa

Matthew Lacayo, CS194-26



Neural Style Transfer

The goal of this project was to implement the work in the paper A Neural Algorithm of Artistic Style.

The main insight of the paper is that a convolutional neural network such as VGG-19 learns to disentangle the content of an image from its style. As an example of this, we can try to visualize what the network has learned at a given layer. The technique for doing this is rather clever: we claim that if two different images fed into the network produce the same activations at some layer, then they represent the same content. As we move into deeper layers that occur after pooling, this information decays slightly, so images that differ somewhat in content can still produce similar activations at that layer. To measure style, we can use the Gram matrix of a layer's activations, and claim that two images share the same style if the network's activations at a particular layer have the same Gram matrix for both images. Finding an image that matches one image's content and another's style can then be posed as an optimization problem in which we jointly minimize the so-called "content loss" and "style loss" with gradient descent.
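As a concrete illustration, here is a minimal PyTorch sketch of the Gram matrix and the two losses (not the exact code I used; the normalization by channel and spatial size is one common convention):

```python
# Minimal sketch of the content and style losses, assuming feature maps have
# already been extracted from VGG-19. `target_content` and `target_grams` are
# placeholder names for the precomputed targets.
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (1, C, H, W) activations from one VGG layer
    _, c, h, w = features.shape
    f = features.view(c, h * w)
    return f @ f.t() / (c * h * w)   # normalize so the loss scale is layer-independent

def content_loss(features, target_content):
    # squared error between activations of the generated image and the content image
    return F.mse_loss(features, target_content)

def style_loss(features_per_layer, target_grams, layer_weights):
    # weighted sum of Gram-matrix mismatches over the chosen style layers
    loss = 0.0
    for feats, g_target, w in zip(features_per_layer, target_grams, layer_weights):
        loss = loss + w * F.mse_loss(gram_matrix(feats), g_target)
    return loss
```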

Here, we visualize the learned content of the VGG-19 network at different layers for Starry Night:

[Figure: content reconstructions from conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1]

Here, we visualize the learned style of the VGG-19 network at different layers for Starry Night:

[Figure: style reconstructions using progressively more layers, from conv1_1 alone up to conv1_1 through conv5_1]

As per the paper, I replaced the max pooling layers in VGG-19 with average pooling. I used LBFGS as the optimizer; I also experimented with Adam, but LBFGS gave the best results. A default learning rate of 1.0 worked well.
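Below is a hedged sketch of what this optimization loop looks like in PyTorch, reusing the loss helpers above. The helper `extract_features` and the weights `alpha` and `beta` are illustrative placeholders rather than my exact code:

```python
# Sketch of the LBFGS optimization loop. `extract_features(img)` is assumed to
# return a dict with the activations of the chosen content layer ("content")
# and style layers ("style") from a VGG-19 with average pooling.
import torch

def run_style_transfer(extract_features, content_img, style_img,
                       num_steps=300, alpha=1.0, beta=1e6, layer_weights=None):
    # start the optimization from the content image (white noise also works)
    generated = content_img.clone().requires_grad_(True)
    optimizer = torch.optim.LBFGS([generated], lr=1.0)

    with torch.no_grad():
        target_content = extract_features(content_img)["content"]
        target_grams = [gram_matrix(f) for f in extract_features(style_img)["style"]]
    if layer_weights is None:
        layer_weights = [1.0] * len(target_grams)   # equal weights, as in the paper

    step = [0]
    while step[0] < num_steps:
        def closure():
            optimizer.zero_grad()
            feats = extract_features(generated)
            c_loss = content_loss(feats["content"], target_content)
            s_loss = style_loss(feats["style"], target_grams, layer_weights)
            loss = alpha * c_loss + beta * s_loss
            loss.backward()
            step[0] += 1
            return loss
        optimizer.step(closure)   # LBFGS re-evaluates the closure internally
    return generated.detach()
```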



Another change I made relative to the original paper is the weighting of the style loss contributed by each layer. In the paper the loss is weighted equally across layers, whereas I tuned these ratios per image to achieve better results.
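For illustration, this is how such per-layer weights would plug into the style loss sketched earlier; the numbers below are hypothetical, since the exact ratios I used varied from image to image:

```python
# Hypothetical per-layer style weights (the paper weights all layers equally, e.g. 1/5 each).
style_layers = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
layer_weights = [1.0, 0.8, 0.6, 0.4, 0.2]   # illustrative values only
# These would be passed as `layer_weights` to style_loss / run_style_transfer above.
```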



Here are a variety of results that my network produced. For all of these images, the content loss was based on the output of conv4_1, and the style loss was based on conv1_1 to conv5_1:

[Figure: four examples, each showing the content image, the style image, and the result]

Bells and Whistles

A follow-up to the above paper addressed an issue with the base method: the colors of the generated image match the colors of the style image, whereas sometimes we want the generated image to keep the color scheme of the content image instead. This was addressed in this paper. For my bells and whistles, I decided to implement one of its approaches for transferring color.



One approach to transferring color is to apply a linear transformation to each pixel of the style image to change its colors. This recolored style image is then passed through the base generation algorithm from the first part of the project. The linear transformation is subject to a few constraints: after the transformation, the mean color vector of the style image should match the mean color vector of the content image, and the pixel color covariance matrix should match that of the content image. There are a few ways to construct such a transformation; one is the following:
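The sketch below shows one such construction, built from symmetric square roots of the color covariance matrices (the follow-up paper also describes a Cholesky-based variant satisfying the same constraints); it is illustrative rather than my exact implementation:

```python
# Sketch: recolor `style_img` so its pixel mean and covariance match `content_img`.
# Both images are float arrays in [0, 1] with shape (H, W, 3).
import numpy as np

def match_color(style_img, content_img, eps=1e-5):
    s = style_img.reshape(-1, 3)
    c = content_img.reshape(-1, 3)

    mu_s, mu_c = s.mean(0), c.mean(0)
    cov_s = np.cov(s, rowvar=False) + eps * np.eye(3)
    cov_c = np.cov(c, rowvar=False) + eps * np.eye(3)

    def sqrtm(m):
        # symmetric matrix square root via eigendecomposition
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(np.sqrt(np.maximum(vals, 0))) @ vecs.T

    # A satisfies A cov_s A^T = cov_c, so the recolored style pixels have the
    # content image's color covariance; adding mu_c matches the mean as well.
    A = sqrtm(cov_c) @ np.linalg.inv(sqrtm(cov_s))
    recolored = (s - mu_s) @ A.T + mu_c
    return np.clip(recolored, 0, 1).reshape(style_img.shape)
```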

I implemented the computation of this matrix and applied it to transfer colors between images. Here are some results:


[Figure: two examples, each showing the image to be recolored, the color source, and the result]

Finally, I generated stylized images using the color-transferred style images. As before, the content loss was based on the output of conv4_1 and the style loss on conv1_1 through conv5_1. Here are some results:

[Figure: two examples, each showing the content image, the style image, the result without color transfer, and the result with color transfer]

Lightfield Camera

Varying Depth

The first step of this project was to use lightfield camera data from the Stanford Light Field Archive to focus images at different depths. This was done by first finding the center image, and then shifting every other image toward the center image by an amount proportional to its distance from the center; we call the proportionality constant c. Averaging all of the shifted images produces our final result. When c = 0, the result is focused on distant objects and blurry for nearby ones, because nearby objects are more sensitive to slight shifts of the camera and thus blur out when averaged.
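A minimal sketch of this shift-and-average procedure is below. It assumes the sub-aperture images and their (u, v) camera positions have already been parsed from the archive's filenames; the exact sign and axis mapping of the shift depends on how those coordinates are parsed:

```python
# Sketch of shift-and-average refocusing over a grid of sub-aperture views.
import numpy as np
from scipy.ndimage import shift as subpixel_shift

def refocus(images, positions, c):
    """images: list of (H, W, 3) float arrays; positions: list of (u, v) per image."""
    positions = np.asarray(positions, dtype=float)
    center = positions.mean(axis=0)            # (u, v) of the central view, assuming a symmetric grid
    acc = np.zeros_like(images[0], dtype=float)
    for img, (u, v) in zip(images, positions):
        du, dv = center - np.array([u, v])     # offset of this view from the center
        # shift proportionally to the offset; the (row, col) ordering and signs
        # depend on the coordinate convention of the dataset
        acc += subpixel_shift(img, shift=(c * dv, c * du, 0), order=1)
    return acc / len(images)
```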

Here are some results for different values of c:

[Figure: refocusing results for c = 0, 0.1, 0.2, 0.3, 0.4, and 0.5]

Varying Aperture

The second step of this project was to use lightfield camera data from the Stanford Light Field Archive to simulate different aperture sizes. This was done by fixing some value of c, and then limiting the set of images we average over to those within some radius r of the center image. By doing this, we artificially change the amount of blur in the out-of-focus regions of the image, which simulates a change in aperture.
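A sketch of the aperture simulation is below, reusing the refocus helper from the previous section; the radius r is measured in the archive's (u, v) camera coordinates:

```python
# Sketch: simulate a smaller aperture by averaging only the views within radius r of the center.
import numpy as np

def simulate_aperture(images, positions, c, r):
    positions = np.asarray(positions, dtype=float)
    center = positions.mean(axis=0)
    dists = np.linalg.norm(positions - center, axis=1)
    keep = dists <= r                     # r = 0 keeps only the center view (if one sits exactly at the mean)
    selected_imgs = [img for img, k in zip(images, keep) if k]
    selected_pos = positions[keep]
    return refocus(selected_imgs, selected_pos, c)
```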

Here are some results for different values of r:

[Figure: simulated apertures for r = 0, 10, 20, 30, 40, and 50]