CS194-26: Intro to Computer Vision and Computational Photography, Fall 2021

Final Projects: Neural Style Transfer and Lightfield Camera

Instructors: Alexei (Alyosha) Efros, Angjoo Kanazawa

Matthew Lacayo, CS194-26



Neural Style Transfer

The goal of this project was to implement the work in the paper A Neural Algorithm of Artistic Style.

The main insight of the paper is that a convolutional neural network such as VGG-19 learns to disentangle the content of an image from its style. As an example of this, we can try to visualize what the network has learned at a given layer. The technique for doing this is rather clever: we claim that if two different images fed into the network produce the same activations at some layer, then they represent the same content. As we move into deeper layers that occur after pooling, this information decays slightly, so images that differ somewhat in content can still produce similar activations at that layer. To measure style, we can use the Gram matrix of a layer's activations, and claim that two images share the same style if the network's activations at a particular layer have the same Gram matrix for both images. Finding an image that matches one image's content and another's style can then be posed as an optimization problem in which we jointly minimize the so-called "content loss" and "style loss" with gradient descent.
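As a concrete illustration, here is a minimal PyTorch sketch of the Gram matrix and the two losses (not the exact code I used; the normalization by channel and spatial size is one common convention):

```python
# Minimal sketch of the content and style losses, assuming feature maps have
# already been extracted from VGG-19. `target_content` and `target_grams` are
# placeholder names for the precomputed targets.
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (1, C, H, W) activations from one VGG layer
    _, c, h, w = features.shape
    f = features.view(c, h * w)
    return f @ f.t() / (c * h * w)   # normalize so the loss scale is layer-independent

def content_loss(features, target_content):
    # squared error between activations of the generated image and the content image
    return F.mse_loss(features, target_content)

def style_loss(features_per_layer, target_grams, layer_weights):
    # weighted sum of Gram-matrix mismatches over the chosen style layers
    loss = 0.0
    for feats, g_target, w in zip(features_per_layer, target_grams, layer_weights):
        loss = loss + w * F.mse_loss(gram_matrix(feats), g_target)
    return loss
```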

Here, we visualize the learned content of the VGG-19 network at different layers for Starry Night:

[Figure: content reconstructions from conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1]

Here, we visualize the learned style of the VGG-19 network at different layers for Starry Night:

[Figure: style reconstructions using progressively more layers, from conv1_1 alone up to conv1_1 through conv5_1]

As per the paper, I replaced the max pooling layers in VGG-19 with average pooling. I used LBFGS as the optimizer; I also experimented with Adam, but LBFGS gave the best results. A default learning rate of 1.0 worked well.
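Below is a hedged sketch of what this optimization loop looks like in PyTorch, reusing the loss helpers above. The helper `extract_features` and the weights `alpha` and `beta` are illustrative placeholders rather than my exact code:

```python
# Sketch of the LBFGS optimization loop. `extract_features(img)` is assumed to
# return a dict with the activations of the chosen content layer ("content")
# and style layers ("style") from a VGG-19 with average pooling.
import torch

def run_style_transfer(extract_features, content_img, style_img,
                       num_steps=300, alpha=1.0, beta=1e6, layer_weights=None):
    # start the optimization from the content image (white noise also works)
    generated = content_img.clone().requires_grad_(True)
    optimizer = torch.optim.LBFGS([generated], lr=1.0)

    with torch.no_grad():
        target_content = extract_features(content_img)["content"]
        target_grams = [gram_matrix(f) for f in extract_features(style_img)["style"]]
    if layer_weights is None:
        layer_weights = [1.0] * len(target_grams)   # equal weights, as in the paper

    step = [0]
    while step[0] < num_steps:
        def closure():
            optimizer.zero_grad()
            feats = extract_features(generated)
            c_loss = content_loss(feats["content"], target_content)
            s_loss = style_loss(feats["style"], target_grams, layer_weights)
            loss = alpha * c_loss + beta * s_loss
            loss.backward()
            step[0] += 1
            return loss
        optimizer.step(closure)   # LBFGS re-evaluates the closure internally
    return generated.detach()
```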



Another change I made relative to the original paper is the weighting of the style loss contributed by each layer. In the paper the loss is weighted equally across layers, whereas I tuned these ratios per image to achieve better results.
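For illustration, this is how such per-layer weights would plug into the style loss sketched earlier; the numbers below are hypothetical, since the exact ratios I used varied from image to image:

```python
# Hypothetical per-layer style weights (the paper weights all layers equally, e.g. 1/5 each).
style_layers = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
layer_weights = [1.0, 0.8, 0.6, 0.4, 0.2]   # illustrative values only
# These would be passed as `layer_weights` to style_loss / run_style_transfer above.
```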



Here are a variety of results that my network produced. For all of these images, the content loss was based on the output of conv4_1, and the style loss was based on conv1_1 to conv5_1:

[Figure: four examples, each showing the content image, the style image, and the result]

Bells and Whistles

A follow-up to the above paper addressed an issue with the base method: the colors of the generated image match the colors of the style image, whereas sometimes we want the generated image to keep the color scheme of the content image instead. This was addressed in this paper. For my bells and whistles, I decided to implement one of its approaches for transferring color.



One approach to transferring color is to apply a linear transformation to each pixel of the style image to change its colors. This recolored style image is then passed through the base generation algorithm from the first part of the project. The linear transformation is subject to a few constraints: after the transformation, the mean color vector of the style image should match the mean color vector of the content image, and the pixel color covariance matrix should match that of the content image. There are a few ways to construct such a transformation; one is the following:
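The sketch below shows one such construction, built from symmetric square roots of the color covariance matrices (the follow-up paper also describes a Cholesky-based variant satisfying the same constraints); it is illustrative rather than my exact implementation:

```python
# Sketch: recolor `style_img` so its pixel mean and covariance match `content_img`.
# Both images are float arrays in [0, 1] with shape (H, W, 3).
import numpy as np

def match_color(style_img, content_img, eps=1e-5):
    s = style_img.reshape(-1, 3)
    c = content_img.reshape(-1, 3)

    mu_s, mu_c = s.mean(0), c.mean(0)
    cov_s = np.cov(s, rowvar=False) + eps * np.eye(3)
    cov_c = np.cov(c, rowvar=False) + eps * np.eye(3)

    def sqrtm(m):
        # symmetric matrix square root via eigendecomposition
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(np.sqrt(np.maximum(vals, 0))) @ vecs.T

    # A satisfies A cov_s A^T = cov_c, so the recolored style pixels have the
    # content image's color covariance; adding mu_c matches the mean as well.
    A = sqrtm(cov_c) @ np.linalg.inv(sqrtm(cov_s))
    recolored = (s - mu_s) @ A.T + mu_c
    return np.clip(recolored, 0, 1).reshape(style_img.shape)
```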

I implemented the computation of this matrix and applied it to transfer colors between images. Here are some results:


[Figure: two examples, each showing the image to be recolored, the color source, and the result]

Finally, I generated stylized images using the color-transferred style images. As before, the content loss was based on the output of conv4_1 and the style loss on conv1_1 through conv5_1. Here are some results:

[Figure: two examples, each showing the content image, the style image, the result without color transfer, and the result with color transfer]

Lightfield Camera

Varying Depth

The first step of this project was to use lightfield camera data from the Stanford Light Field Archive to focus images at different depths. This was done by first finding the center image, and then shifting every other image toward the center image by an amount proportional to its distance from the center; we call the proportionality constant c. Averaging all of the shifted images produces our final result. When c = 0, the result is focused on distant objects and blurry for nearby ones, because nearby objects are more sensitive to slight shifts of the camera and thus blur out when averaged.
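A minimal sketch of this shift-and-average procedure is below. It assumes the sub-aperture images and their (u, v) camera positions have already been parsed from the archive's filenames; the exact sign and axis mapping of the shift depends on how those coordinates are parsed:

```python
# Sketch of shift-and-average refocusing over a grid of sub-aperture views.
import numpy as np
from scipy.ndimage import shift as subpixel_shift

def refocus(images, positions, c):
    """images: list of (H, W, 3) float arrays; positions: list of (u, v) per image."""
    positions = np.asarray(positions, dtype=float)
    center = positions.mean(axis=0)            # (u, v) of the central view, assuming a symmetric grid
    acc = np.zeros_like(images[0], dtype=float)
    for img, (u, v) in zip(images, positions):
        du, dv = center - np.array([u, v])     # offset of this view from the center
        # shift proportionally to the offset; the (row, col) ordering and signs
        # depend on the coordinate convention of the dataset
        acc += subpixel_shift(img, shift=(c * dv, c * du, 0), order=1)
    return acc / len(images)
```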

Here are some results for different values of c:

[Figure: refocusing results for c = 0, 0.1, 0.2, 0.3, 0.4, and 0.5]

Varying Aperture

The second step of this project was to use lightfield camera data from the Stanford Light Field Archive to simulate different aperture sizes. This was done by fixing some value of c, and then limiting the set of images we average over to those within some radius r of the center image. By doing this, we artificially change the amount of blur in the out-of-focus regions of the image, which simulates a change in aperture.
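A sketch of the aperture simulation is below, reusing the refocus helper from the previous section; the radius r is measured in the archive's (u, v) camera coordinates:

```python
# Sketch: simulate a smaller aperture by averaging only the views within radius r of the center.
import numpy as np

def simulate_aperture(images, positions, c, r):
    positions = np.asarray(positions, dtype=float)
    center = positions.mean(axis=0)
    dists = np.linalg.norm(positions - center, axis=1)
    keep = dists <= r                     # r = 0 keeps only the center view (if one sits exactly at the mean)
    selected_imgs = [img for img, k in zip(images, keep) if k]
    selected_pos = positions[keep]
    return refocus(selected_imgs, selected_pos, c)
```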

Here are some results for different values of r:

[Figure: simulated apertures for r = 0, 10, 20, 30, 40, and 50]