Final Project I - Lightfield Camera

Final Project II - A Neural Algorithm of Artistic Style







Lightfield Camera

Depth Refocusing and Aperture Adjustment with Light Field Data

Ron Wang

CS 180, Fall 2023, UC Berkeley




As demonstrated in Ren Ng's 2005 paper, Light Field Photography with a Hand-held Plenoptic Camera, one can achieve depth refocusing and aperture adjustment using very simple techniques such as shifting and averaging. In this project, I reproduced these effects using real lightfield data from the Stanford Light Field Archive.


1. Depth Refocusing

The Stanford Light Field Archive provides datasets of images taken over a regularly spaced grid. Let's first explore the effect of simply averaging all the images.


As we can see, objects far away from the camera are sharp, because their positions do not change significantly as the camera moves (the optical axis direction is unchanged). In contrast, objects closer to the camera appear blurry.

To refocus at a different depth, we shift each image appropriately with respect to the center image, which is the image at position (8, 8) on the 17 x 17 grid. The filename of each image encodes the location of its view; subtracting the center view's location from it gives the shift needed for that image. We then introduce a parameter, depth, that scales these shifts and thereby controls the depth plane we focus on. In the averaged result, this shows up as a sharp, refocused region at that depth.
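Below is a minimal sketch of this shift-and-average refocusing. It assumes the views have been loaded as NumPy arrays together with the (u, v) locations parsed from their filenames; the sign and axis order of the shifts depend on the dataset, so treat it as illustrative rather than a drop-in implementation.

import numpy as np
from scipy.ndimage import shift as nd_shift  # sub-pixel image shifts

def refocus_im(images, positions, center_pos, depth):
    """Shift each view toward the center view, scaled by depth, then average."""
    acc = np.zeros_like(images[0], dtype=float)
    for im, (u, v) in zip(images, positions):
        du = depth * (u - center_pos[0])
        dv = depth * (v - center_pos[1])
        # shift this view so the chosen depth plane aligns with the center view
        acc += nd_shift(im, shift=(dv, du, 0), order=1)
    return acc / len(images)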

Depth = -0.1

Depth = 0.0

Depth = 0.1

Depth = 0.2

Depth = 0.3

Depth = 0.4

Depth = 0.5



Here's a GIF that visualizes the change in depth of our focus:



More examples (GIF):









2. Aperture Adjustment

Another cool manipulation we can do here is to mimic a camera with a larger aperture by sampling images over the grid perpendicular to the optical axis. We choose a radius (e.g. 50) and take all images within that radius of the center image. We then shift them appropriately, as in Section 1, and average the shifted images. Note that the base case r = 0 means we use only the center image. Here are the results (with depth = 0.20 to focus on the center region):
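A rough sketch of this aperture adjustment, reusing refocus_im from the previous section; it assumes the radius is measured in the same units as the view coordinates parsed from the filenames.

import numpy as np

def adjust_aperture(images, positions, center_pos, radius, depth=0.20):
    # keep only the views within `radius` of the center view, then refocus as before
    keep_ims, keep_pos = [], []
    for im, pos in zip(images, positions):
        if np.linalg.norm(np.asarray(pos, float) - np.asarray(center_pos, float)) <= radius:
            keep_ims.append(im)
            keep_pos.append(pos)
    # radius = 0 keeps only the center view, mimicking a tiny aperture
    return refocus_im(keep_ims, keep_pos, center_pos, depth)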

r = 0

r = 10

r = 20

r = 30

r = 40

r = 50





More examples (GIF):






Bells & Whistles: Interactive Refocusing

Building on the refocus_im function implemented in Section 1, I also created a function that lets us refocus on any point of the image. The function takes the (u, v) coordinates of the point, which can be read directly off skio's image display, and then calculates the optimal depth for refocusing. For example, we start with the following blurry image:
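The exact depth-selection rule isn't spelled out above, but one simple way to implement it (a hypothetical sketch, reusing refocus_im) is to sweep candidate depths and keep the one whose refocused result is sharpest in a small window around the chosen point, using the variance of the Laplacian as the focus measure.

import numpy as np
from scipy.ndimage import laplace

def refocus_at_point(images, positions, center_pos, u, v,
                     depths=np.linspace(-0.2, 0.6, 41), window=60):
    best_depth, best_score = None, -np.inf
    for d in depths:
        im = refocus_im(images, positions, center_pos, d)
        # assume (u, v) = (column, row), matching how the point is read off the display
        patch = im[v - window:v + window, u - window:u + window].mean(axis=2)
        score = laplace(patch).var()  # variance of the Laplacian: higher means sharper
        if score > best_score:
            best_depth, best_score = d, score
    return refocus_im(images, positions, center_pos, best_depth), best_depth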



We can specify to the function that we'd like to refocus on the point at (1200, 700). Now the image is refocused!



Here's another demo: let's refocus on the point at (1000, 200).




3. Reflections

This is a really fun demo that showcases our ability to "refocus" or adjust the aperture after the images were taken. The method is simple, but the results are rather amazing. However, the current approach is limited in that we need to know the camera coordinates of the images. In addition, we used a total of 17 x 17 = 289 images for each scene, which might not always be available. It would be even better if we could develop algorithms that learn the camera coordinates and generate these results from fewer samples. I plan to read up more on this.












A Neural Algorithm of Artistic Style

Reimplementation + Explorations

1. Introduction

A Neural Algorithm of Artistic Style by Gatys et al. was a seminal work in the field of neural style transfer. At a very high level, the approach they introduced uses CNNs to separate and recombine the content and style of images. I was especially impressed by the novelty of the approach and by how aesthetically pleasing the synthesized images are. In this part of the final project, I reimplemented their approach while introducing some personal experimentation.

Here are links to the original paper and two PyTorch tutorials (PyTorch and d2l) that were helpful in guiding my implementation.

In order to use GPU compute, I set up my environment on Azure ML Notebook, utilizing some free credits I had from previous work (thanks Microsoft!). I also set up a project in Weights & Biases to log my results in each run. It turned out logging hyperparameters and synthesized images was extremely important - when I was stuck at some uninteresting results, looking at experiment data helped me figure out the correct parameters to choose.

The most interesting realization from reading the paper is that we are not training the neural network itself in the traditional sense; what is being optimized is the synthesized image. As discussed in the paper, there usually exists no image that perfectly captures the content of one image and the style of the other, but as we minimize the loss function, the synthesized image becomes more perceptually appealing.

2. The VGG Network

As described in the paper, we use a pre-trained convolutional neural network called VGG-19. The network was originally trained for object recognition, surpassing many previous baseline results on ImageNet. This makes it suitable for our task: its higher layers "capture the high-level content in terms of objects and their arrangement in the input image but do not constrain the exact pixel values of the reconstruction." We initialize the network with pre-trained weights from VGG19_Weights.IMAGENET1K_V1.

We use the VGG network to extract features from the content image and the style image. From these feature maps, we can calculate the content and style losses.
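For reference, here is roughly how the frozen feature extractor can be set up with torchvision; extract_features is an assumed helper name, and the input images are expected to be preprocessed with the usual ImageNet normalization.

import torchvision

# Pre-trained VGG-19; only the convolutional part (.features) is needed.
weights = torchvision.models.VGG19_Weights.IMAGENET1K_V1
vgg = torchvision.models.vgg19(weights=weights).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the network stays frozen; only the image will be optimized

def extract_features(x, layers):
    # collect activations at the given indices of the nn.Sequential
    feats = {}
    for i, module in enumerate(vgg):
        x = module(x)
        if i in layers:
            feats[i] = x
    return feats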

3. Loss Functions

The content side of the problem is easy: we define the content loss as the MSE between the synthesized image's features and the original content image's features. The style representation, on the other hand, is more involved. For each layer used in the style representation, we compute the Gram matrix, whose elements are the correlations between the activations of different filters in that layer. These correlations capture the texture and visual patterns that are characteristic of the image's style.
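A compact sketch of these pieces follows; normalizing the Gram matrix by the channel and spatial dimensions is one common choice rather than the paper's exact scaling.

import torch

def content_loss(gen_feat, content_feat):
    # MSE between the synthesized image's features and the content image's features
    return torch.mean((gen_feat - content_feat) ** 2)

def gram_matrix(feat):
    # feat: (B, C, H, W) -> (B, C, C) correlations between filter responses
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_loss(gen_feat, style_feat):
    return torch.mean((gram_matrix(gen_feat) - gram_matrix(style_feat)) ** 2)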

In addition to the content and style losses described in the original paper, I also computed a total variation loss (tv_loss), which is meant to reduce high-frequency artifacts in the synthesized image.
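One possible implementation, assuming a (B, C, H, W) image tensor:

import torch

def tv_loss(img):
    # penalize differences between neighboring pixels to suppress high-frequency noise
    dh = torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]).mean()
    dw = torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1]).mean()
    return 0.5 * (dh + dw)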

4. Training

In earlier training runs, I failed to get satisfying results even after 1,500 epochs. By examining my hyperparameters and the way the images changed, I noticed the content features were reconstructed well, but the stylistic elements were not showing. I assigned a heavier weight to style (1e6 instead of 1e3 or 1e4) and the model started generating interesting results.
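For concreteness, here is a sketch of the optimization loop with the heavier style weight. The optimizer, learning rate, and content/TV weights are my own choices (the paper itself uses L-BFGS), the layer indices are the "Approach 1" selection listed below, and the helpers are the ones sketched in the previous sections.

import torch

content_layers = [25]
style_layers = [0, 5, 10, 19, 28]
content_weight, style_weight, tv_weight = 1.0, 1e6, 10.0

# content_img and style_img are preprocessed (1, 3, H, W) tensors
with torch.no_grad():
    content_feats = extract_features(content_img, content_layers)
    style_feats = extract_features(style_img, style_layers)

synth = content_img.clone().requires_grad_(True)  # optimize the image, not the network
optimizer = torch.optim.Adam([synth], lr=0.05)

for epoch in range(1500):
    optimizer.zero_grad()
    feats = extract_features(synth, set(content_layers) | set(style_layers))
    c_loss = sum(content_loss(feats[i], content_feats[i]) for i in content_layers)
    s_loss = sum(style_loss(feats[i], style_feats[i]) for i in style_layers)
    loss = content_weight * c_loss + style_weight * s_loss + tv_weight * tv_loss(synth)
    loss.backward()
    optimizer.step()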

5. Results

Content Images

Berkeley, CA

Golden Gate Bridge 1

Golden Gate Bridge 2

Style Images

The Starry Night, Vincent van Gogh

Impression, Sunrise, Claude Monet

Haystacks, Claude Monet

The Scream, Edvard Munch

Figure, Pablo Picasso

Journey to the East, Bukang Y. Kim

Input (Synthetic) Images

All of these are GIFs showing the gradual synthesis of the combined image. You might have to wait a bit to see the entire process!

Berkeley + Starry

Berkeley + Scream

Berkeley + Sunrise

Berkeley + Cubism

Bridge + Scream

Bridge 2 + Haystacks

Berkeley + Ink Wash

My comment on the last synthetic image, which combines the Berkeley photo with the ink wash style, is that the model captures the style only superficially. It does not abstract away detail the way most ink wash paintings do. But of course, this level of reasoning or artistic interpretation isn't something we expect from the current model.

I also recorded hyperparameters and losses for each run. Here are the results for the Bridge + Scream example shown above.



Bells & Whistles: Shuffling Layers

I also compared the results from different selections of layers from the VGG network.

Approach 1

style_layers = [0, 5, 10, 19, 28]

content_layers = [25]

Approach 2

style_layers = [2, 7, 12, 21, 30]

content_layers = [22, 25]

I like the result from Approach 2 better because (1) the content features seem more detailed, and (2) it is more artistically expressive.

For the following example, there isn't a huge difference between the two approaches except for the colors.

Approach 1

style_layers = [0, 5, 10, 19, 28]

content_layers = [25]

Approach 2

style_layers = [2, 7, 12, 21, 30]

content_layers = [22, 25]
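To see exactly which VGG-19 modules these indices tap into, one quick check is to print them:

import torchvision

vgg = torchvision.models.vgg19(weights=None).features
for idx in sorted({0, 2, 5, 7, 10, 12, 19, 21, 22, 25, 28, 30}):
    print(idx, vgg[idx])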






