CS194-26 Final Project

Authors: Yibin Li (cs194-26-ads) and Violet Yao (cs194-26-afs)

Part I: Lightfield Camera

As this paper by Ng et al. demonstrates, capturing multiple images over a plane orthogonal to the optical axis makes it possible to achieve complex effects using very simple operations such as shifting and averaging. The goal of this project is to reproduce some of these effects using real lightfield data.

In this project, we use sample datasets from the Stanford Light Field Archive, each of which comprises 289 views captured on a 17x17 grid.

1) Depth Refocusing

Objects that are far away from the camera do not change position much as the camera moves around with the optical axis direction held fixed. Nearby objects, on the other hand, shift significantly across images. Averaging all the images in the grid without any shifting therefore produces an image that is sharp for the far-away objects but blurry for the nearby ones. Conversely, shifting the images appropriately and then averaging allows one to focus on objects at different depths.

In this part of the project, we implement this idea to generate multiple images focused at different depths. To get the best effect, we use all of the grid images for averaging.
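
Below is a minimal sketch of the shift-and-average refocusing. The images and grid_positions inputs are hypothetical stand-ins for however the 289 sub-aperture views are loaded; the actual notebook may organize the data differently.

import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus(images, grid_positions, alpha, center=(8, 8)):
    """Shift-and-average refocusing.

    images         : list of HxWxC float arrays (the 17x17 sub-aperture views)
    grid_positions : list of (row, col) grid coordinates, one per image
    alpha          : depth parameter; varying it changes which depth is in focus
    center         : grid coordinate of the reference (central) view
    """
    acc = np.zeros_like(images[0], dtype=np.float64)
    for img, (r, c) in zip(images, grid_positions):
        # Shift each view toward the central view, scaled by alpha.
        dy = alpha * (r - center[0])
        dx = alpha * (c - center[1])
        acc += nd_shift(img, shift=(dy, dx, 0), order=1, mode='nearest')
    return acc / len(images)

Sweeping the depth parameter over a small range of values produces a stack of images focused at different depths, such as the chess results shown next.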

Below are averages of chess shifted to five different depths:

Or in gif format:

One more example:

2) Aperture Adjustment

Averaging a large number of images sampled over the grid perpendicular to the optical axis mimics a camera with a much larger aperture, while using fewer images mimics a smaller aperture. In this part, we average only the images within different radii of the grid center while focusing on the same point, which corresponds to different aperture sizes.
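
A sketch of the aperture selection, reusing the hypothetical images and grid_positions from the refocusing sketch above. The radius is measured in grid units; in our pipeline the selected views would also be shifted (as in the refocusing step) so every aperture focuses on the same point.

def adjust_aperture(images, grid_positions, radius, center=(8, 8)):
    """Average only the views whose grid position lies within `radius`
    of the central view; a larger radius mimics a larger aperture."""
    selected = [img for img, (r, c) in zip(images, grid_positions)
                if (r - center[0]) ** 2 + (c - center[1]) ** 2 <= radius ** 2]
    return np.mean(selected, axis=0)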

Below are averages of chess computed with five different radii:

Here are the results:

3) Summary

We are amazed that such cool results can be produced with only a small amount of code.

Part II: Style Transfer

Overview

In this part, we reimplement the neural style transfer algorithm described in this paper: given a content image and a style image, it outputs a blended image painted in the style of the style image while preserving the content of the content image.

Method

We use the same architecture and method as proposed in the paper. The VGG19 network is used to extract feature information from the images. The style representations are taken from layers Conv1_1, Conv2_1, Conv3_1, Conv4_1, and Conv5_1, while the content representation is taken from layer Conv4_2 of the original VGG network.
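
A sketch of how those layers can be read out of torchvision's pretrained VGG19, using the indices of the features Sequential printed below; the name-to-index mapping is our own bookkeeping, not part of the library.

import torch
import torchvision.models as models

# Indices of the style and content layers in vgg19.features
# (they match the Sequential printed in the next section).
LAYERS = {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1',
          '19': 'conv4_1', '21': 'conv4_2', '28': 'conv5_1'}

vgg = models.vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)          # only the output image is optimized

def get_features(image, model=vgg, layers=LAYERS):
    """Run image (a 1x3xHxW tensor) through the network and collect
    the activations at the style and content layers."""
    features = {}
    x = image
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features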

Model Architecture (VGG19)

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace=True)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace=True)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace=True)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace=True)
  (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (17): ReLU(inplace=True)
  (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (20): ReLU(inplace=True)
  (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (22): ReLU(inplace=True)
  (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (24): ReLU(inplace=True)
  (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (26): ReLU(inplace=True)
  (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (29): ReLU(inplace=True)
  (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (31): ReLU(inplace=True)
  (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (33): ReLU(inplace=True)
  (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (35): ReLU(inplace=True)
  (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

Hyperparameters:

Number of training iterations: 5000

Adam optimizer with a learning rate of 0.003

Content loss weight (alpha) = 1

Style loss weight (beta) = 1e6
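
A minimal sketch of the optimization loop with these hyperparameters, assuming content_img and style_img are preprocessed 1x3xHxW tensors and get_features is the helper sketched in the Method section. The per-layer style weighting here is uniform, which may differ from our exact notebook.

import torch

def gram_matrix(tensor):
    # Correlations between feature maps: flatten spatial dims, then F @ F^T.
    _, c, h, w = tensor.size()
    f = tensor.view(c, h * w)
    return f @ f.t()

content_feats = get_features(content_img)
style_feats = get_features(style_img)
style_grams = {l: gram_matrix(style_feats[l]) for l in style_feats}

# Start the output image from the content image and optimize its pixels.
target = content_img.clone().requires_grad_(True)
optimizer = torch.optim.Adam([target], lr=0.003)
alpha, beta = 1, 1e6

for step in range(5000):
    feats = get_features(target)
    content_loss = torch.mean((feats['conv4_2'] - content_feats['conv4_2']) ** 2)
    style_loss = 0
    for layer in ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1']:
        _, c, h, w = feats[layer].shape
        g = gram_matrix(feats[layer])
        style_loss += torch.mean((g - style_grams[layer]) ** 2) / (c * h * w)
    loss = alpha * content_loss + beta * style_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()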

Show Result

In this section, we choose several different content and style images and, based on our trained model, present their style transfer results. The first two are fantastic style transfer examples, but the last pair does not seem to work.

We suspect there are two reasons why it fails. First, the content image and the style image share a highly correlated style: both images in the last pair come from the abstract art genre, and their styles are rather complicated and abstract even for a human to describe, not to mention a neural algorithm. Second, their content appears both meaningless and confusing. Humans recognize an image by identifying the content in it, so if the content is unrecognizable in the first place, it will be even harder to preserve and measure in the transferred result.

Successful style transfer:


The failed style transfer:


Transfer Neckarfront to different styles

Here, we transfer the Neckarfront to six styles as illustrated in the paper. In every result the Neckarfront content is still clearly visible, with some of the style carried over from the style image. However, the style transformation is not as dramatic as in the paper. One reason could be the short training schedule in our implementation; if we trained the model longer, the style of the final result would likely be closer to the style image.


Part III: Seam Carving

Seam Carving for Content-Aware Image Resizing discusses automatically shrinking an image vertically or horizontally to a given dimension while keeping the "important" content of the image.

In general, the algorithm works as follows:

  1. Compute the energy value for every pixel
  2. Find a vertical or horizontal path of the pixels with the least energy
  3. Delete all pixels in that path and reshape the image
  4. Repeat 1-3 until the desired number of rows and columns is reached

Energy Function

The most important component of the algorithm is the energy function. The original paper proposed several energy functions; we used the most basic one: the sum of partial derivatives. Specifically, for each pixel in each channel, we compute the partial derivative along the x-axis and the partial derivative along the y-axis, then sum their absolute values. That's it! Mathematically, it can be described as

\(E(I) = \left|\frac{\partial}{\partial x} I\right| + \left|\frac{\partial}{\partial y} I\right|\)

where I is the pixel and E(I) is the energy value for that pixel.
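
A short sketch of this energy function, using np.gradient for the partial derivatives; our actual implementation may compute the finite differences differently, but the idea is identical.

import numpy as np

def energy(img):
    """Sum of the absolute x- and y- partial derivatives over all channels."""
    img = img.astype(np.float64)
    e = np.zeros(img.shape[:2])
    for ch in range(img.shape[2]):
        dy, dx = np.gradient(img[:, :, ch])   # derivatives along y and x
        e += np.abs(dx) + np.abs(dy)
    return e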

Let's examine the energy map for the white bear image.

Finding Path

We use dynamic programming to repeatedly remove the least important seam in the image until it reaches the desired dimensions. We store the running minimum cumulative energy in a matrix M with the same shape as the energy map: each entry M[i, j] is the minimum total energy of any seam ending at pixel (i, j). The minimum value in the last row then marks the end of the seam to delete, and we backtrack upward to recover the full path. We repeat this step until the desired size is reached.
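
A sketch of the dynamic program and backtracking for a vertical seam, plus a small driver that reuses the energy function above; the double loop is written for clarity rather than speed, and the helper names are our own.

def find_vertical_seam(energy_map):
    """M[i, j] is the minimum total energy of any seam ending at pixel (i, j)."""
    h, w = energy_map.shape
    M = energy_map.copy()
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            M[i, j] += M[i - 1, lo:hi].min()
    # Backtrack from the minimum entry in the last row.
    seam = [int(np.argmin(M[-1]))]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam.append(lo + int(np.argmin(M[i, lo:hi])))
    return seam[::-1]                 # seam[i] = column to delete in row i

def remove_seam(img, seam):
    """Delete one pixel per row from a color image."""
    h, w = img.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1, -1)

def seam_carve(img, num_seams):
    for _ in range(num_seams):
        img = remove_seam(img, find_vertical_seam(energy(img)))
    return img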

Result

Note: pictures enlarged to show details.

Here are the success results for both horizontal and vertical carving:

Failure case:

Sad to see the Campanile distorted ;(((

Bells & Whistles: Stretch images using seam insertion

Seam insertion is the inverse of seam carving, so the idea is very similar. We first make a copy of the original image and perform seam carving down to the desired size, recording the coordinates of every removed seam. Then we insert new seams into the original image in the same order. Each inserted artificial seam is computed as the average of its left and right neighboring pixels.
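
A sketch of the insertion step under those assumptions. Here seams is the list of seams recorded (in removal order) while carving the copy, e.g. by collecting the output of find_vertical_seam above; the bookkeeping at the end accounts for the fact that each recorded seam refers to the progressively narrower copy.

def insert_seams(img, seams):
    """Insert the recorded seams back into img, widening it by one column per seam."""
    out = img.astype(np.float64)
    remaining = [list(s) for s in seams]
    while remaining:
        seam = remaining.pop(0)
        h, w, c = out.shape
        new = np.zeros((h, w + 1, c))
        for i, j in enumerate(seam):
            left = out[i, max(j - 1, 0)]
            right = out[i, min(j, w - 1)]
            new[i, :j] = out[i, :j]
            new[i, j] = (left + right) / 2.0   # artificial seam = average of neighbors
            new[i, j + 1:] = out[i, j:]
        # Remaining seams were recorded on a narrower image; wherever they pass
        # at or to the right of the seam just inserted, shift them by two columns.
        for other in remaining:
            for i in range(h):
                if other[i] >= seam[i]:
                    other[i] += 2
        out = new
    return out.astype(img.dtype)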

What we learned

We learned that seam carving does not work well on images with strong regular structure, the distorted Campanile for example. Determining the importance of pixels with an energy function is also fascinating to us because of its simplicity and intuitiveness.

Thank you for a great semester! :)))