Final Project

Beom Jin (Brian) Lee



Project 1: Reimplement A Neural Algorithm of Artistic Style

Overview

In this project, we reimplement the ground-breaking paper A Neural Algorithm of Artistic Style by Gatys et al., which transfers the style of a style image onto a target content image using a neural network (VGG-19). In other words, we take an image and reproduce it in a new artistic style. In particular, the paper uses the following paintings as its style images:

Seated Nude
Starry Night
The Scream

The paper uses the following image as its content image:

Neckarfront

Implementation Details

Overview: We use the pre-trained VGG-19 architecture. The main principle is simple: we define two distances, \(D_C\) and \(D_S\). \(D_C\) measures how different the content of the produced image is from the content image, and \(D_S\) measures how different the style of the produced image is from the style image. The produced image is then optimized to minimize a weighted combination of \(D_C\) and \(D_S\).

Content Loss \(D_C\): The content loss is the mean squared difference between the content-layer representations of the target (produced) image and the original content image.
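A minimal PyTorch sketch of this, assuming the content-layer activations of both images have already been extracted:

```python
import torch.nn.functional as F

def content_loss(target_features, content_features):
    # Mean squared difference between the content-layer activations of
    # the produced image and the original content image.
    return F.mse_loss(target_features, content_features)
```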

Style Loss \(D_S\): We compute the Gram matrix of the feature maps at each style layer and compare it between the target (produced) image and the style image.
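A sketch of the Gram matrix and the resulting style loss, assuming single-image batches of VGG feature maps:

```python
import torch

def gram_matrix(features):
    # features: (1, C, H, W) activations from one VGG layer.
    _, c, h, w = features.size()
    flat = features.view(c, h * w)
    # Channel-by-channel inner products capture which features co-occur,
    # i.e. the texture statistics of that layer.
    return (flat @ flat.t()) / (c * h * w)

def style_loss(target_features, style_features):
    # Compare the Gram matrices of the produced image and the style image.
    return torch.mean((gram_matrix(target_features) - gram_matrix(style_features)) ** 2)
```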

Total Loss: We compute the content loss and style loss and weight them accordingly. For each of the pictures below, we use a different ratio of content loss to style loss, which we note in the results. Typically, we found that a style weight of \(10^6\) and a content weight of \(1\) worked best.
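Putting the pieces together, here is a sketch of the weighted total loss, using the content_loss and style_loss helpers sketched above and assuming the features are stored in per-layer dictionaries:

```python
content_weight = 1      # content term weight
style_weight = 1e6      # style term weight (the 1 : 1e6 ratio noted above)

def total_loss(target_feats, content_feats, style_feats, style_layers, content_layer):
    # Weighted sum of the content loss at the content layer and the
    # style losses summed over the style layers.
    c_loss = content_loss(target_feats[content_layer], content_feats[content_layer])
    s_loss = sum(style_loss(target_feats[l], style_feats[l]) for l in style_layers)
    return content_weight * c_loss + style_weight * s_loss
```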

Network: We use the VGG-19 architecture with some changes in how it is used. Specifically, layer 21 was used for the content representation, and layers 0, 5, 8, 19 and 28 were used for the style representations, following the original paper. There is no difference in the network itself between the original paper and ours.

VGG19 Architecture
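A sketch of how these layers can be tapped with torchvision's pre-trained VGG-19 (layer indices as listed above; image preprocessing is omitted):

```python
import torchvision.models as models

# Layer indices as stated above: 21 for the content representation,
# 0, 5, 8, 19, 28 for the style representations.
CONTENT_LAYER = '21'
STYLE_LAYERS = ['0', '5', '8', '19', '28']

# Frozen, pre-trained VGG-19 feature extractor.
vgg = models.vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def get_features(image, vgg):
    # Forward pass that records the activations at the chosen layers.
    feats = {}
    x = image
    for name, layer in vgg.named_children():
        x = layer(x)
        if name == CONTENT_LAYER or name in STYLE_LAYERS:
            feats[name] = x
    return feats
```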

Hyperparameters: We use a learning rate of \(3 \cdot 10^{-3}\) and \(1000\) epochs, with varying style- and content-loss weights. Our weights differ from the paper, which mostly recommends \(1\) and \(100\). Given resource constraints, however, we pushed the ratio up so that training converges more quickly and produces more vivid results.

Optimizer: L-BFGS
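A sketch of the resulting optimization loop with L-BFGS; here `content_img` and `style_img` stand for preprocessed image tensors, and the other names come from the sketches above:

```python
import torch.optim as optim

# Optimize the pixels of the produced image directly.
target = content_img.clone().requires_grad_(True)
optimizer = optim.LBFGS([target], lr=3e-3)   # learning rate as stated above

content_feats = get_features(content_img, vgg)
style_feats = get_features(style_img, vgg)

for epoch in range(1000):
    def closure():
        # L-BFGS re-evaluates the loss several times per step,
        # so it needs a closure that recomputes it.
        optimizer.zero_grad()
        target_feats = get_features(target, vgg)
        loss = total_loss(target_feats, content_feats, style_feats,
                          STYLE_LAYERS, CONTENT_LAYER)
        loss.backward()
        return loss
    optimizer.step(closure)
```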

Results

Results from Paper

We first show the results of transferring the three selected styles listed in the paper onto the content image of Neckarfront (above). Our results do not match the style image as closely as those in the paper, primarily because of a shorter training time. With longer training, given that the networks used are the same, we expect to achieve results of similar quality. However, how much "style" versus "content" to retain is an artistic choice, and on images like the Seated Nude we would argue our results look better, as the original paper's results overly muffle the source image. On the other hand, the paper's result is better on Starry Night: it captures Starry Night in the sky, whereas in our results we see traces of the "sky" in the buildings.
Result from Paper (Seated Nude)
1:10000 content-style ratio on Neckarfront
1:1e6 content-style ratio on Neckarfront
Result from Paper (Starry Night)
1:1000 content-style ratio on Neckarfront
1:1e6 content-style ratio on Neckarfront
Result from Paper (Scream)
1:300 content-style ratio on Neckarfront
1:10000 content-style ratio on Neckarfront

Good Results

We use our own selection of photos, paired with a few styles, to create some beautiful style-transfer results.
Scream and Berkeley
Style (Scream)
Content (UC Berkeley)
1:10000 content-style ratio on Berkeley
1:1e6 content-style ratio on Berkeley
Starry Night and Berkeley
Style (Starry Night)
Content (UC Berkeley)
1:3000 content-style ratio on Berkeley
1:10000 content-style ratio on Berkeley
Seated Nude and Berkeley
Style (Seated Nude)
Content (UC Berkeley)
1:3000 content-style ratio on Berkeley
1:10000 content-style ratio on Berkeley
Garden and Berkeley
Style (Garden)
Content (UC Berkeley)
1:1000 content-style ratio on Berkeley
Nude Prof. Efros
Style (Seated Nude)
Content (Prof. Efros)
Nude Efros? (1:1000 content-style ratio)

Failure Yet Still Okay-Looking Cases

Here are some failure cases that still look decent: we took the failure cases from the section below and tuned the hyperparameters until the results looked acceptable. We analyze why they failed in the section below.
Berkeley in Waters?
Style (Great Wave)
Content (UC Berkeley)
1:1000 content-style ratio on Berkeley

Failure Cases

Berkeley in Waters?
One reason for the failure case below is a lack of tuning. However, a 1:2000 ratio typically gave very content-oriented results; in this case it did not. Combining this with the "okay"-looking result above, the issue is that the Great Wave style has too strong a wave texture. In fact, in the result above, the network recognized many objects and re-styled them with waves, which is not what we expect.
Style (Great Wave)
Content (UC Berkeley)
1:2000 content-style ratio on Berkeley
Screaming Prof. Efros
One reason for the failure case below is that the face and the style did not go well together: a face has a different object representation than the other content images, which were mostly landscapes. In the case of the Seated Nude, the style is still some representation of a human, but in the case of The Scream, most of the texture is in its background.
Style (The Scream)
Content (Prof. Efros)
1:1000 content-style ratio on Prof. Efros

Project 2: Seam Carving

Overview

In this project, we explore seam carving, which was first introduced in the groundbreaking paper Seam Carving for Content-Aware Image Resizing. Seam carving is an algorithm for content-aware image resizing. What does this mean? When we want to resize an image, we often do so by squeezing, stretching, or cropping it, which leads to suboptimal results. Seam carving is content-aware: it finds paths of least importance and removes them automatically.

Implementation Details

Overview of Implementation

To implement seam carving, we need two main ingredients: an energy function and a DP algorithm to find the seam with the least energy.

Energy Function: The purpose of the energy function is to define a metric of how important each region in an image is, so that we can carve out the unimportant ones. Intuitively, pixels that blend in with their surroundings are unimportant and should have low energy values. The original paper proposes the gradient magnitude as the energy function: $$e(I) = \left| \frac{\partial}{\partial x} I \right| + \left| \frac{\partial}{\partial y} I \right|$$ What does this mean? For every pixel in the image, at every channel, we take the partial derivatives along the \(x\)-axis and \(y\)-axis and sum their absolute values. This raises the question: how do we compute the derivative of an image? We discussed multiple ways in class, and we experimented with both finite difference operators and the Sobel operator (recommended in the paper). They give similar results, but the Sobel operator produced stronger edge responses, so we use it, convolving the image with: $$\begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$$ for the \(y\)-axis and $$\begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix}$$ for the \(x\)-axis.
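A sketch of this energy function in NumPy, using SciPy's convolve with the Sobel kernels above (border handling details aside):

```python
import numpy as np
from scipy.ndimage import convolve

def energy_map(img):
    # img: H x W x 3 float array. Energy = sum over channels of the
    # absolute Sobel responses in x and y.
    sobel_y = np.array([[ 1.,  2.,  1.],
                        [ 0.,  0.,  0.],
                        [-1., -2., -1.]])
    sobel_x = sobel_y.T
    energy = np.zeros(img.shape[:2])
    for c in range(img.shape[2]):
        dx = convolve(img[:, :, c], sobel_x)
        dy = convolve(img[:, :, c], sobel_y)
        energy += np.abs(dx) + np.abs(dy)
    return energy
```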

For example, below we show results of the Sobel filter for the classic Castle picture:

Original Photo
Energy Map

Finding the Optimal Seam: To find the optimal seam, we use the DP algorithm outlined in the paper. We traverse the image from the second row to the last row, computing the cumulative minimum energy for all connected seams and filling in the DP table. Specifically, our recurrence is: $$M(i, j) = e(i, j) + \min \big(M(i - 1, j-1),\; M(i-1, j),\; M(i -1, j + 1)\big)$$ where \(M\) is the DP table of cumulative minimum energies.
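A sketch of this DP in NumPy (the `find_min_seam` name is ours); it returns the column of the minimum-energy seam in each row:

```python
import numpy as np

def find_min_seam(energy):
    # Fill the DP table M from the recurrence above; the seam runs from
    # the top row to the bottom row, shifting at most one column per row.
    h, w = energy.shape
    M = energy.copy()
    back = np.zeros((h, w), dtype=int)   # which parent column each cell chose
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            parent = lo + np.argmin(M[i - 1, lo:hi])
            back[i, j] = parent
            M[i, j] = energy[i, j] + M[i - 1, parent]
    # Backtrack from the minimum entry in the last row.
    seam = np.zeros(h, dtype=int)
    seam[-1] = np.argmin(M[-1])
    for i in range(h - 1, 0, -1):
        seam[i - 1] = back[i, seam[i]]
    return seam   # seam[i] = column index of the seam in row i
```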

Vertical Seam Carving: The examples we show above mostly use horizontal seam carving. How can we apply the same ideas to vertical seam carving? It turns out the idea is very simple: transpose the image, run horizontal seam carving on the transposed image, and transpose the result back.
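A sketch of that trick; here `carve` stands for the hypothetical removal loop built from `energy_map` and `find_min_seam` above:

```python
import numpy as np

def carve_transposed(img, num_seams):
    # `carve(img, num_seams)` is the (hypothetical) loop that repeatedly
    # computes the energy map, finds the minimum seam, and removes it.
    transposed = np.transpose(img, (1, 0, 2))
    carved = carve(transposed, num_seams)
    return np.transpose(carved, (1, 0, 2))
```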

Results

Horizontal Seam Carving

Below, we show the results of horizontal seam carving on several images, with credit to the originals:
Original Photo - Castle (link to original photo)
Crop (80%)
Crop (60%)
Original Photo - Water (link to original photo)
Crop (80%)
Crop (50%)
Original Photo - Oski (link to original photo)
Crop (80%)
Crop (60%)
Original Photo - I miss Berkeley (link to original photo)
Crop (80%)
Original Photo - Prof Efros (link to original photo)
Crop (80%)
Original Photo - The Clones (link to original photo)
Crop (80%)
Original Photo - Avengers (link to original photo)
Crop (80%)

Vertical Seam Carving

Below, we show the results of vertical seam carving on some images, with credit to the originals:
Original Photo - Yocouch (link to original photo)
Crop (80%)
Crop (60%)

Bells and Whistles: Seam Insertion

On top of deleting seams, we can also insert seams by modifying the removal process. We follow the algorithm described in the main paper.

Implementation Details

Say we want to enlarge an image by \(k\) pixels. We first find the \(k\) seams that would be removed, and then add them back into the picture, setting each inserted pixel to the average of the pixel values across the seam.
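A sketch of inserting a single seam this way; the index bookkeeping needed to insert \(k\) shifted seams is omitted:

```python
import numpy as np

def insert_seam(img, seam):
    # Insert one seam: each inserted pixel is the average of its left and
    # right neighbours along the seam.
    h, w, c = img.shape
    out = np.empty((h, w + 1, c), dtype=img.dtype)
    for i in range(h):
        j = seam[i]
        left = img[i, max(j - 1, 0)].astype(float)
        right = img[i, min(j + 1, w - 1)].astype(float)
        out[i, :j + 1] = img[i, :j + 1]
        out[i, j + 1] = ((left + right) / 2).astype(img.dtype)
        out[i, j + 2:] = img[i, j + 1:]
    return out
```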

Results

Original Photo - Castle (link to original photo)
Seam Insertion (120%)
Original Photo - Water (link to original photo)
Seam Insertion (120%)

Failure Cases

Below, we show examples of some failed cases and explanations of why.
Original Photo (link)
Cropped (80%)
In this example, we can intuitively reason that the algorithm decided the background had more energy: after all, there are lots of buildings, and their edges do not blend in with the surroundings. On the other hand, on Prof. Efros' face, many regions blend smoothly into each other, especially since the face is in the foreground and large! In fact, take a look at the energy map below:
Original Photo (link)
Energy Map
The energy map above shows that there is indeed more energy around the buildings.

Main Takeaway

It is surprising that such a simple DP algorithm can do so much magic to resize images. This project also tied up a lot of loose ends from CS 194-26, from the energy function (which reminded me of one of the first few projects) to DP algorithms. It was a delight to play with!