CS 194-26 Final Project: Seam Carving & Style Transfer

Author: Aivant Goyal

These are my final projects for CS 194-26: Seam Carving and Neural Style Transfer.

Project 1: Seam Carving

In this project, we implement seam carving, an algorithm for content-aware resizing of images.

Usually, when resizing an image, you can expect it to either look warped or for parts of the image to be cropped. Wouldn’t it be nice if there were an algorithm that could find the least important pixels in an image and remove them for you? Well, that is the goal of seam carving! The process is described in the paper Seam Carving for Content-Aware Image Resizing.

Energy Function

A seam is a connected path from one side of an image to the other that chooses exactly one pixel from each column (for a horizontal seam) or each row (for a vertical one). We want to remove the least important seam.

The first question is: what does it mean for a seam to be more or less important? We answer this using an energy function. First we compute the energies at all points in an image, and then we find the lowest cost path (or seam) through the image. Once we have our seam, we can remove it!

In my project, I decided to use a color gradient energy function that computes the squared distance between each pixel’s color and the colors of its neighboring pixels. Basically, the more similar a pixel is to its neighbors, the lower its energy. This way, the lowest-cost seam would hopefully take out pixels that are all relatively similar to one another.
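A minimal NumPy sketch of this kind of energy function (the function name and the exact neighbor scheme here are my own illustration, not necessarily the project’s code):

```python
import numpy as np

def color_gradient_energy(img):
    """Energy = summed squared color distance to the right and down neighbors.

    img: float array of shape (H, W, 3). Pixels that blend in with their
    neighbors get low energy, so low-cost seams pass through smooth regions.
    """
    # Squared color difference to the horizontal neighbor (np.roll wraps edges).
    dx = img - np.roll(img, -1, axis=1)
    # Squared color difference to the vertical neighbor.
    dy = img - np.roll(img, -1, axis=0)
    return (dx ** 2).sum(axis=2) + (dy ** 2).sum(axis=2)
```

A perfectly uniform image gets zero energy everywhere, while sharp color boundaries light up with high energy, which is exactly what steers seams away from important content.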

I implemented horizontal seam carving, and I was able to accomplish vertical seam carving by transposing the image, finding the lowest-cost horizontal seam, and transposing the result back.
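The lowest-cost seam can be found with dynamic programming: the cheapest way to reach each pixel is its own energy plus the cheapest of the three reachable pixels in the previous column. A rough sketch under those assumptions (illustrative names, not the project’s actual code):

```python
import numpy as np

def find_horizontal_seam(energy):
    """Return, for each column, the row index of the minimum-cost horizontal
    seam (one pixel per column, moving at most one row between columns)."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for c in range(1, w):
        prev = cost[:, c - 1]
        best_prev = np.minimum(prev, np.roll(prev, 1))       # row above
        best_prev = np.minimum(best_prev, np.roll(prev, -1))  # row below
        # np.roll wraps around; fix the top and bottom borders explicitly.
        best_prev[0] = min(prev[0], prev[1])
        best_prev[-1] = min(prev[-1], prev[-2])
        cost[:, c] += best_prev
    # Backtrack from the cheapest endpoint in the last column.
    seam = np.empty(w, dtype=int)
    seam[-1] = int(np.argmin(cost[:, -1]))
    for c in range(w - 2, -1, -1):
        r = seam[c + 1]
        lo, hi = max(r - 1, 0), min(r + 2, h)
        seam[c] = lo + int(np.argmin(cost[lo:hi, c]))
    return seam

def remove_horizontal_seam(img, seam):
    """Delete one pixel per column (the seam), producing an (H-1, W, C) image."""
    h, w = img.shape[:2]
    t = img.transpose(1, 0, 2)                 # (W, H, C): one row per column
    keep = np.ones((w, h), dtype=bool)
    keep[np.arange(w), seam] = False
    return t[keep].reshape(w, h - 1, -1).transpose(1, 0, 2)
```

For a vertical seam, run the same two functions on `img.transpose(1, 0, 2)` and transpose the result back, as described above.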

Results

The algorithm is slow, since it has to recalculate the energies after every seam removal, but I created a function that takes in an image and effectively resizes it using both vertical and horizontal seam carving. Below are some of my results!

Landscape

Original Size: 800x278
New Size: 500x278

House

Original Size: 512x384
New Size: 312x384

Person

Original Size: 244x301
New Size: 230x230

Sidewalk

Original Size: 375x500
New Size: 375x300

The results look really good!

Extras

Using the same idea, we can actually add new seams to an image as well. Once we find a seam, instead of removing it, we can insert a new seam that averages the nearby pixels.
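A sketch of that insertion step (my own illustrative code: it widens the image by one row, filling the new pixel at each column with the average of the seam pixel and its vertical neighbors):

```python
import numpy as np

def insert_horizontal_seam(img, seam):
    """Grow the image by one row: at each column, insert a new pixel below
    the seam pixel, set to the average of the seam pixel and its neighbors.

    img: (H, W, C) array; seam: per-column row indices of the seam to widen.
    """
    h, w, c = img.shape
    out = np.empty((h + 1, w, c), dtype=img.dtype)
    for col in range(w):
        r = seam[col]
        out[:r + 1, col] = img[:r + 1, col]           # rows above + seam pixel
        lo, hi = max(r - 1, 0), min(r + 2, h)
        out[r + 1, col] = img[lo:hi, col].mean(axis=0)  # averaged new pixel
        out[r + 2:, col] = img[r + 1:, col]            # rows below, shifted down
    return out
```

Repeating this (while finding a fresh low-cost seam each time) stretches the image; the failure mode mentioned below comes from the search repeatedly picking the same cheap seam.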

I used this concept to let my resize function resize an image to any arbitrary new size. Below are some results where the image was shrunk on one axis and stretched on the other.

Ocean

Original Size: 254x152
New Size: 200x200

Train

Original Size: 541x360
New Size: 600x300

Louvre!

Original Size: 472x600
New Size: 550x550

It worked pretty well, especially for images that have large stretches of similar patches (sky, ocean, etc.), but on busier images it’s easy to notice that the algorithm ends up duplicating one seam over and over again to stretch the image. I think this is likely because of my particular energy function; with a different energy function that doesn’t rely solely on color gradients, we might find more success.

Neural Style Transfer

In this project, we reimplement A Neural Algorithm of Artistic Style, a paper that describes how we can use the intermediate feature steps of a trained CNN to transfer the style of one image to another.

This allows us to take a painting or image with a distinct style, like The Starry Night by Vincent Van Gogh, and apply its unique style to any other image, like so:

Paper Overview

Neural networks are often referred to as “black boxes,” because it’s nearly impossible to derive meaning from their weights. Convolutional neural networks, however, are a little different; they train to find the convolution matrices that, when run over the images, best help create the desired output. These convolutions, consequently, often end up acting as pattern detectors for things like edges and textures. We can make use of the outputs of these convolutions to represent the style of an image.

Additionally, as the image is processed by these convolutions, the intermediate versions of the image are no longer simply pixel values, but rather high-level object representations, i.e., the content of the image.

The paper uses the pretrained weights from the VGG-19 CNN and describes methods to separate the content and style portions of an image. We then use these separated representations to synthesize a new image using the content of one image and the style of another!

Model Architecture

The VGG-19 model architecture is shown below. For the content representation, I used layer 21; for the style representations, I used features from layers 0, 5, 8, 19, and 28. These were derived by looking at the paper and finding the corresponding convolutional layers in the architecture.

We then take a target image (which starts off as the input image) and let the Adam optimizer adjust it to minimize the content and style loss. We weight the style loss by 10e10 and the content loss by 1. This difference makes sure the algorithm prefers the style features of the style image over the target’s original style.
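That optimization loop can be sketched roughly as follows (a simplified sketch, not the project’s actual code: `extract` and `gram` stand in for the feature-extraction and Gram-matrix helpers, and the losses are mean squared errors):

```python
import torch

def stylize(target, content_feats, style_grams, extract, gram,
            style_weight=1e10, content_weight=1.0, steps=500, lr=0.01):
    """Adjust the target image's pixels directly with Adam so its features
    match the content image and its Gram matrices match the style image.

    extract(img) -> (content_features, style_features) dicts keyed by layer;
    gram(feat)   -> Gram matrix of one feature map.
    """
    target = target.clone().requires_grad_(True)
    opt = torch.optim.Adam([target], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        content, style = extract(target)
        c_loss = sum(torch.mean((content[k] - content_feats[k]) ** 2)
                     for k in content_feats)
        s_loss = sum(torch.mean((gram(style[k]) - style_grams[k]) ** 2)
                     for k in style_grams)
        loss = content_weight * c_loss + style_weight * s_loss
        loss.backward()
        opt.step()
    return target.detach()
```

The key design point is that the optimizer’s only parameter is the image itself; the network weights stay frozen throughout.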

To calculate the style loss, we compute a Gram matrix for each style feature and compare the target image’s matrices against the style image’s. To calculate the content loss, we simply take the average of the difference between the content representations of the target and original images.
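A NumPy sketch of those two losses (my own illustration: I read “average of the difference” as a mean squared error, and the Gram normalization here is one common choice, not necessarily the project’s):

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) activation map -> (C, C) matrix of channel
    correlations, normalized by the number of spatial positions."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_loss(target_feat, style_feat):
    """Mean squared difference between the two Gram matrices."""
    return np.mean((gram_matrix(target_feat) - gram_matrix(style_feat)) ** 2)

def content_loss(target_feat, content_feat):
    """Mean squared difference between the raw feature maps."""
    return np.mean((target_feat - content_feat) ** 2)
```

The Gram matrix throws away spatial layout and keeps only which channels (i.e., which texture detectors) fire together, which is what makes it a representation of style rather than content.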

Hyperparameters

The paper finds that a style-to-content loss ratio of 10^5 to 10^0 produces very muddled results, and that a ratio of 10^2 to 10^0 is much more reasonable. I found that the style would not train fast enough at their ratio (even with 10,000 epochs), so I altered it.

VGG-19 Model

Examples

Below are various examples of style transfer with varying degrees of success!

Original Images

Title Image
Neckarfront
Obama
Dancer
Campanile

Style Images

Title & Author Image
Abstract Design, source unknown
Femme nue assise by Pablo Picasso (referred to below as “Picasso”)
Der Schrei by Edvard Munch (The Scream)
Generic Space Picture, source unknown
The Starry Night by Vincent van Gogh

Neckarfront

Title Style Result
Abstract Neckarfront (Epoch 5000)
Picasso Neckarfront
Scream Neckarfront (Epoch 4000)
Space Neckarfront (Epoch 5000)
Starry Neckarfront

Dancer

Title Style Result
Abstract Dancer
Picasso Dancer
Scream Dancer
Space Dancer
Starry Dancer

Obama

Title Style Result
Abstract Obama
Picasso Obama (Epoch 4000)
Scream Obama
Space Obama (Epoch 5000)
Starry Obama

Campanile

Title Style Result
Abstract Campanile
Picasso Campanile
Scream Campanile
Space Campanile
Starry Campanile

Gifs

Some gifs showing the transition process at various epochs!

Abstract Obama Starry Campanile

Failure Cases

The algorithm did not work well for everything. I think the most notable failure case was the Obama image.

Space Obama (Epoch 10000) Picasso Obama (Epoch 10000)

These images largely took on the entirety of the style image, especially when run for the full 10,000 epochs (though the effect shows even in shorter runs; see above). I think this is because, unlike the others, this image was a portrait, so it had a different object representation than the other inputs, which were mostly landscapes.

The space style in particular didn’t transfer well across the board, because the image lacks any real texture; the results look as if the space picture were simply overlaid on the original.

Conclusion

This was a very fun set of projects and a very fun class! I’m proud of my outputs and I’m glad I got a chance to take it :)