CS 194 Final Project

Theodora Worledge

Project 1: Image Quilting

Part 1: Randomly Sampled Texture

A rough first attempt at synthesizing texture is to tile randomly sampled patches. Below are the texture sample (smaller image) and the synthesized texture at twice the size of the sample (larger image). Not too surprisingly, this method does not work well because it does not preserve the structure of the texture.

[Figures: weave_small2.jpg (texture sample), weave_random.jpg (random-patch synthesis)]
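A minimal sketch of this baseline, assuming a numpy image array sample and a hypothetical patch_size (not my exact code):

    import numpy as np

    def random_quilt(sample, out_size, patch_size):
        """Tile an output image with patches sampled uniformly at random."""
        H, W = sample.shape[:2]
        out = np.zeros((out_size, out_size, 3))
        for i in range(0, out_size, patch_size):
            for j in range(0, out_size, patch_size):
                # Pick a random top-left corner for a full patch in the sample.
                r = np.random.randint(0, H - patch_size + 1)
                c = np.random.randint(0, W - patch_size + 1)
                patch = sample[r:r + patch_size, c:c + patch_size]
                # Crop the patch if it runs past the output border.
                h = min(patch_size, out_size - i)
                w = min(patch_size, out_size - j)
                out[i:i + h, j:j + w] = patch[:h, :w]
        return out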

Part 2: Overlapping Patches

This next method of synthesizing texture attempts to preserve structure. Starting from the upper left corner (moving left-to-right, top-to-bottom), patches are placed with a certain amount of overlap with the previously placed patches to the left and above. A patch is only eligible if its overlap with previously placed patches has an average L2 loss below a set threshold. In my implementation, based on Efros & Freeman (2001), this threshold is set to within 10% of the lowest achievable average L2 loss over all candidate patches. The patch to insert is then chosen uniformly at random from all patches with an average L2 loss under this threshold.
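A sketch of this selection rule, assuming a hypothetical array errors that already holds the average L2 overlap error of every candidate patch position in the sample:

    import numpy as np

    def choose_patch_index(errors, tol=0.1):
        """Choose a candidate patch whose overlap error is within tol (10%)
        of the best achievable error, uniformly at random."""
        errors = np.asarray(errors, dtype=float).ravel()
        threshold = errors.min() * (1 + tol)
        candidates = np.flatnonzero(errors <= threshold)
        return int(np.random.choice(candidates))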

The result of this method is significantly better than that of the previous method. As before, the texture sample is the smaller image, and the overlapping-patches method was used to generate the larger image at twice the size of the sample. The structure of the synthesized texture is now consistent with the sample. However, there are noticeable horizontal and vertical edge artifacts caused by the straight edges of the square patches used in this method.

[Figures: weave_small2.jpg (texture sample), weave_overlap.jpg (overlapping-patch synthesis)]

Part 3: Seam Finding

This third method of synthesizing texture addresses the noticeable horizontal and vertical edge artifacts present in the previous method. Seam finding builds on the previous method, but instead of inserting patches with straight edges into the synthesized texture, it uses patches with optimal, non-straight boundaries. After choosing a patch to insert, I calculate a minimum-cost path through the pixel-by-pixel L2 loss of the overlapping region between the chosen patch and the previously placed patches. This minimum-cost path is then used as the boundary of the patch when it is inserted into the synthesized texture. This process is shown below: the first image is the overlapping region of the patch, the second image is the area of the synthesized image that the patch will fill, and the third image is the per-pixel L2 loss between the first and second images, with the minimum-cost path in red.

[Figures: error_im1.jpg (patch overlap region), error_im2.jpg (existing synthesized region), error_path.jpg (L2 loss with min-cost path in red)]
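As a sketch of how a found seam is applied, assume the overlap lies along the left edge of the new patch and that seam holds one column index per row (produced by the dynamic-programming routine described under Bells & Whistles below); all names here are mine:

    import numpy as np

    def overlap_error(existing, patch):
        """Per-pixel squared L2 error between the already-placed pixels and the
        new patch's pixels over the overlap region (summed across channels)."""
        diff = existing.astype(float) - patch.astype(float)
        return (diff ** 2).sum(axis=2)

    def composite_left_overlap(existing, patch, seam):
        """Blend the new patch into the output along the precomputed seam:
        pixels to the left of the seam keep the previously placed values."""
        out = patch.copy()
        for row, col in enumerate(seam):
            out[row, :col] = existing[row, :col]
        return out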

The results from this method are quite good. The optimal seam finding appears to eliminate noticeable horizontal and vertical edge artifacts. Except for the last image of text, all of the sample textures are my own photographs.

[Figures, sample / quilted result pairs: weave_small2.jpg / weave_quilt.jpg, gochujang_small2.jpg / gochujang_quilt.jpg, tortilla_small2.jpg / tortilla_quilt.jpg, chalk_small2.jpg / chalk_quilt.jpg, yellow_small2.jpg / yellow_quilt.jpg, ytext_small.jpg / text_quilt.jpg]

Part 4: Texture Transfer

To implement texture transfer, I added another term to the loss used to choose which patch to insert. This new term is the average L2 loss between the candidate patch and the region of the target image that corresponds to the patch's location in the synthesized image. The loss used to choose the patch is a convex combination of the original overlap loss and this new correspondence term, weighted by alpha and 1 - alpha, respectively.
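In symbols (E_overlap and E_target are my names for the original overlap error and the new correspondence term):

    E = \alpha \, E_{\text{overlap}} + (1 - \alpha) \, E_{\text{target}}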

I tried many combinations of target and texture photos. Below are some of my best results. The target and texture images are in the first two columns, respectively. The results are in the third and fourth columns. I created my handprint in chalk, my dog (Leo) in the crowd from this year's Big Game, and Leo again in chalk. My artistic favorite is Leo in chalk texture. I used an alpha value of 0.1 for the chalk textures and an alpha value of 0.3 for the crowd texture.

[Figures: hand.jpg, chalk_small2.jpg, tt_hand.jpg, leo_big.jpg, crowd2.jpg, tt_leo1.jpg, tt_leo2.jpg]

Bells & Whistles: Implemented my own min-cut algorithm

I wrote my own Python implementation of a dynamic programming algorithm that calculates a min-cost path from the top to the bottom of an array of arbitrary size. I ensured that my algorithm considers paths starting and ending at every cell in the top and bottom rows. I used this same algorithm for both horizontal and vertical seams by transposing the cost matrix in the horizontal-seam case and accordingly transposing the coordinates of the resulting min-cost path.
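A sketch of this dynamic program, under my reading of the description above (the function and variable names are mine): cost is a 2-D array of per-pixel errors, and each step of the path moves down one row and at most one column left or right.

    import numpy as np

    def min_cost_vertical_path(cost):
        """Return, for each row, the column index of a minimum-cost
        top-to-bottom path through `cost`."""
        cost = np.asarray(cost, dtype=float)
        rows, cols = cost.shape
        cum = cost.copy()                         # cheapest cost reaching each cell
        back = np.zeros((rows, cols), dtype=int)  # column we came from, per cell
        for i in range(1, rows):
            for j in range(cols):
                lo, hi = max(0, j - 1), min(cols, j + 2)
                prev = lo + np.argmin(cum[i - 1, lo:hi])
                back[i, j] = prev
                cum[i, j] = cost[i, j] + cum[i - 1, prev]
        # Start from the cheapest cell in the bottom row and trace back upward.
        path = [int(np.argmin(cum[-1]))]
        for i in range(rows - 1, 0, -1):
            path.append(int(back[i, path[-1]]))
        return path[::-1]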

Conclusion

This was a cool project because I didn't realize that texture synthesis and texture transfer could be done so seamlessly (ha ha) with only quilting together patches of the sample image. It was nice to work on this project before my second project, which moves beyond texture transfer to style transfer.

Project 2: A Neural Algorithm of Artistic Style

(Re)Implementation

In 2015, Gatys et al. introduced a method for style transfer that leverages the ability of CNNs to separate the style of an image from its content. Style transfer minimizes a weighted combination of a style loss and a content loss. Rather than updating the weights of the network during backpropagation, the pixels of an input image are updated to create the synthesized image.

Gatys et al. define the loss for style as the summed squared difference between the Gram matrix of the feature activations for the style image and the Gram matrix of the feature activations for the input image, as below. By calculating these Gram matrices, we effectively capture correlations between the feature responses, which empirically captures style. A is the Gram matrix for the feature activations of the style image, a. G is the Gram matrix for the feature activations of the input image, x. N_l is the number of channels and M_l is the width * height of a feature activation map at layer l. The contribution of each layer l to the overall style loss is weighted by w_l.

style_loss.jpg
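Transcribed into LaTeX using the symbols above (G^l and A^l are the Gram matrices of the layer-l activations of the input and style images, respectively):

    E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2,
    \qquad
    \mathcal{L}_{\mathrm{style}}(a, x) = \sum_l w_l \, E_l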

As below, Gatys et al. define the loss for content as the summed squared difference between the feature activations for the content image and those of the input image. P is the feature activation map for the content image p and F is the feature activation map for the input image x.

content_loss.jpg
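In LaTeX, with the symbols above:

    \mathcal{L}_{\mathrm{content}}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2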

The combined loss term is then calculated as below.

total_loss.jpg
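Written out, with alpha weighting the content loss and beta weighting the style loss:

    \mathcal{L}_{\mathrm{total}}(p, a, x) = \alpha \, \mathcal{L}_{\mathrm{content}}(p, x) + \beta \, \mathcal{L}_{\mathrm{style}}(a, x)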

In my reimplementation, I use a VGG-19 network with pretrained weights from PyTorch. Like Gatys et al., I calculate the loss for style from conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1, weighting each layer's contribution equally by 1/5. I calculate the loss for content from the filter activations in conv4_1. I use LBFGS as my optimizer. I set alpha and beta by choosing a style loss ratio beta:alpha. My hyperparameters fell into the following ranges: 300 to 800 epochs, a learning rate of 0.5 or 1, and a style loss ratio of 1e0 to 1e5.
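A minimal sketch of this setup, not my exact code: the layer indices follow torchvision's numbering of vgg19().features, the preprocessed image tensors content_img / style_img and the style loss ratio are assumed inputs, and the VGG weights stay frozen while LBFGS updates the pixels of the input image.

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    # Frozen, pretrained VGG-19 feature extractor.
    vgg = models.vgg19(pretrained=True).features.eval()
    for p in vgg.parameters():
        p.requires_grad_(False)

    STYLE_LAYERS = [0, 5, 10, 19, 28]   # conv1_1, conv2_1, conv3_1, conv4_1, conv5_1
    CONTENT_LAYER = 19                  # conv4_1

    def activations(img):
        """Run img through VGG-19 and collect the activations we need."""
        feats, x = {}, img
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in STYLE_LAYERS or i == CONTENT_LAYER:
                feats[i] = x
        return feats

    def gram(f):
        # f has shape (1, N, H, W); flatten spatial dims before the outer product.
        _, n, h, w = f.shape
        f = f.view(n, h * w)
        return f @ f.t() / (n * h * w)

    def total_loss(input_img, content_feats, style_grams, ratio=1e4):
        """Content loss plus ratio (= beta/alpha) times the layer-averaged style loss."""
        feats = activations(input_img)
        content = F.mse_loss(feats[CONTENT_LAYER], content_feats[CONTENT_LAYER])
        style = sum(F.mse_loss(gram(feats[i]), style_grams[i])
                    for i in STYLE_LAYERS) / len(STYLE_LAYERS)
        return content + ratio * style

    # content_img / style_img: preprocessed (1, 3, H, W) tensors (assumed given).
    content_feats = activations(content_img)
    style_grams = {i: gram(f) for i, f in activations(style_img).items()}
    input_img = content_img.clone().requires_grad_(True)
    optimizer = torch.optim.LBFGS([input_img], lr=1.0)

    def closure():
        optimizer.zero_grad()
        loss = total_loss(input_img, content_feats, style_grams)
        loss.backward()
        return loss

    for step in range(300):
        optimizer.step(closure)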

Replicated Results

Here, I show my results on the style transfer tasks presented by Gatys et al. The first column contains the content image, the second column has the style image, and the third column has the result.

While the style's colors always transferred well to the result, I struggled for a while to get the structure of the style into the result. I eventually realized that I was attaching the style and content loss modules to the wrong convolutional layers, too close to the start of the network. My theory is that these earlier layers have smaller receptive fields and so only allowed for local changes, like color, rather than larger-scale changes, like structure. After fixing this bug, my results improved; the second result, in the style of Starry Night, and the third result, in the style of The Scream, are examples where the structure of the style comes through strongly. The last result in this section, in the style of Composition VII, is an example where the structure of the style is a little weaker. Overall, the results in Gatys et al. capture the structure of the style more successfully than mine. Reasons for this may include their use of different pretrained weights (from the Caffe framework), longer training times, and more finely tuned style loss ratios.

[Figures, content / style / result triples: neckarfront.jpg / ship2.jpg / ship_houses.jpg, neckarfront.jpg / starry_full.jpg / starry_houses.jpg, neckarfront.jpg / scream.jpg / scream_houses.jpg, neckarfront.jpg / picasso.jpg / nude_houses.jpg, neckarfront.jpg / comp7.jpg / comp7_houses.jpg]

Original Results

Here, I show results run on new inputs. The first column contains the content image, the second column has the style image, and the third column has the result.

I really enjoyed this part of the project. I found that the style of Impressionism was a friendly style to work with; I used Vincent van Gogh's Starry Night and Wheat Field with Cypresses, as well as Monet's Impression, Sunrise. All of the content photos below are my own; the first is the coastline at Point Reyes, the second is a beach down in LA, the third is from a backpacking trip along the Merced River in Yosemite, the fourth is from a hike near Mt. Umunhum, and the fifth is of some sea anemones. I was very satisfied with how well the textures of the styles transferred to my photos. I think my personal favorite is the recreation of the sea anemones because they become the stars in Starry Night.

[Figures, content / style / result triples: coast.jpg / starry_full.jpg / y_starry_coast.jpg, sunset2.jpg / monet.jpg / y_monet_sunset.jpg, merced.jpg / starry_full.jpg / y_starry_merced.jpg, almaden.jpg / wheat.jpg / y_wheat_almaden.jpg, anenome.jpg / starry_full.jpg / y_starry_anenome.jpg]

Conclusion

I thought this was a really awesome project; I think I'm going to use it to make some Christmas presents for my parents. Conceptually, it amazes me that CNNs have the ability to separate content from style. I know the analogy between neural nets and the human brain can be hand-wavy at times, but this seems to illuminate a concept that is important for humans too: understanding content separately from style.

To whoever's reading this: I had a great time in this class. Thanks for a wonderful semester!