CS 194-26 Final Project: Neural Algorithm of Artistic Style

Ronak Laddha

Data Preprocessing

In this part, I did some simple image preprocessing to read in the images and transform them before feeding them into the model. The preprocessing consisted of the following steps (a short sketch follows the list):

  • Resizing the artwork such that it had the same size as the photograph
  • Converting both images to tensors
  • Adding a "dummy" dimension to the tensors because VGG requires 4 input dimensions (batch_size, input_channels, h, w)
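A minimal sketch of this preprocessing in PyTorch is shown below; the file paths and helper name are illustrative, not the exact code used here.

```python
import torch
from PIL import Image
from torchvision import transforms

def load_images(content_path, style_path, device="cpu"):
    """Read the photograph and the artwork, resize the artwork to match the
    photograph, convert both to tensors, and add the "dummy" batch dimension."""
    content = Image.open(content_path).convert("RGB")
    style = Image.open(style_path).convert("RGB")

    h, w = content.size[1], content.size[0]    # PIL reports (width, height)
    to_tensor = transforms.Compose([
        transforms.Resize((h, w)),             # force both images to the same size
        transforms.ToTensor(),                 # (C, H, W) tensor scaled to [0, 1]
    ])

    # unsqueeze(0) adds the batch dimension VGG expects: (batch_size, input_channels, h, w)
    content_t = to_tensor(content).unsqueeze(0).to(device)
    style_t = to_tensor(style).unsqueeze(0).to(device)
    return content_t, style_t
```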
Above, you see the original image that we will be applying various art styles to. Those styles are highlighted in the artworks below:

    CNN Model Architecture

    Instead of creating a CNN from scratch, I made use of a pre-trained VGG-19 model, as described in the Gatys, Ecker & Bethge paper, A Neural Algorithm of Artistic Style. Following that paper, I replaced all of the MaxPooling layers with AvgPooling to improve gradient flow and the quality of the final image. In addition, I removed all of the fully-connected layers, leaving only convolutional and pooling layers.
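A minimal sketch of these modifications, assuming torchvision's pre-trained VGG-19 (illustrative of the idea rather than a verbatim copy of my code):

```python
import torch.nn as nn
from torchvision import models

# Keep only the convolutional/pooling part of VGG-19; the fully-connected
# classifier head is dropped entirely.
vgg = models.vgg19(pretrained=True).features.eval()

# Swap every MaxPool2d for an AvgPool2d with the same geometry.
for i, layer in enumerate(vgg):
    if isinstance(layer, nn.MaxPool2d):
        vgg[i] = nn.AvgPool2d(kernel_size=layer.kernel_size,
                              stride=layer.stride,
                              padding=layer.padding)

# The network is only used for feature extraction, so freeze its weights.
for p in vgg.parameters():
    p.requires_grad_(False)
```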

    Now for the key insights from the paper. It takes advantage of two pieces of information derived from different stages of the CNN: content and style reconstructions.

    Content Loss

    The content reconstruction attempts to reconstruct the input image from the responses of the convolutional layers. Reconstructions from the lower layers are almost perfect, while more detailed (pixel-specific) information is lost in the higher layers. Each layer has N distinct filters and therefore N feature maps, each of size M (where M = h*w of the feature map). We can thus define a matrix F such that F[i,j] is the activation of the ith filter at position j in a given layer. We build this feature representation for both the original photograph (P) and the generated image (F) and minimize the squared-error loss between the F and P matrices. This introduces a new component into the neural network that is inserted after the relevant convolutional layers - the Content Loss.
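A minimal sketch of such a content-loss layer in PyTorch: the module stores the photograph's feature map P as a fixed target and, on each forward pass, records the squared-error loss against the current feature map F while passing the activations through unchanged (names are illustrative).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    """Inserted after a chosen conv layer; compares the generated image's
    feature map against the photograph's feature map P at that layer."""
    def __init__(self, target):
        super().__init__()
        self.target = target.detach()   # fixed target, not part of the graph
        self.loss = torch.tensor(0.0)

    def forward(self, x):
        self.loss = F.mse_loss(x, self.target)
        return x                        # pass activations through unchanged
```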

    The paper computes the content loss only at the fourth convolutional layer. However, I found that applying the content loss at convolutional layers 3, 4, and 5 yielded the best results.

    Style Loss

    The style reconstruction attempts to build a new feature space that captures the style of the image on top of the CNN representation. The style representation is built from correlations between the different features at several convolutional layers of the CNN. The feature correlations at a layer are given by an NxN Gram matrix G, which can be calculated by taking the inner products of the vectorized feature maps.

    Now, to generate a texture that matches the style of the artwork, I used gradient descent on the original image, minimizing the mean squared distance between the entries of the Gram matrix A of the original style representation and the Gram matrix G of the generated image's style representation - the Style Loss.
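A minimal sketch of the Gram matrix computation and the corresponding style-loss layer; the normalization of G by the number of elements is a common convention I'm assuming here, not something specified above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a feature map: inner products of the vectorized maps."""
    b, n, h, w = feat.size()            # n filters, each map of size M = h * w
    flat = feat.view(b * n, h * w)      # one row per vectorized feature map
    G = flat @ flat.t()                 # G[i, j] = <map_i, map_j>
    return G / (b * n * h * w)          # normalize by the number of elements

class StyleLoss(nn.Module):
    """Inserted after a chosen conv layer; compares the generated Gram matrix G
    against the style image's Gram matrix A at that layer."""
    def __init__(self, target_feat):
        super().__init__()
        self.target_gram = gram_matrix(target_feat).detach()
        self.loss = torch.tensor(0.0)

    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target_gram)
        return x
```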

    The paper evaluates various subsets of the five convolutional blocks as the layers on which to compute the style loss. By trial and error, I found that applying the style loss after all of the convolutional layers yielded the best results.

    Personal Touches

    In summary, the modifications made to the basic VGG-19 model are the addition of the Content Loss and Style Loss layers after the specified convolutional layers and the replacement of the MaxPool layers with AvgPool. While researching VGG-19 to better understand how it works, I learned that the pre-trained model expects inputs normalized with mean=[0.485, 0.456, 0.406] and std_dev=[0.229, 0.224, 0.225]. To standardize this normalization procedure, I added an additional layer at the front of the network that normalizes new input images in the same manner. Finally, once all the Content and Style Loss layers were added, I no longer needed any of the subsequent convolutional or pooling layers, so to reduce the complexity and size of the model I removed them, yielding the final model architecture shown below:
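A minimal sketch of the added normalization layer, using the ImageNet statistics quoted above (illustrative):

```python
import torch
import torch.nn as nn

class Normalization(nn.Module):
    """First layer of the network: normalize inputs with the same per-channel
    mean and standard deviation that the pre-trained VGG-19 was trained with."""
    def __init__(self):
        super().__init__()
        # Shape (C, 1, 1) so the statistics broadcast over (B, C, H, W) inputs.
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(-1, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(-1, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std
```

After assembling the network (this normalization layer, then the VGG layers with the loss modules interleaved), the layers after the last ContentLoss/StyleLoss can simply be sliced off the nn.Sequential, which is the trimming described above.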

    Model Outputs

    To make a visually appealing final output, you need to find a good balance between emphasizing style and emphasizing content. If you emphasize style too much, you get an image that matches the appearance of the artwork (essentially a texturized version of it) but hardly shows the photograph's content. On the other hand, if you emphasize content too much, you can clearly identify the photograph, but the style of the painting won't be well matched. This tradeoff is highlighted in Figure 3 of the paper, which I used to determine the values of alpha and beta in the loss function that combines the Content and Style Losses. This loss function, minimized during image synthesis, is a weighted sum of the two losses, where alpha is the weight on the Content Loss and beta is the weight on the Style Loss. One key point to note is that there is an additional weight term associated with the Style Loss: it is set to 1/5 for each active style layer (those with nonzero weight).
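A minimal sketch of how this weighted loss might be assembled and minimized, using the ContentLoss/StyleLoss modules sketched earlier. The optimizer choice (L-BFGS), iteration count, and the specific alpha and beta values are illustrative assumptions; the 1/5 weight on each active style layer follows the description above.

```python
import torch

def run_style_transfer(model, content_losses, style_losses, input_img,
                       num_steps=300, alpha=1.0, beta=1e10):
    """Optimize the pixels of input_img to minimize
    alpha * content_loss + beta * style_loss."""
    input_img.requires_grad_(True)
    optimizer = torch.optim.LBFGS([input_img])

    step = [0]
    while step[0] < num_steps:
        def closure():
            with torch.no_grad():
                input_img.clamp_(0, 1)          # keep pixels in a valid range
            optimizer.zero_grad()
            model(input_img)                    # populates .loss on each loss layer
            style_score = sum(0.2 * sl.loss for sl in style_losses)    # w_l = 1/5
            content_score = sum(cl.loss for cl in content_losses)
            total = alpha * content_score + beta * style_score
            total.backward()
            step[0] += 1
            return total
        optimizer.step(closure)

    with torch.no_grad():
        input_img.clamp_(0, 1)
    return input_img
```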

    There weren't too many hyperparameters to tune for the model: just the number of iterations and the alpha/beta ratio. The paper suggested running the model for 1000 iterations, but I noticed that the loss converged well before then, and running for 300 iterations gave sufficient results. The alpha/beta ratio required some additional tuning, so I used Figure 3 from the paper as a guide. I started with an alpha/beta ratio of 1e-8, which didn't apply enough of the style to the photograph, and decreased it to 1e-12; pushing the ratio further in that direction made it difficult to discern the contents of the photograph. The results are highlighted below:

    Neckarfront in Tubingen, Germany x The Shipwreck of the Minotaur

    Neckarfront in Tubingen, Germany x Starry Night

    Neckarfront in Tubingen, Germany x The Scream

    Neckarfront in Tubingen, Germany x Femme nue assise

    Neckarfront in Tubingen, Germany x Composition VII

    Custom Model Inputs & Outputs

    I generated outputs based on a custom input image as well, and also tested transferring the style from new pieces of art. I used an alpha/beta ratio of 1e-10, because that seemed to be the best middle ground between maintaining the photograph's content and the artwork's style in the outputs above.

    Hanakapiai Beach in Kauai, Hawaii x Water Lilies (Monet)

    Hanakapiai Beach in Kauai, Hawaii x Persistence of Memory (Dali)

    Hanakapiai Beach in Kauai, Hawaii x Guernica (Picasso)

    CS 194-26 Final Project: Image Quilting

    Ronak Laddha

    Background

    This project consisted of two pieces: texture synthesis and texture transfer. Texture synthesis entails taking a small texture and repeatedly sampling it to quilt together and synthesize a larger texture. Texture transfer involves rendering an object with the texture taken from a different image. The procedure for this is described in the Efros & Freeman paper - Image Quilting for Texture Synthesis and Transfer.

    Randomly Sampled Texture

    The goal in this section is to randomly sample patches from a sample texture to generate an output image of some specified size. The process to do so is as follows (a short sketch follows the list):

      1. Start from the upper-left corner
      2. Randomly choose patches from the sample & tile them in the output until the image is full
        - If patches don't fit evenly into the output image, leave black borders at the edges
      3. Save the result
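A minimal NumPy sketch of this random quilting, assuming an RGB sample image and a square output (the function name and signature are illustrative):

```python
import numpy as np

def quilt_random(sample, out_size, patch_size):
    """Tile an out_size x out_size output with randomly chosen square patches
    from `sample`; any border a full patch can't cover is left black."""
    H, W = sample.shape[:2]
    out = np.zeros((out_size, out_size, 3), dtype=sample.dtype)
    for i in range(0, out_size - patch_size + 1, patch_size):
        for j in range(0, out_size - patch_size + 1, patch_size):
            y = np.random.randint(H - patch_size + 1)
            x = np.random.randint(W - patch_size + 1)
            out[i:i + patch_size, j:j + patch_size] = \
                sample[y:y + patch_size, x:x + patch_size]
    return out
```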

    From the above results, it's clear that random sampling doesn't work too well. It does better for the brick texture because all patches will look similar. However, the boundaries between patches are distinct if you zoom in on the brick texture.

    Overlapping Patches

    The goal in this section is again to randomly sample patches from a sample to generate an output image of some specified size. This is the same goal as the previous section; however, in this case we act a bit more intelligently about how we sample patches. The process to do so is as follows (a sketch of the patch-selection rule follows the list):

      1. Start from upper left corner
      2. Sample new patches to overlap w/ existing ones

      Note: 3 Possible Overlap Regions
        First Row = Vertical Overlap
        First Column = Horizontal Overlap
        Everywhere Else = L-shaped Overlap (Top & Left)

      3. Compute cost of each patch
      4. Randomly sample a patch with cost <= (1 + tolerance) * min_cost
      5. Once a patch has been sampled, copy its pixels over to the corresponding position in the output image
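A minimal sketch of the sampling rule in step 4, assuming a 2-D matrix of overlap costs has already been computed for every candidate top-left position in the sample (step 3):

```python
import numpy as np

def choose_sample(cost, tol=0.1):
    """Pick uniformly at random among candidate positions whose overlap cost
    is within (1 + tol) of the minimum cost."""
    min_cost = cost.min()
    ys, xs = np.where(cost <= (1 + tol) * min_cost)
    k = np.random.randint(len(ys))
    return ys[k], xs[k]
```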

    From the above results, it's clear that overlapping sampling works better than random sampling. I tested two different approaches to this section:

    Explicit Mask-Based Approach:

    In this approach, I created a mask of the overlap region with 1's where the pixels overlapped and 0 everywhere else. I then applied the mask to each of the derived patches (the output patch and sample patch) and computed the SSD between them. In this approach, my ssd_patch() returned a cost matrix which was passed into choose_sample(). choose_sample() then sampled a pair of optimal indices within the threshold and returned them to my quilt_simple() function which finally selected and added the chosen sample patch to the output.

    Implicit Mask Approach:

    In this approach, I did not create an explicit mask. Instead, I filtered the two patches down to their overlap regions and computed the SSD between them. Thus, my ssd_patch() no longer returned a matrix of costs; instead, it returned a single SSD value. As a result, my choose_sample() function iterates over all candidate patches, constructs the cost matrix, samples optimal indices, and returns the chosen patch to quilt_simple().
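A minimal sketch of this implicit-mask ssd_patch, assuming the candidate sample patch and the partially filled output patch have already been extracted, that `overlap` is the overlap width in pixels, and that the flags indicate whether the patch sits in the first row or first column (names are illustrative):

```python
import numpy as np

def ssd_patch(sample_patch, out_patch, overlap, first_row, first_col):
    """SSD over only the overlap region: a vertical strip on the left, a
    horizontal strip on top, or an L-shape of both, per the cases above."""
    s = sample_patch.astype(float)
    o = out_patch.astype(float)
    total = 0.0
    if not first_col:       # overlap with the patch to the left
        total += np.sum((s[:, :overlap] - o[:, :overlap]) ** 2)
    if not first_row:       # overlap with the patch above
        total += np.sum((s[:overlap, :] - o[:overlap, :]) ** 2)
    if not first_row and not first_col:
        # the top-left corner was counted in both strips; remove one copy
        total -= np.sum((s[:overlap, :overlap] - o[:overlap, :overlap]) ** 2)
    return total
```

choose_sample() then calls this for every valid top-left position in the sample to build the cost matrix before applying the tolerance rule described earlier.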

    I ended up using the implicit mask approach because it gave me better outputs.

    Seam Finding

    The goal in this section is to remove the edge artifacts that were present in the overlapping patches of the previous section. We select patches in the same manner described above, but then apply a seam-finding algorithm to eliminate the artifacts. The process is as follows:

      1. Sample patches using Overlapping Patches approach
      2. Find the min-cost contiguous path across the patch (according to the costs calculated previously)
      3. This path is used to define a binary mask that specifies which pixels to copy from the newly sampled patch

    Seam Plots

    Here we have some plots that highlight a sample patch, output patch, cost and mask from the first iteration of the seam finding algorithm on the rice texture.

    Outputs

    Custom Inputs:

    From the above results, the seams that were previously present have either vanished completely or been reduced drastically. To complete this task, I again tested two approaches:

    Dynamic Programming Approach:

    In this approach, I used dynamic programming to find the min-cost path with a backtracking algorithm.
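A minimal sketch of this dynamic-programming variant for a vertical seam through the per-pixel overlap-error image; the horizontal case is the same computation on the transposed costs (names are illustrative):

```python
import numpy as np

def min_cost_vertical_seam(err):
    """err: (H, W) per-pixel overlap errors. Returns one column index per row
    describing a minimum-cost contiguous path from top to bottom."""
    H, W = err.shape
    cum = err.astype(float).copy()
    # Forward pass: each cell adds the cheapest of its three upper neighbours.
    for i in range(1, H):
        for j in range(W):
            lo, hi = max(j - 1, 0), min(j + 2, W)
            cum[i, j] += cum[i - 1, lo:hi].min()
    # Backtracking pass: follow the cheapest neighbours back up from the bottom.
    seam = np.zeros(H, dtype=int)
    seam[-1] = int(np.argmin(cum[-1]))
    for i in range(H - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, W)
        seam[i] = lo + int(np.argmin(cum[i, lo:hi]))
    return seam
```

The seam then defines the binary mask from step 3: for each row, pixels on one side of seam[i] come from the existing output and pixels on the other side come from the newly sampled patch.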

    Dijkstra's Approach:

    In this approach, I implemented Dijkstra's algorithm and used that to find the min-cost path.

    I ended up using Dijkstra's approach because it gave me better outputs.

    Bells and Whistles:

    I implemented my own cut() function (without referencing existing cut functions) for this B&W section.

    Texture Synthesis Consolidated Outputs

    Upon reviewing the outputs of the three algorithms (Random, Overlapping Patches & Seam Finding), we can see that there is generally an improvement when moving from the Random to the Overlapping Patches algorithm. This is most noticeable in the white and text textures. Finally, the Seam Finding algorithm smooths out the Overlapping Patches result, thereby yielding a more "polished" final output.

    Our outputs are generally good; however, there are some questionable areas here and there. The white texture looks the most suspect, likely due to the strong variability between each sample and the poor image quality.

    Texture Transfer

    Now the objective shifts from synthesizing larger textures to transferring a texture onto the objects of a different image. The process is similar to how we removed the seams in the previous section, but there is an additional cost term, the correspondence cost. This cost measures the difference between the sampled source patch & the target patch at the location to be filled. The total cost that we're minimizing is thus the weighted sum of the overlap cost (the SSD that we've been calculating in the previous two sections) and the correspondence cost.

    The correspondence cost is calculated by first converting both images to grayscale, since one common measure of correspondence is luminance (image intensity). We could have also used the alpha channel, but since these images don't have one, we convert them to grayscale, where each pixel value represents the intensity at that point.
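A minimal sketch of the combined cost described above, assuming the overlap (SSD) cost for a candidate patch has already been computed and both images have been converted to grayscale. Which term alpha weights is an assumption; here it weights the correspondence term, which matches the tuning behaviour described below (larger alpha keeps more of the target image visible).

```python
import numpy as np

def transfer_cost(overlap_cost, sample_gray, target_gray, alpha):
    """Weighted sum of the quilting overlap cost and the correspondence cost.
    The correspondence cost is the SSD between the grayscale (luminance)
    values of the candidate source patch and the target patch being filled."""
    corr_cost = np.sum((sample_gray.astype(float) - target_gray.astype(float)) ** 2)
    return (1 - alpha) * overlap_cost + alpha * corr_cost
```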

    Tuning alpha took some trial and error. I initially started with alpha=0.1, which didn't yield fruitful results: it was difficult to see the original image in the output. As I increased alpha, the image became clearer. I finally settled on alpha=0.8 as the optimal value, with patch_size=20.

    Brick Feynman

    Rice Feynman