Data Preprocessing

In this part, I did some simple image preprocessing to read in the images and transform them before feeding them into the model. Namely, the preprocessing consisted of:

Resizing the artwork such that it had the same size as the photograph

Converting both images to tensors

Adding a "dummy" batch dimension to the tensors, because VGG expects a 4-dimensional input of shape (batch_size, input_channels, h, w)
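The three steps above can be sketched without a deep-learning framework, using only NumPy. This is a minimal sketch: the nearest-neighbour resize and the function names are assumptions for illustration. In PyTorch the same steps are commonly done with torchvision's `Resize` and `ToTensor` transforms followed by `unsqueeze(0)`.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) array to (out_h, out_w, C)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows[:, None], cols[None, :]]

def to_batched_tensor(img: np.ndarray) -> np.ndarray:
    """uint8 (H, W, C) image -> float32 (1, C, H, W) array scaled to [0, 1]."""
    chw = img.astype(np.float32).transpose(2, 0, 1) / 255.0
    return chw[np.newaxis]  # the "dummy" batch dimension

def preprocess(photo: np.ndarray, artwork: np.ndarray):
    """Resize the artwork to the photo's size, then convert both to 4-D arrays."""
    h, w = photo.shape[:2]
    artwork = resize_nearest(artwork, h, w)
    return to_batched_tensor(photo), to_batched_tensor(artwork)
```

Both outputs end up with the same (1, 3, h, w) shape, which is what lets the content and style feature maps be compared layer by layer.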

Above, you see the original image that we will be applying various art styles to. Those styles are highlighted in the artworks below:

The Shipwreck of the Minotaur by J.M.W. Turner

Starry Night by Vincent van Gogh

Der Schrei (The Scream) by Edvard Munch

Femme Nue Assise by Pablo Picasso

Composition VII by Wassily Kandinsky

CNN Model Architecture

Instead of creating a CNN from scratch, I used a pre-trained VGG-19 model, as in the paper by Gatys, Ecker & Bethge, "A Neural Algorithm of Artistic Style." Following that paper, I replaced all the max-pooling layers with average-pooling layers to improve gradient flow and the quality of the final image. In addition, I removed all the fully-connected layers, so that only convolutional and pooling layers remained.
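One way to see the gradient-flow argument is to compare which inputs receive gradient under each pooling type. The NumPy sketch below assumes 2x2 windows with stride 2 and no ties inside a max-pooling window; the function name is hypothetical. It returns the gradient of the sum of the pooled outputs with respect to the input feature map.

```python
import numpy as np

def pooling_input_grad(x: np.ndarray, mode: str) -> np.ndarray:
    """Gradient of sum(pool(x)) w.r.t. x for 2x2, stride-2 pooling.

    x is a single (H, W) feature map with even H and W.
    Assumes no ties inside a window when mode == "max".
    """
    h, w = x.shape
    windows = x.reshape(h // 2, 2, w // 2, 2)  # axes: (row block, dy, col block, dx)
    if mode == "avg":
        # Each output is the mean of 4 inputs, so every input gets 1/4.
        return np.full((h, w), 0.25)
    if mode == "max":
        # Only the largest input in each window receives any gradient.
        is_max = windows == windows.max(axis=(1, 3), keepdims=True)
        return is_max.astype(float).reshape(h, w)
    raise ValueError(f"unknown mode: {mode}")
```

For a 4x4 map, max pooling sends gradient to only 4 of the 16 pixels, while average pooling updates all 16, which is one intuition for why average pooling gives smoother pixel updates during image optimization.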

Now for the key insights of the paper. It takes advantage of two pieces of information derived from different stages of the CNN: content and style reconstructions.

Content Loss

The content reconstruction attempts to reconstruct the input image from the responses of individual convolutional layers. The reconstruction from the lower layers is almost perfect, while detailed (pixel-level) information is lost in the higher layers. A layer with N distinct filters produces N feature maps, each of size M (where M = h*w of the feature map). Thus, we can define a matrix F such that F[i,j] is the activation of the ith filter at position j in that layer. We construct this feature representation for both the original photo (P) and the generated image (F), and then minimize the squared-error loss between the F and P matrices. This leads to a new component that needs to be inserted after the relevant convolutional layers: the Content Loss.
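As a sketch of this definition, the snippet below flattens a layer's (N, h, w) feature maps into the N x M matrix described above and computes the paper's content loss, one half of the summed squared error between F and P. It works on plain NumPy arrays; the function names are illustrative. Many PyTorch implementations use a mean instead of a sum, which only rescales the loss.

```python
import numpy as np

def feature_matrix(fmap: np.ndarray) -> np.ndarray:
    """(N, h, w) feature maps -> (N, M) matrix with M = h * w; row i is filter i."""
    return fmap.reshape(fmap.shape[0], -1)

def content_loss(gen_fmap: np.ndarray, photo_fmap: np.ndarray) -> float:
    """L_content = 1/2 * sum_ij (F_ij - P_ij)^2 for one layer."""
    F = feature_matrix(gen_fmap)    # generated image
    P = feature_matrix(photo_fmap)  # original photo
    return float(0.5 * np.sum((F - P) ** 2))
```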

The paper only highlights the fourth convolutional layer as the layer at which we compute the content loss. However, I found that applying the content loss at convolutional layers 3, 4, and 5 yielded the best results.

Style Loss

The style reconstruction builds a new feature space on top of the CNN representation that captures the style of an image. This style representation computes the correlations between the different filter responses within a layer, and it does so at several convolutional layers of the CNN. For a layer with N filters, these feature correlations are given by an NxN Gram matrix G, where G[i,j] is the inner product between the vectorized feature maps i and j.
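The Gram matrix is a single matrix product once the feature maps are vectorized. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def gram_matrix(fmap: np.ndarray) -> np.ndarray:
    """(N, h, w) feature maps -> (N, N) Gram matrix, G[i, j] = <F_i, F_j>."""
    F = fmap.reshape(fmap.shape[0], -1)  # (N, M): one vectorized map per row
    return F @ F.T
```

Because spatial positions are summed out, G records which filters fire together but not where, which is why it captures texture rather than layout.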

Now, to generate an image whose texture matches the style of the artwork, I used gradient descent on the original image. This minimizes the mean squared distance between the entries of the Gram matrix A of the artwork's style representation and the Gram matrix G of the generated image's style representation: the Style Loss.
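A per-layer sketch of this loss, following the paper's normalization E_l = 1/(4 N^2 M^2) * sum_ij (G_ij - A_ij)^2. The helper names are hypothetical, and the Gram matrix is recomputed here so the snippet stands alone.

```python
import numpy as np

def gram(fmap: np.ndarray) -> np.ndarray:
    """(N, h, w) feature maps -> (N, N) Gram matrix."""
    flat = fmap.reshape(fmap.shape[0], -1)
    return flat @ flat.T

def layer_style_loss(gen_fmap: np.ndarray, art_fmap: np.ndarray) -> float:
    """E_l = 1 / (4 N^2 M^2) * sum_ij (G_ij - A_ij)^2 for one layer."""
    n = gen_fmap.shape[0]   # number of filters N
    m = gen_fmap[0].size    # feature-map size M = h * w
    G = gram(gen_fmap)      # generated image
    A = gram(art_fmap)      # artwork
    return float(np.sum((G - A) ** 2) / (4 * n ** 2 * m ** 2))
```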

The paper highlights subsets of five convolutional layers as the layers on which to compute the style loss. Through trial and error, I found that applying the style loss after all the convolutional layers yielded the best results.
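The per-layer style losses are then combined and weighed against the content loss. This sketch follows the paper's form, L_style = sum_l w_l * E_l and L_total = alpha * L_content + beta * L_style; the equal default weights w_l = 1/L are an assumption.

```python
def total_style_loss(layer_losses, weights=None):
    """L_style = sum_l w_l * E_l; defaults to equal weights w_l = 1 / L."""
    if weights is None:
        weights = [1.0 / len(layer_losses)] * len(layer_losses)
    return sum(w * e for w, e in zip(weights, layer_losses))

def total_loss(content_loss, style_loss, alpha, beta):
    """L_total = alpha * L_content + beta * L_style."""
    return alpha * content_loss + beta * style_loss
```

The alpha/beta ratios quoted in this write-up are ratios of these two weights. Their useful range depends on how each loss is normalized, so a ratio tuned for one implementation may not carry over to another.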

Personal Touches

In summary, the modifications made to the basic VGG-19 model are the addition of Content Loss and Style Loss layers after the specified convolutional layers and the replacement of the max-pooling layers with average pooling. While researching how VGG-19 works, I realized that the pre-trained model was trained on images normalized with per-channel means of [0.485, 0.456, 0.406] and standard deviations of [0.229, 0.224, 0.225]. To apply the same normalization to new input images, I added a normalization layer at the front of the network. Finally, while making these changes, I realized that once all the Content and Style Loss layers were in place, I didn't need any further convolutional or pooling layers. So, to reduce the size and complexity of the model, I removed all the subsequent layers, yielding the final model architecture shown below:
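Both touches are small operations, sketched here with NumPy and a plain list of layer names. The layer-naming scheme (`conv_1`, `style_loss_1`, and so on) is hypothetical; the means and standard deviations are the values quoted above.

```python
import numpy as np

# Per-channel ImageNet statistics quoted in the text above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(batch: np.ndarray) -> np.ndarray:
    """Normalize a (B, 3, H, W) batch in [0, 1] channel by channel."""
    return (batch - MEAN[None, :, None, None]) / STD[None, :, None, None]

def trim_after_last_loss(layer_names):
    """Keep layers up to and including the last loss layer (hypothetical names)."""
    loss_positions = [i for i, name in enumerate(layer_names) if "loss" in name]
    return layer_names[: loss_positions[-1] + 1]
```

Layers after the last loss layer never influence any loss term, so dropping them changes no gradients and only saves computation.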

Alpha/Beta Ratio: 1e-12

Custom Model Inputs & Outputs

I also generated outputs for a custom input image and tested transferring the style of new pieces of art. I used an alpha/beta ratio of 1e-10 because it seemed to strike the best balance between preserving the photograph's content and the artwork's style in the generated outputs above.

Hanakapiai Beach in Kauai, Hawaii x Water Lilies (Monet)

Hanakapiai Beach in Kauai, Hawaii x Persistence of Memory (Dali)

Hanakapiai Beach in Kauai, Hawaii x Guernica (Picasso)

CS 194-26 Final Project: Image Quilting

Ronak Laddha

Background

This project consisted of two parts: texture synthesis and texture transfer. Texture synthesis takes a small texture and repeatedly samples patches from it, quilting them together to synthesize a larger texture. Texture transfer renders an object with a texture taken from a different image. The procedure is described in the paper by Efros & Freeman, "Image Quilting for Texture Synthesis and Transfer."

Randomly Sampled Texture

The goal in this section is to randomly sample patches from a sample texture to generate an output image of a specified size. The process is as follows:

1. Start from the upper-left corner

2. Tile randomly sampled patches until the output image is full

- If the patches don't fit evenly into the output image, leave black borders at the edges

3. Save the result for a sample image
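The random tiling steps can be sketched in NumPy as follows. The function name `quilt_random`, the square output, and the `rng` parameter are assumptions for illustration.

```python
import numpy as np

def quilt_random(sample, out_size, patch_size, rng=None):
    """Tile random patch_size x patch_size patches into an out_size x out_size
    image, starting from the upper-left corner. Any leftover strip at the
    right/bottom edges stays black (zeros)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = sample.shape[:2]
    out = np.zeros((out_size, out_size) + sample.shape[2:], dtype=sample.dtype)
    n = out_size // patch_size  # whole patches per row/column
    for i in range(n):
        for j in range(n):
            # Random top-left corner of a patch that lies fully inside the sample.
            y = rng.integers(0, h - patch_size + 1)
            x = rng.integers(0, w - patch_size + 1)
            out[i * patch_size:(i + 1) * patch_size,
                j * patch_size:(j + 1) * patch_size] = sample[y:y + patch_size, x:x + patch_size]
    return out
```

No patch looks at its neighbours, which is why the seams between patches stay visible in the results.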

Original Texture

Sampled Output Texture

Original Texture

Sampled Output Texture

Original Texture

Sampled Output Texture

From the above results, it's clear that random sampling doesn't work very well. It works best on the brick texture because all of its patches look similar, but even there the boundaries between patches are visible if you zoom in.

Overlapping Patches

The goal in this section is the same as in the previous one: randomly sample patches from a sample texture to generate an output image of a specified size. This time, however, we are smarter about how we sample patches. The process is as follows:

1. Start from the upper-left corner

2. Sample new patches that overlap with existing ones

- First row: vertical overlap (with the patch to the left)

- First column: horizontal overlap (with the patch above)

- Everywhere else: L-shaped overlap (top and left)

3. Compute the cost (SSD over the overlap region) of each candidate patch

4. Randomly sample a patch with cost <= (1 + tolerance) * min_cost

5. Once a patch has been sampled, copy its pixels to the corresponding position in the output image
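Step 4 can be sketched as below, given a 2-D matrix of candidate costs indexed by patch position. The signature is an assumption and not necessarily that of the `choose_sample()` function described later.

```python
import numpy as np

def choose_sample(costs, tol, rng=None):
    """Pick a random (row, col) whose cost is within (1 + tol) of the minimum."""
    rng = np.random.default_rng() if rng is None else rng
    threshold = (1 + tol) * costs.min()
    candidates = np.argwhere(costs <= threshold)  # (k, 2) array of indices
    row, col = candidates[rng.integers(len(candidates))]
    return int(row), int(col)
```

With `tol=0` this always returns the single best match, which tends to repeat the same patch; a small positive tolerance keeps some randomness in the output.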

Original Texture

Sampled Output Texture

Original Texture

Sampled Output Texture

Original Texture

Sampled Output Texture

From the above results, it's clear that overlapping sampling works better than random sampling. I tested two different approaches to this section:

Explicit Mask-Based Approach:

In this approach, I created a mask of the overlap region, with 1s where the pixels overlapped and 0s everywhere else. I then applied the mask to both patches (the output patch and the sample patch) and computed the SSD between them. My ssd_patch() returned a cost matrix, which was passed into choose_sample(). choose_sample() then sampled a pair of indices whose cost was within the tolerance threshold and returned them to my quilt_simple() function, which finally added the chosen sample patch to the output.
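A vectorized sketch of the mask-based cost matrix, computing the masked SSD against every patch-sized window of the sample at once. It assumes square patches, an (H, W, C) float sample, and NumPy 1.20 or newer for `sliding_window_view`; the signature is illustrative. Memory grows with (H - p + 1)(W - p + 1) * C * p^2, which is fine for small samples.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssd_patch(sample, template, mask):
    """Masked SSD between `template` and every patch-sized window of `sample`.

    sample:   (H, W, C) float array
    template: (p, p, C) current output patch
    mask:     (p, p) array, 1 inside the overlap region, 0 elsewhere
    Returns an (H - p + 1, W - p + 1) cost matrix.
    """
    p = template.shape[0]
    # windows[y, x] is sample[y:y+p, x:x+p], stored with shape (C, p, p).
    windows = sliding_window_view(sample, (p, p), axis=(0, 1))
    diff = windows - template.transpose(2, 0, 1)  # broadcast over all windows
    return np.sum(diff ** 2 * mask, axis=(2, 3, 4))
```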

Implicit Mask Approach:

In this approach, I did not create an explicit mask. Instead, I restricted both patches to their overlap regions and computed the SSD between those regions. Thus, my ssd_patch() no longer returned a matrix of costs; instead, it returned a single SSD value. As a result, my choose_sample() function iterated over all the candidate patches to construct the cost matrix, sampled indices within the tolerance threshold, and returned the chosen patch to quilt_simple().

I ended up using this approach because it gave me better outputs.
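A sketch of the implicit version, where slicing replaces the mask and the function returns one scalar per candidate patch. The name and flags are assumptions. For the L-shaped case, the shared top-left corner is subtracted once so it is not counted twice.

```python
import numpy as np

def overlap_ssd(out_patch, cand_patch, overlap, left=True, top=True):
    """SSD over the overlap strip(s) only, found by slicing instead of masking.

    left: leftmost `overlap` columns (vertical strip)
    top:  topmost `overlap` rows (horizontal strip)
    both: L-shaped region, with the top-left corner counted once
    """
    diff_sq = (out_patch.astype(np.float64) - cand_patch.astype(np.float64)) ** 2
    cost = 0.0
    if left:
        cost += diff_sq[:, :overlap].sum()
    if top:
        cost += diff_sq[:overlap, :].sum()
    if left and top:
        cost -= diff_sq[:overlap, :overlap].sum()  # undo double-counted corner
    return float(cost)
```

Casting to float64 first avoids silent overflow when the patches are uint8 images.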

Seam Finding

The goal in this section is to remove the edge artifacts that were present in the overlapping patches of the previous section. We select patches in the same manner as above, but then apply a seam-finding algorithm to eliminate the artifacts. The process is as follows:

1. Sample patches using Overlapping Patches approach

2. Find the minimum-cost contiguous path across the patch (according to the costs calculated previously)

3. This path is used to define a binary mask that specifies which pixels to copy from the newly sampled patch
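Step 3 can be sketched for a vertical seam, represented as one column index per row. The representation and function names are assumptions. For a horizontal (top) seam, the same code can be applied to transposed arrays, and for an L-shaped overlap the two masks can be combined with a logical AND.

```python
import numpy as np

def seam_to_mask(path, width):
    """Binary mask from a vertical seam.

    path[r] is the seam's column in row r. Pixels left of the seam keep the
    existing output (False); the seam and everything to its right come from
    the newly sampled patch (True).
    """
    path = np.asarray(path)
    return np.arange(width)[None, :] >= path[:, None]

def combine(old_patch, new_patch, mask):
    """Copy new-patch pixels where mask is True; works for (p, p) or (p, p, C)."""
    if old_patch.ndim == 3:
        mask = mask[..., None]
    return np.where(mask, new_patch, old_patch)
```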

Seam Plots

Here we have some plots that highlight a sample patch, output patch, cost and mask from the first iteration of the seam finding algorithm on the rice texture.

Outputs

Original Texture

500px Sampled Output Texture

1000px Sampled Output Texture

Original Texture

Sampled Output Texture

Original Texture

Sampled Output Texture

Custom Inputs:

Original Texture

Sampled Output Texture

Original Texture

500px Sampled Output Texture

2000px Sampled Output Texture

Original Texture

500px Sampled Output Texture

2000px Sampled Output Texture

From the above results, the seams that were previously visible have either vanished completely or become far less noticeable. Again, I tested two approaches for this step:

Dynamic Programming Approach:

In this approach, I used dynamic programming to compute cumulative minimum path costs and then backtracked to recover the min-cost path.
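A minimal dynamic-programming sketch for a top-to-bottom seam through a 2-D cost array, where each step moves down one row and at most one column sideways. The function name and the move set are assumptions.

```python
import numpy as np

def min_cost_vertical_path(cost):
    """Return one column index per row for the cheapest top-to-bottom path."""
    h, w = cost.shape
    acc = cost.astype(float)             # acc[r, c]: cheapest path cost ending at (r, c)
    back = np.zeros((h, w), dtype=int)   # column of the best predecessor in row r - 1
    for r in range(1, h):
        for c in range(w):
            lo, hi = max(c - 1, 0), min(c + 2, w)
            k = lo + int(np.argmin(acc[r - 1, lo:hi]))
            back[r, c] = k
            acc[r, c] += acc[r - 1, k]
    # Backtrack from the cheapest endpoint in the last row.
    path = [int(np.argmin(acc[-1]))]
    for r in range(h - 1, 0, -1):
        path.append(int(back[r, path[-1]]))
    return path[::-1]
```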

Dijkstra's Approach:

In this approach, I implemented Dijkstra's algorithm and used that to find the min-cost path.

I ended up using this approach because it gave me better outputs.
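A sketch of the Dijkstra variant using Python's `heapq`, treating each pixel as a node whose entry cost is its overlap cost (non-negative, as SSD values are). It is restricted here to downward moves so that each row holds exactly one seam pixel; the name and move set are assumptions. Because Dijkstra pops nodes in order of increasing distance, the first bottom-row pixel popped ends the cheapest path.

```python
import heapq
import numpy as np

def dijkstra_vertical_path(cost):
    """Cheapest top-to-bottom path (one column per row) via Dijkstra."""
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    heap = []
    for c in range(w):  # every top-row pixel is a possible start
        dist[0, c] = cost[0, c]
        heapq.heappush(heap, (float(cost[0, c]), 0, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale heap entry
        if r == h - 1:  # first bottom-row node popped is optimal
            path = [c]
            while r > 0:
                r, c = prev[(r, c)]
                path.append(c)
            return path[::-1]
        for dc in (-1, 0, 1):
            nc = c + dc
            if 0 <= nc < w:
                nd = d + cost[r + 1, nc]
                if nd < dist[r + 1, nc]:
                    dist[r + 1, nc] = nd
                    prev[(r + 1, nc)] = (r, c)
                    heapq.heappush(heap, (float(nd), r + 1, nc))
```

With only these three moves, Dijkstra finds the same minimum cost as dynamic programming; its advantage is that the neighbourhood can be widened without changing the algorithm.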

Bells and Whistles:

I implemented my own cut() function (without referencing existing cut functions) for this B&W section.

Texture Synthesis Consolidated Outputs

Comparing the outputs of the three algorithms (Random, Overlapping Patches, and Seam Finding), we can see a general improvement when moving from Random to Overlapping Patches, most noticeably in the white and text textures. Finally, Seam Finding smooths out the Overlapping Patches result, yielding a more "polished" final output.

Our outputs are generally good; however, there are some questionable areas here and there. The white texture looks the most suspect, likely due to the strong variability across the sample and its poor image quality.

Texture Transfer

Now the objective shifts from extrapolating textures to transferring a texture from one object onto another. The process is similar to the seam-finding procedure of the previous section, but with an additional cost term: the correspondence cost. This cost measures the difference between the sampled source patch and the target patch at the location being filled. The total cost that we minimize is thus a weighted sum of the overlap cost (the SSD that we've been calculating in the previous two sections) and the correspondence cost.
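A sketch of the weighted total cost. It assumes that `alpha` weights the correspondence term, which matches the observation in this section that increasing alpha made the target image clearer; the function names are hypothetical.

```python
import numpy as np

def correspondence_ssd(source_gray, target_gray):
    """SSD between a grayscale source patch and the target region it would cover."""
    diff = source_gray.astype(np.float64) - target_gray.astype(np.float64)
    return float(np.sum(diff ** 2))

def transfer_cost(overlap_cost, correspondence_cost, alpha):
    """Weighted sum; here alpha weights correspondence (an assumption)."""
    return (1 - alpha) * overlap_cost + alpha * correspondence_cost
```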

The correspondence cost is calculated by first converting both images to grayscale, because luminance (image intensity) is one of the quantities used to measure correspondence. We could also have used the alpha channel, but since these images don't have one, we convert them to grayscale instead; in grayscale, each pixel value represents the intensity at that point.
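A grayscale conversion is a weighted sum over the colour channels. This sketch uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which is one common choice and an assumption here; libraries differ slightly in the weights they use.

```python
import numpy as np

def to_gray(img):
    """(H, W, 3) RGB -> (H, W) luminance using BT.601 weights (an assumption)."""
    weights = np.array([0.299, 0.587, 0.114])
    return img[..., :3].astype(np.float64) @ weights
```

Since the weights sum to 1, a pure white pixel maps to the same value as each of its channels, so intensities stay on the input's scale.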

Tuning alpha took some trial and error. I started with alpha=0.1, which didn't yield good results: it was difficult to see the original image in the output. As I increased alpha, the image became clearer, and I finally settled on alpha=0.8 with patch_size=20.

Brick Feynman

Original Image

Original Texture

Texturized Image

Rice Feynman

Original Image

Original Texture

Texturized Image

Original Image

Original Texture

Texturized Image

CS 194-26 Final Project: Neural Algorithm of Artistic Style

Ronak Laddha


Model Outputs

Neckarfront in Tubingen, Germany x The Shipwreck of the Minotaur

Neckarfront in Tubingen, Germany x Starry Night

Neckarfront in Tubingen, Germany x The Scream

Neckarfront in Tubingen, Germany x Femme nue assise

Neckarfront in Tubingen, Germany x Composition VII
