Image Quilting & Artistic Neural Style Transfer

CS 194-26: Image Manipulation and Computational Photography — Final Projects

Monica Tang


Table of Contents:

Image Quilting and Texture Transfer
Artistic Neural Style Transfer



Image Quilting and Texture Transfer

Using the method described in the SIGGRAPH paper by Efros and Freeman, we can synthesize a larger texture image from a small texture sample. Below is an example of what we will achieve. We can also perform texture transfer, which gives an object the appearance of a texture while preserving its shape.

Image Quilting

The main idea of the image quilting method is to sample blocks (or patches) of the texture sample and find other blocks in the sample that match in an overlapping region.

Figure from SIGGRAPH paper

As shown in the figure above, there are three approaches to image quilting that we will explore. The first, a naive approach, simply places random patches next to each other; this does not produce satisfactory results, as can be seen below. A better method randomly selects only the first patch, placing it as the upper-leftmost block, and then chooses each subsequent block based on the SSD error of its overlapping region with the previously placed blocks. In particular, we randomly choose a patch from among those whose SSD cost is below a specified tolerance. We achieve much better results with this approach. However, the harsh edge artifacts of these square blocks are noticeable in some cases. The third approach eliminates these by cutting the blocks along a minimum-error seam, so instead of placing square blocks into our image quilt, we place the seam-cut blocks.
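The following is a minimal sketch (not the exact project code) of two pieces of this pipeline: the SSD cost of a candidate block over the region where it overlaps already-placed blocks, and the minimum-error vertical seam found with dynamic programming. The function names and argument conventions are illustrative assumptions.

```python
import numpy as np

def ssd_overlap(patch, existing, mask):
    """Sum of squared differences over the overlapping region.
    `mask` is 1 where `existing` already contains quilted pixels."""
    diff = (patch - existing) * mask
    return np.sum(diff ** 2)

def min_cut_seam(error):
    """Given a 2D per-pixel error map for a vertical overlap strip,
    return the column index of the minimum-error seam in each row."""
    h, w = error.shape
    cost = error.astype(float).copy()
    # Accumulate the cheapest path from the top row downward,
    # allowing the seam to move at most one column per row.
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    # Backtrack from the cheapest entry in the bottom row.
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam
```

Pixels to the left of the seam are kept from the existing quilt and pixels to the right come from the new block; the horizontal overlap can be handled the same way on a transposed error strip.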

Below are the results from each approach for the brick and knit textures. The improvement for each method is most noticeable in the knit texture results.

The following are more examples of texture synthesis.

Texture Transfer

Now that we can synthesize textures from a sample, we can also transfer textures onto other images. In order to preserve the shape of the object we want to transfer the texture to (the target image), we must satisfy not only the texture synthesis requirements but also a correspondence map between the texture blocks and the target image blocks. More specifically, we compute the SSD of a block not only with the overlapping regions of its neighboring blocks, but also with the corresponding block of the target image.

Below is a sketch texture that we would like to transfer to an image of Feynman.

In my implementation, I used blurred image intensities as the correspondence map.
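Below is a minimal sketch of one plausible way to build such a map, assuming RGB images with values in [0, 1]; the luminance weights are standard, but the blur width is an illustrative assumption rather than the value used in the project.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correspondence_map(img_rgb, sigma=3):
    """Blurred image intensities: Gaussian-smoothed luminance of an RGB image."""
    luminance = img_rgb @ np.array([0.299, 0.587, 0.114])
    return gaussian_filter(luminance, sigma=sigma)
```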

When determining the total SSD cost of a particular block, we take a weighted sum: alpha times the block's overlap error plus (1 - alpha) times the error between the block and its corresponding target image block. The parameter alpha determines the tradeoff between texture synthesis quality and how closely the result matches the target image.
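A sketch of this combined cost, assuming the ssd_overlap helper and precomputed correspondence maps from the snippets above:

```python
import numpy as np

def transfer_cost(patch, existing, mask, patch_corr, target_corr, alpha):
    """Weighted sum of overlap error and correspondence error for one block."""
    overlap_err = ssd_overlap(patch, existing, mask)
    corr_err = np.sum((patch_corr - target_corr) ** 2)
    return alpha * overlap_err + (1 - alpha) * corr_err
```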

Performing a single texture synthesis pass gives a non-iterative approach. But we can also make multiple passes, reducing the block size by one third and setting alpha to 0.8 * (iteration - 1) / (total iterations - 1) + 0.1 for each iteration. In this method, blocks must also match those from the previous iteration. Below shows a comparison between the results of the non-iterative and iterative methods; I used 3 passes in the iterative method.
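The sketch below illustrates this iterative schedule under stated assumptions: texture_transfer_pass is a placeholder for a single synthesis pass that also matches blocks against the previous result, and the starting block size is whatever the non-iterative method would use.

```python
def iterative_transfer(texture, target, block_size, n_iters=3):
    """Multi-pass texture transfer with shrinking blocks and growing alpha."""
    result = None
    for i in range(1, n_iters + 1):
        # Alpha ramps from 0.1 up to 0.9 across the passes.
        alpha = 0.8 * (i - 1) / (n_iters - 1) + 0.1
        # Passes after the first also match against the previous result.
        result = texture_transfer_pass(texture, target, block_size,
                                       alpha, previous=result)
        # Reduce the block size by one third each pass.
        block_size = max(3, block_size * 2 // 3)
    return result
```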

In the iterative result, some of the facial features are slightly more defined (produced in later iterations), and the larger characteristics of the texture are also retained (from earlier iterations).

Another example using the non-iterative method.

Texture transfer offers a way to render realistic photographs in non-photorealistic ways. The next section will describe another way to do so, but with deep neural networks.



Artistic Neural Style Transfer

Using neural networks, we can transfer artistic styles to realistic photographs! The research paper by Gatys, Ecker, and Bethge describes how this is done. The following is based on their method.

Houses in the style of van Gogh's The Starry Night:

First, we load in our images. For each style transfer, a content image and a style image of the same size are needed. We also need an input image that will be modified to match the content and style features; I used a copy of the content image as the input. These images are then preprocessed: they are resized so that the shorter side is 256 pixels (to reduce transfer time) and normalized to fit VGG input requirements. As described in the paper, I used a 19-layer VGG network in which the max pooling layers are replaced with average pooling.
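A sketch of this setup using torchvision's pretrained VGG-19 is shown below; the normalization constants are the standard ImageNet statistics expected by VGG, and the helper name load_image is illustrative.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T
from torchvision import models
from PIL import Image

preprocess = T.Compose([
    T.Resize(256),                      # shorter side -> 256
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

def load_image(path):
    """Load an image and preprocess it into a 1 x 3 x H x W tensor."""
    return preprocess(Image.open(path).convert("RGB")).unsqueeze(0)

vgg = models.vgg19(pretrained=True).features.eval()
# Replace max pooling with average pooling, as in Gatys et al.
for i, layer in enumerate(vgg):
    if isinstance(layer, nn.MaxPool2d):
        vgg[i] = nn.AvgPool2d(kernel_size=2, stride=2)
# The network weights stay fixed; only the input image is optimized.
for p in vgg.parameters():
    p.requires_grad_(False)
```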

The key to style transfer is in the content and style representations. Representing the content and style at particular points in the CNN allows us to match an input image to the content and style images. Since reconstructing the input image from feature maps at higher layers of the network captures high-level content (i.e. objects and their arrangement), we match feature responses at layer 'conv4_2' of the VGG network to produce the content reconstruction. For the style reconstruction of an input image, we compute the correlations between different features at multiple layers. I chose to match the style representations on the layers 'conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', and 'conv5_1'.
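A sketch of the feature extraction and the Gram matrix used for the style representation is shown below. The layer indices follow torchvision's VGG-19 layout (for example, 'conv4_2' is features[21]); the Gram normalization by feature map size is a common choice, not necessarily the project's exact scaling.

```python
CONTENT_LAYERS = {21: "conv4_2"}
STYLE_LAYERS = {0: "conv1_1", 5: "conv2_1", 10: "conv3_1",
                19: "conv4_1", 28: "conv5_1"}

def gram_matrix(feat):
    """Correlations between feature maps: F F^T, normalized by size."""
    b, c, h, w = feat.shape
    f = feat.view(b * c, h * w)
    return f @ f.t() / (b * c * h * w)

def extract_features(x, net, layers):
    """Run x through the network, keeping responses at the requested layers."""
    feats = {}
    for i, layer in enumerate(net):
        x = layer(x)
        if i in layers:
            feats[layers[i]] = x
    return feats
```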

The loss between the content and input images at a particular layer is defined as the mean-squared error between their feature representations at that layer. The loss between the style and input images at a particular layer is defined as the mean-squared error between the Gram matrices of their feature representations at that layer. The total loss is then the weighted sum of the content loss and the style loss, alpha * (content loss) + beta * (style loss), where alpha and beta are the weighting factors. For the style transfer result shown above, I used an alpha/beta ratio of 1e-3.
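These losses can be written directly in terms of the helpers above; the sketch below assumes the feature dictionaries from extract_features, and the absolute scale of alpha and beta is a choice (only their ratio is reported above).

```python
import torch.nn.functional as F

def content_loss(input_feats, content_feats):
    """MSE between feature responses at conv4_2."""
    return F.mse_loss(input_feats["conv4_2"], content_feats["conv4_2"])

def style_loss(input_feats, style_feats):
    """Sum of MSEs between Gram matrices at each style layer."""
    loss = 0.0
    for name in STYLE_LAYERS.values():
        loss = loss + F.mse_loss(gram_matrix(input_feats[name]),
                                 gram_matrix(style_feats[name]))
    return loss

def total_loss(input_feats, content_feats, style_feats, alpha, beta):
    """Weighted sum: alpha * content loss + beta * style loss."""
    return (alpha * content_loss(input_feats, content_feats)
            + beta * style_loss(input_feats, style_feats))
```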

In my implementation, I used the Adam optimizer. The style transfer of the houses and The Starry Night used a learning rate of 0.01, and I ran the optimization for 800 epochs.
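Putting the pieces together, the loop below is a sketch of the optimization under the assumptions above: content_img and style_img are loaded with load_image, the input image is a clone of the content image, and it is the only tensor being optimized (alpha = 1e-3, beta = 1.0 gives the 1e-3 ratio used for the houses result).

```python
content_img = load_image("content.jpg")   # hypothetical file names
style_img = load_image("style.jpg")

input_img = content_img.clone().requires_grad_(True)
optimizer = torch.optim.Adam([input_img], lr=0.01)

# Target features are computed once from the fixed content and style images.
content_feats = extract_features(content_img, vgg, CONTENT_LAYERS)
style_feats = extract_features(style_img, vgg, STYLE_LAYERS)

for epoch in range(800):
    optimizer.zero_grad()
    feats = extract_features(input_img, vgg,
                             {**CONTENT_LAYERS, **STYLE_LAYERS})
    loss = total_loss(feats, content_feats, style_feats,
                      alpha=1e-3, beta=1.0)
    loss.backward()
    optimizer.step()
```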

More Examples

For all of the following examples, the parameters I used were alpha/beta = 1e-4, learning rate = 0.01, and number of epochs = 1000.

Black cat in the style of Matisse's Goldfish:

Scenic Norwegian road in the style of Studio Ghibli's My Neighbor Totoro:

Small Norwegian street in the style of Seurat's A Sunday Afternoon on the Island of La Grande Jatte: