Project 1: Image Quilting

Randomly Sampled Texture

Approach

To create the randomly sampled texture, we sampled patches of a fixed patch size from the source texture at random and added these tiles one by one to build up the output quilt. This method was the simplest and fastest, but the results were, well, pretty random.
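
A minimal sketch of this step (function and variable names are illustrative, not our exact code), assuming the sample texture is an H x W x 3 NumPy array:

```python
import numpy as np

def quilt_random(sample, out_size, patch_size, rng=None):
    """Tile randomly sampled patches from `sample` into an out_size x out_size quilt.
    (Hypothetical helper; the signature is illustrative.)"""
    rng = np.random.default_rng() if rng is None else rng
    H, W = sample.shape[:2]
    n = out_size // patch_size  # number of tiles per side
    quilt = np.zeros((n * patch_size, n * patch_size, sample.shape[2]), dtype=sample.dtype)
    for i in range(n):
        for j in range(n):
            # pick a random top-left corner inside the sample texture
            y = rng.integers(0, H - patch_size + 1)
            x = rng.integers(0, W - patch_size + 1)
            quilt[i*patch_size:(i+1)*patch_size, j*patch_size:(j+1)*patch_size] = \
                sample[y:y+patch_size, x:x+patch_size]
    return quilt
```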

Given the original picture:

Here's one result of the random algorithm, scroll down to see a comparison of all three quilt techniques.

Overlapping patches

Approach

For this part, we sampled patches with some overlap between every patch. Using the sum of squared differences (SSD), we computed the cost of the overlapping region between an existing patch in the result quilt and a potential new patch to be added. For every spot in the final quilt, we computed this SSD between the overlap region of every possible patch from the sample texture and the overlap region of the existing quilt. We kept track of all patches whose SSD fell under an error threshold specified by a user-defined tolerance and then randomly selected one of those patches to add to the final quilt, as sketched below.
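
A sketch of this patch-selection step, assuming float image arrays and a binary mask marking the overlap region (the helper names are illustrative, not our exact code):

```python
import numpy as np

def ssd_overlap(candidate, existing, mask):
    """Sum of squared differences restricted to the overlap region (mask == 1)."""
    diff = (candidate.astype(float) - existing) * mask[..., None]
    return np.sum(diff ** 2)

def pick_patch(sample, existing, mask, patch_size, tol=0.1, rng=None):
    """Score every candidate patch in `sample` against the quilt's overlap region
    and randomly choose one whose cost is within (1 + tol) of the best cost."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = sample.shape[:2]
    costs = np.empty((H - patch_size + 1, W - patch_size + 1))
    for y in range(costs.shape[0]):
        for x in range(costs.shape[1]):
            costs[y, x] = ssd_overlap(sample[y:y+patch_size, x:x+patch_size], existing, mask)
    threshold = costs.min() * (1 + tol)
    ys, xs = np.where(costs <= threshold)
    k = rng.integers(len(ys))
    return sample[ys[k]:ys[k]+patch_size, xs[k]:xs[k]+patch_size]
```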

Here's one result of the overlapping patches algorithm, scroll down to see a comparison of all three quilt techniques. Clearly, this produces a much better result than the randomly sampled patches.

Seam Finding

Approach

While overlapping patches produced decent results, there are still some distracting lines in our quilts that we want to get rid of. To do so, we add seam finding to our overlapping patches approach in order to find the best matching cut between two overlapping patches to reduce the edge artifacts.

To find the cut for each overlapping region, we must find the min-cost contiguous path from one side to the other side of the overlapping patch.
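
A sketch of the dynamic-programming seam search, assuming the input is the 2-D SSD cost map of the overlap region (names are illustrative):

```python
import numpy as np

def min_cost_seam(ssd):
    """Find the minimum-cost contiguous vertical path through a 2-D SSD cost map
    (one column index per row, each step moving at most one column sideways)."""
    h, w = ssd.shape
    cost = ssd.astype(float)
    # dynamic programming: accumulate the cheapest way to reach each pixel from the top
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # backtrack from the cheapest pixel in the bottom row
    seam = np.zeros(h, dtype=int)
    seam[-1] = np.argmin(cost[-1])
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + np.argmin(cost[i, lo:hi])
    return seam  # seam[i] = column of the cut in row i
```

The cut is turned into a binary mask (pixels on one side come from the existing quilt, the other side from the new patch); when a patch has both a left and a top overlap, the two cut masks are combined to produce the final merged patch shown in the example below.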

Below is an example of seam finding for a patch that has both a top and left overlap.

Existing quilt overlap region | New patch to be added
Left SSD | Left SSD path | Left cut mask
Top SSD | Top SSD path | Top cut mask
Combined SSD path | Combined cut mask
Existing quilt overlap, cut result | New patch to be added, cut result
Final merged patch

Result

Here's one result of the seam finding algorithm, scroll down to see a comparison of all three quilt techniques. Clearly, this produces a much cleaner result than the overlapping patches.

Overall, the seam finding performs the best because it eliminates any edge artifacts that can be seen in the overlapping patches result. The overlapping patches result does do a good job of matching patches based on color. The random sampling does a poor job of creating a quilt because colors are misaligned and patterns are not connected due to patches being selected randomly.

All Quilting Results

Here's a compilation of quilting results using all three methods with given textures (first two rows) and our own personal textures from the internet.

Original Texture Random Sampling Overlapping Patches Seam Finding

Texture Transfer

Approach

Given this quilting technique, we can apply it to transfer textures to target images. Texture transfer requires that the output image follow the correspondence mapping between the source texture and the target image while also being made up entirely of the source texture.

To do so, we need to define a correspondence map between the source texture and the target image we want to transfer the texture onto. For this project, the correspondence map is the luminance of the image: places where both the source texture and the target image are bright have a low error. Using a user-specified alpha value, usually between 0.4 and 0.6 for these results, the error term for determining which patch to add to the quilt becomes alpha * the SSD block overlap error + (1 - alpha) * the SSD error between the correspondence map of the sample texture patch and that of the corresponding location in the target image. The alpha balances how much we value a seamless texture transition against how closely we preserve the target image's appearance.
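
As a sketch, the combined error for a candidate patch might look like this (the helper and its arguments are hypothetical; `patch_lum` / `target_lum` are the luminance maps of the candidate patch and the corresponding target region):

```python
import numpy as np

def transfer_cost(patch, quilt_overlap, mask, patch_lum, target_lum, alpha=0.5):
    """alpha weights the usual block-overlap SSD; (1 - alpha) weights the SSD
    between the luminance (correspondence) maps of the patch and the target region."""
    overlap_err = np.sum(((patch.astype(float) - quilt_overlap) * mask[..., None]) ** 2)
    corr_err = np.sum((patch_lum.astype(float) - target_lum) ** 2)
    return alpha * overlap_err + (1 - alpha) * corr_err
```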

Below are some texture transfer results.

Original Texture Target Image Texture Transfer Result (alpha = 0.5)
Original Texture Target Image Texture Transfer Result (alpha = 0.5)
Original Texture Target Image Texture Transfer Result (alpha = 0.3)
Original Texture Target Image Texture Transfer Result (alpha = 0.5)

Bells & Whistles

Iterative Approach to Texture Transfer

The iterative approach to texture transfer applies texture transfer multiple times with an increasingly small patch size in order to generate better images. Because texture transfer requires two constraints to be satisfied, it is sometimes unable to produce an optimal result in the first iteration. Thus, we can iterate over the resulting quilt image multiple times with a smaller patch size each time. We also add an additional constraint: the block being added to the final image must satisfy the original constraints set by texture transfer and also match the patch in the same position in the synthesized image from the previous iteration. This improves the texture transfer result by producing more fine-grained patches that iteratively improve upon the texture selected for every region of the image. It's recommended that the image be iterated over 3 to 5 times; here, we did 3 iterations. A sketch of this loop is shown below.
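
A rough sketch of the outer loop, assuming a `texture_transfer` routine that accepts the previous iteration's result as an extra matching constraint (both the routine and the patch-size schedule are illustrative, not our exact code):

```python
def iterative_transfer(texture, target, n_iters=3, patch_size=36):
    """Run texture transfer n_iters times, shrinking the patch size each pass and
    matching each new block against the previous pass's synthesized result."""
    result = None
    for i in range(n_iters):
        result = texture_transfer(texture, target,
                                  patch_size=patch_size,
                                  prev_result=result)   # extra matching constraint
        patch_size = max(patch_size * 2 // 3, 5)         # smaller patches each pass
    return result
```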

Below are some results of iterative texture transfer and are compared to results from non-iterative texture transfer.

Original Texture Target Image Texture Transfer (iter = 1) Texture Transfer (iter = 2) Texture Transfer (iter = 3) Texture Transfer (non-iterative)

Overall, we think that the results from iterative texture transfer were good, and you can visibly see the improvement in every iteration. However, we also found that running regular texture transfer with small patch sizes (5-10 pixels, with 2 pixels of overlap) produces very good results as well. The difference in results is likely due to differences in patch size (the iterative version had larger patch sizes even at the 3rd iteration).

Project 2: Light Field Camera

Project Overview

For this lightfield camera project, our goal is to produce cool effects like depth refocusing and aperture adjustment using large collections of images taken over a plane orthogonal to the optical axis.

Depth refocusing is the effect of focusing on various parts of the image at different depths. The basic idea is that closer objects have much more drastic position changes across the image set than objects farther away, so averaging the images together with different amounts of shift causes different parts of the image to be in focus.

Aperture adjustment comes from the idea that averaging different numbers of images will emulate different aperture values. Fewer images make the result appear to have a smaller aperture, while larger sets of images result in an increased aperture size for the final image. This conceptually makes sense because with a larger set of images, we are adding more images and expanding the amount of light that appears in the final image.

To see more detailed descriptions on the approach and final results, read the below sections.

Depth Refocusing

Approach

For this part, we first process the images from the datasets in Stanford's collection of lightfield camera images. Each dataset has 289 images taken on a 17 x 17 grid, with a (u, v) value for every image corresponding to the ray position. We also have the (x, y) location at which each image was taken relative to the 17 x 17 grid.

Then, to create the depth refocusing effect, we need to shift the images by different amounts so that different parts of the image are in focus. Shifting all the images by some amount and then averaging them together creates an image that is an average of the light from all the rays captured by the sub-aperture images.

Steps

To create the depth refocusing effect, we follow the following steps:

1) Shift all the given (u, v) coordinates to share the same center of (0, 0) by subtracting the mean (u, v) coordinate value from all (u, v)s of all images.

2) Shift all images by some constant K times (u, v), so that new_x = x + K * u and new_y = y + K * v. We used np.roll() to perform the shift. This constant K usually lies in the range [-0.4, 0.4], and different K values result in different areas being in focus (see the sketch after these steps).

3) Average all the shifted images together to create the final result.
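
A minimal sketch of these steps (array shapes and names are assumptions, not our exact code):

```python
import numpy as np

def refocus(images, uvs, K):
    """Average sub-aperture images after shifting each by K * (u, v).
    `images` is an (N, H, W, 3) array; `uvs` is an (N, 2) array of (u, v) positions."""
    uvs = uvs - uvs.mean(axis=0)                 # step 1: recenter (u, v) about (0, 0)
    out = np.zeros(images[0].shape, dtype=float)
    for img, (u, v) in zip(images, uvs):
        # step 2: np.roll performs the shift, rounding K * (u, v) to whole pixels
        out += np.roll(img, (int(round(K * v)), int(round(K * u))), axis=(0, 1))
    return out / len(images)                     # step 3: average the shifted images
```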

Results

Here are some gifs of results as well as some images to demonstrate how changing K will change the depth of the refocused area.

Chess (K = [-0.2, 0.3]) Legos (K = [-0.4, 0.4]) Legos Truck (K = [-0.4, 0.4])

Looking at how different constant values impact the depth of the focused area, we see that as K goes from negative to positive, the area of focus shifts from the parts of the image that are farther away to the objects that are closer to the camera.

Chess

K = -0.1 K = 0 K = 0.2

Lego

K = -0.2 K = 0 K = 0.2

Lego Truck

K = -0.1 K = 0 K = 0.2

Aperture Adjustment

Approach

Given a fixed constant K, we can simulate an adjustment in aperture by adding or removing sub-aperture images from the final averaged outcome. Conceptually, this makes sense because adding more sub-aperture images adds more of the light passing through a given aperture into the final average.

Here, we decided to have the smallest-aperture image be the image taken at the (x, y) value of (8, 8), which is the center image in the 17 x 17 grid.

Steps, given a fixed K value (here we used 0.15 because that resulted in roughly the middle area of the image being in focus):

1) Begin at N = 0 and call our depthrefocus() method on just the center image at (8, 8).

2) Increment N by 1, which means that we will be taking all images from [8 - N, 8 - N] in the image grid to [8 + N, 8 + N] in the image grid. This is a total of 9 images when N = 1. Treat all these images as the image set and call depthrefocus() on this image set.

3) Repeat step 2) until all the images in the 17 x 17 image grid are in the image set passed into depthrefocus() (see the sketch below).
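
A sketch of this loop, reusing the refocus() sketch from the depth refocusing section (names and shapes are assumptions):

```python
def aperture_series(images, uvs, grid, K=0.15, center=(8, 8), max_n=8):
    """Yield refocused images from progressively larger apertures: for each N,
    use only the sub-aperture images whose grid (x, y) lies within N of the center.
    `grid` is an (N_images, 2) array of integer (x, y) grid positions."""
    cx, cy = center
    for n in range(max_n + 1):
        keep = [i for i, (x, y) in enumerate(grid)
                if abs(x - cx) <= n and abs(y - cy) <= n]
        yield refocus(images[keep], uvs[keep], K)   # reuses the refocus() sketch above
```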

Results

Below are results from aperture adjustment. K = 0.15 for all, and the smallest aperture image is the center image (8,8).

Chess Legos Legos Truck

Here's what the smallest aperture and largest aperture look like for these three sets of images.

Smallest aperture Largest aperture

Summary

Overall, we learned that it is possible to create complex effects from simple techniques like shifting and averaging images. We also learned that sub-aperture images are created by reorganizing the rays of light so that every image contains the light corresponding to a small aperture range, and that selecting different sets of these images allows us to mimic different apertures even after the photographs have been taken.

Project 3: Neural style transfer

Project Overview

We reproduced the algorithm and results described in the paper "A Neural Algorithm of Artistic Style" by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. This was a really interesting technique to implement because it brought together elements of art and machine learning, allowed us to more deeply understand how these models work, and let us try out cool combinations of art and photos ourselves! It essentially allowed us to "paint" (generate images) in the style of various artists and art pieces!

Since we were using a pretrained network, VGG19, a lot of the time-intensive work (i.e., training) was luckily handled for us. Following the process described in the paper, we used individual layers in the network to compute losses specific to applying a style to an image and to constructing a given structure / content in the image.

To see more detailed descriptions on the approach and final results, read the below sections.

Style, content, and input images

The first step was to load our images. We chose one image as our "style image," which contained the artistic style we were trying to make our network emulate. We then chose a "content" image, which contained the content / structure of the underlying image. Finally, we chose our input image, which was the image we were changing the style of; in most cases, we set that image to simply be a clone of the content image in order to better show the structure of the content. When we instead used a white-noise / randomly generated image as the input, it still worked well, but the structure was less defined for a smaller number of epochs. Since we are using the VGG19 network, we had to normalize (mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]) and crop all of these images before feeding them into the network. We chose a crop size of (550, 550) because we didn't see a size defined in the paper, and it seemed like a good enough resolution to apply the transfer to while also not being incredibly time-intensive to process.
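
A small sketch of this preprocessing using torchvision (the crop size and normalization constants are as stated above; the helper itself is illustrative):

```python
from PIL import Image
from torchvision import transforms

# resize/crop to 550 x 550 and apply the ImageNet normalization that VGG19 expects
preprocess = transforms.Compose([
    transforms.Resize(550),
    transforms.CenterCrop(550),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def load_image(path, device="cpu"):
    """Load an image file and return a normalized (1, 3, 550, 550) tensor."""
    img = Image.open(path).convert("RGB")
    return preprocess(img).unsqueeze(0).to(device)
```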

Image Normalized image

Approach

The bulk of the work was centered around computing the correct style and content losses. The loss computations described below were computed only at the style or content layers, respectively, which is why in our implementation we had to iterate through every layer in the network to find the correct layers at which to compute and add the various loss terms.

The style loss was computed by calculating the Gram matrix, which is essentially just the inner products between the vectorized feature maps of a given layer:
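
From the paper, with F^l the vectorized feature maps of layer l (F^l_{ik} is the activation of filter i at position k), the Gram matrix is:

\[ G^l_{ij} = \sum_k F^l_{ik} \, F^l_{jk} \]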

Taking this Gram matrix, we were able to compute the error by finding the difference / distance between the Gram matrix representations of the original image and the generated image. In this equation, a is the original (style) image and x is the generated image, while N_l and M_l are the number of feature maps in layer l and the size (height x width) of each feature map, respectively. To compute the final style loss, we take this error and weight it depending on how many convolutional style layers (max = 5) are active. This variability in weight was relevant when computing the different style representations below!
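
For reference, the corresponding equations from the paper, where A^l and G^l are the Gram matrices of the original (style) image and the generated image, and w_l is the weight given to layer l:

\[ E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2, \qquad \mathcal{L}_{\text{style}}(\vec{a}, \vec{x}) = \sum_{l} w_l E_l \]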

The content loss was computed by finding the MSE between the content representations of the generated image and the content image (where p is the original image, x is the generated image, and P and F are their respective feature representations).
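
The content loss from the paper, with F^l and P^l the layer-l feature representations of the generated and original images:

\[ \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2 \]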

To find the total loss, over which we ultimately backpropagate to update our image's style and content, we multiplied the content and style loss values by alpha and beta terms [total_loss = alpha * content_loss + beta * style_loss]. We can see the result of different alpha/beta ratios in the style representation section below!
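
An illustrative PyTorch sketch of the loss terms (the layer bookkeeping and normalization constants are simplified relative to the paper's equations above; this is not our exact implementation):

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    """Gram matrix of a (1, C, H, W) feature map: inner products between channels."""
    _, c, h, w = features.shape
    f = features.view(c, h * w)
    return f @ f.t()

def total_loss(gen_feats, content_feats, style_grams, content_layers, style_layers,
               alpha=1.0, beta=1e4):
    """Weighted sum of content MSE and style Gram-matrix MSE over the chosen layers."""
    content_loss = sum(F.mse_loss(gen_feats[l], content_feats[l]) for l in content_layers)
    style_loss = sum(F.mse_loss(gram_matrix(gen_feats[l]), style_grams[l])
                     for l in style_layers) / len(style_layers)
    return alpha * content_loss + beta * style_loss
```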

Results

Applying different styles to this photo of the painted ladies:

NOTE: for all of these images, the # epochs = 800, learning rate = 0.05, alpha = 1, and beta = 1e4

With a larger number of epochs, all of these images would likely have a better style transfer. However, running the model took quite a bit of time per image so we were reasonably unable to run the algorithm on each image for more than a few hundred epochs.

Starry night by Vincent Van Gogh, 1889 Styled image
Composition VII by Wassily Kandinsky, 1913 Styled image
The Shipwreck of the Minotaur by J.M.W. Turner, 1805 Styled image

The paper mentioned switching out the Max Pooling operations with Average Pooling operations for better results! We weren't convinced, so we tried both. Here is a comparison of the results of using average and max pooling on the style representation images, given a static input image. Based on the first two results, it's clear that average pooling creates a much smoother and more visually appealing output!

For these images, just as defined in the paper, each row corresponds to the outputs computed based on a given layer's loss (conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 from top to bottom), and each column corresponds to its alpha/beta ratio (1e-5, 1e-4, 1e-3, 1e-2 from left to right). As you can see, as the ratio gets smaller (so beta, the style weight, becomes larger), the style becomes more pronounced; as the ratio increases, the content becomes progressively clearer. We can also see that as the number of layers increases, we get thicker and more detailed strokes: the lower layers primarily use small strokes and the higher layers use thick strokes, which combine to create a cohesive scene. This type of representation allows us to visualize not only the contribution of each style layer to the final "painting" (represented primarily by the size and shape of brush strokes in this example) but also how those styles are applied to the image and combined to produce the final output (bottom-right image). With a greater number of epochs, we can also see that the content / structure of the image is much more pronounced.

Style Image Content image Input image -- random static!
Average pooling, 400 epochs
Max pooling, 400 epochs
Max pooling, 1000 epochs

Bonus images with the style transfer applied!

Starry night Chipotle Starry Chipotle

For this output, we only ran it for 100 epochs and the style was already so pronounced -- we were shocked!

Cubist portrait Anjali Cubist Anjali