Part 1: Image Quilting


My first project was implementing image quilting. I started by implementing random, overlapping, and seam-finding methods for synthesizing an image of a given size by tiling patches of a texture. Below are one texture and the results of these three methods, respectively.


(Original texture, random sampling, overlap sampling, and seam finding in order)
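For the overlapping method, each new patch is chosen by comparing candidate patches against what has already been placed. Here is a minimal sketch of that selection step, assuming NumPy float images; the helper names (ssd_overlap, choose_patch) and the random candidate sampling are illustrative, not necessarily how my final code is organized:

import numpy as np

def ssd_overlap(patch, out, y, x, overlap):
    # Sum of squared differences between a candidate patch and the
    # already-placed output, computed over the left and top overlap strips.
    h, w = patch.shape[:2]
    err = 0.0
    if x > 0:
        err += np.sum((patch[:, :overlap] - out[y:y+h, x:x+overlap]) ** 2)
    if y > 0:
        err += np.sum((patch[:overlap, :] - out[y:y+overlap, x:x+w]) ** 2)
    return err

def choose_patch(texture, out, y, x, size, overlap, tol=1.1, n_candidates=200):
    # Sample candidates and pick uniformly among those whose overlap SSD
    # is within a tolerance of the best, so the texture stays varied.
    th, tw = texture.shape[:2]
    coords = [(np.random.randint(th - size), np.random.randint(tw - size))
              for _ in range(n_candidates)]
    errs = np.array([ssd_overlap(texture[py:py+size, px:px+size], out, y, x, overlap)
                     for py, px in coords])
    py, px = coords[np.random.choice(np.flatnonzero(errs <= errs.min() * tol))]
    return texture[py:py+size, px:px+size]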

Here is an illustration of the seams, each of which follows the path with the lowest sum of squared differences (SSD) through the overlap region.
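Concretely, the error surface is the per-pixel squared difference between the new patch and the existing output in their overlap, and dynamic programming accumulates the cheapest top-to-bottom path through it. A sketch (function names are mine):

import numpy as np

def overlap_error(strip_a, strip_b):
    # Per-pixel squared difference, summed over color channels.
    return np.sum((strip_a.astype(float) - strip_b.astype(float)) ** 2, axis=-1)

def cumulative_cost(err):
    # cost[i, j] = cheapest path from row 0 down to (i, j), where each step
    # moves down one row and at most one column left or right.
    cost = err.copy()
    for i in range(1, err.shape[0]):
        left = np.roll(cost[i-1], 1);   left[0] = np.inf
        right = np.roll(cost[i-1], -1); right[-1] = np.inf
        cost[i] += np.minimum(cost[i-1], np.minimum(left, right))
    return cost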



Finally, here are three more synthesis results on different textures.





Here are results for texture transfer, which works essentially like seam-finding quilting (adding patches iteratively and computing SSDs to find visually unobtrusive seams), with one added step: as we iterate over the target image, we also choose the sample patch with the lowest SSD against the corresponding target patch.
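A sketch of the modified patch cost, reusing the ssd_overlap helper from the earlier sketch; alpha trades off seam smoothness against fidelity to the target, following Efros and Freeman's formulation (the exact weighting in my code may differ):

import numpy as np

def transfer_cost(patch, out, target_patch, y, x, overlap, alpha=0.5):
    # Combined cost for texture transfer: the overlap term keeps seams clean,
    # the target term keeps the patch close to the corresponding target region.
    overlap_term = ssd_overlap(patch, out, y, x, overlap)
    target_term = np.sum((patch.astype(float) - target_patch.astype(float)) ** 2)
    return alpha * overlap_term + (1 - alpha) * target_term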





For bells and whistles, I implemented a custom cut function that uses dynamic programming to decide which direction to advance the seam at each recursive step. Designing this function and getting the border alignment right (off-by-one errors were common) were the most challenging and interesting parts of this project for me.
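My cut is recursive, but the same decision can be sketched iteratively: build the cumulative_cost table from the earlier sketch, then backtrack from the cheapest endpoint, at each row choosing among the three parent columns. The resulting mask marks which side of the seam keeps the old image (a sketch, assuming a vertical overlap strip):

import numpy as np

def min_cut_mask(err):
    # Backtrack the cheapest vertical seam through the error surface and
    # return a boolean mask that is True on and to the left of the seam.
    cost = cumulative_cost(err)
    h, w = err.shape
    mask = np.zeros((h, w), dtype=bool)
    j = int(np.argmin(cost[-1]))                 # cheapest endpoint, bottom row
    for i in range(h - 1, -1, -1):
        mask[i, :j + 1] = True
        if i > 0:                                # step to the cheapest parent
            lo = max(j - 1, 0)
            hi = min(j + 1, w - 1)
            j = lo + int(np.argmin(cost[i - 1, lo:hi + 1]))
    return mask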

Part 2: Neural Style Transfer


My second project was a reimplementation of the Neural Style Transfer paper. Here, I modified a VGG19 by inserting a content loss module at the fourth convolutional layer and style loss modules at convolutional layers 1 through 5. I downsampled the style and content images to 128x128 for faster computation and ran 500 iterations of optimization. Rather than the 10^4 ratio between style and content weights recommended in the paper, I found that a 10^6 or 10^7 ratio produced the best results for me.
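The loss modules are transparent layers: they record a loss as activations flow through and return their input unchanged, so they can be spliced into the feature stack. A sketch following the common PyTorch pattern (my exact module bodies may differ slightly):

import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(x):
    # Normalized Gram matrix of a (1, C, H, W) feature map: channel-wise
    # feature correlations, which capture style independent of layout.
    b, c, h, w = x.size()
    feats = x.view(b * c, h * w)
    return feats @ feats.t() / (b * c * h * w)

class ContentLoss(nn.Module):
    def __init__(self, target):
        super().__init__()
        self.target = target.detach()            # fixed content features
    def forward(self, x):
        self.loss = F.mse_loss(x, self.target)
        return x                                 # pass input through unchanged

class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super().__init__()
        self.target = gram_matrix(target_feature).detach()
    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x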

The original VGG19 architecture:

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (24): ReLU(inplace=True)
    (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (26): ReLU(inplace=True)
    (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (31): ReLU(inplace=True)
    (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (33): ReLU(inplace=True)
    (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (35): ReLU(inplace=True)
    (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

My architecture:

Sequential(
  (conv_0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu_1): ReLU()
  (conv_1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu_2): ReLU()
  (pool_2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (style_loss_2): StyleLoss()
  (conv_2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu_3): ReLU()
  (style_loss_3): StyleLoss()
  (conv_3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu_4): ReLU()
  (pool_4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (style_loss_4): StyleLoss()
  (content_loss_4): ContentLoss()
  (conv_4): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu_5): ReLU()
  (style_loss_5): StyleLoss()
)
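With the loss modules in place, optimization runs L-BFGS directly on the image pixels: each forward pass triggers the modules, and their recorded losses are combined with the weighting above. A sketch, assuming model is the truncated network and style_losses and content_losses are the lists of loss modules collected while building it (variable names are mine):

import torch

input_img = content_img.clone().requires_grad_(True)  # optimize pixels, not weights
optimizer = torch.optim.LBFGS([input_img])
style_weight, content_weight = 1e6, 1.0               # the ~10^6 ratio noted above

step = [0]
while step[0] <= 500:
    def closure():
        with torch.no_grad():
            input_img.clamp_(0, 1)                    # keep pixels in a valid range
        optimizer.zero_grad()
        model(input_img)                              # triggers the loss modules
        loss = (style_weight * sum(sl.loss for sl in style_losses)
                + content_weight * sum(cl.loss for cl in content_losses))
        loss.backward()
        step[0] += 1
        return loss
    optimizer.step(closure)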

Here are some results on the Neckarfront image, reproducing examples from the paper.





Below is a set of results on my own images. The final one is a mild failure case: the output blends lines together and retains some color, failing to capture that the style photo is black and white. A higher style-to-content weight ratio or more optimization iterations might have prevented this.