CS 194-26 Final - Neural Algorithm of Artistic Style - Andrew Loeza

Overview:

In this project, we used a pretrained VGG-19 network to iteratively optimize a white-noise image against two targets: the Gram matrices of feature maps extracted from a style image, and a single feature map extracted from a content image. The goal is to preserve the basic content of the content image while rendering it in the style of the style image. The balance between content and style is controlled by two weight hyperparameters, one for the style loss and one for the content loss.
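The style representation mentioned above is the Gram matrix of a layer's feature maps: the matrix of inner products between the flattened channel activations. A minimal sketch in PyTorch (the normalization constant here is my own choice, not specified in the writeup):

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (channels, height, width) activation map.

    Entry (i, j) is the inner product of channel i's and channel j's
    flattened activations, so the matrix captures which filters fire
    together, independent of where in the image they fire.
    """
    c, h, w = features.shape
    flat = features.view(c, h * w)
    # Normalizing by the number of elements keeps the loss scale roughly
    # comparable across layers of different sizes (one common convention).
    return flat @ flat.t() / (c * h * w)
```

The style loss is then the mean squared difference between the Gram matrices of the optimized image and the style image, summed over the chosen layers.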

Network Architecture:

Below is the feature architecture used to obtain the results. The layers marked with stars (*) are the layers whose Gram matrices contribute to the style loss. Each of these 5 starred layers was also a candidate for the feature map used in the content image's content loss. As in the paper, I replaced the max-pooling layers with average-pooling layers, which seemed to give better results. Oddly, the style and content weights had to be reversed relative to what the paper suggests, with most of the weight going to the content loss rather than the style loss. I'm unsure why this was necessary to get acceptable results, but my best guess is that it is a side effect of some detail of PyTorch's neural network implementation.

VGG 19 Network:

(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) *   
(1): ReLU(inplace=True)  
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(3): ReLU(inplace=True)  
(4): AvgPool2d(kernel_size=2, stride=2, padding=0)  
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) *   
(6): ReLU(inplace=True)  
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(8): ReLU(inplace=True)  
(9): AvgPool2d(kernel_size=2, stride=2, padding=0)  
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) *   
(11): ReLU(inplace=True)  
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(13): ReLU(inplace=True)  
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(15): ReLU(inplace=True)  
(16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(17): ReLU(inplace=True)  
(18): AvgPool2d(kernel_size=2, stride=2, padding=0)  
(19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) *   
(20): ReLU(inplace=True)  
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(22): ReLU(inplace=True)  
(23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(24): ReLU(inplace=True)  
(25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(26): ReLU(inplace=True)  
(27): AvgPool2d(kernel_size=2, stride=2, padding=0)  
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) *   
(29): ReLU(inplace=True)  
(30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(31): ReLU(inplace=True)  
(32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(33): ReLU(inplace=True)  
(34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  
(35): ReLU(inplace=True)  
(36): AvgPool2d(kernel_size=2, stride=2, padding=0)  
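The pooling swap described above can be sketched as a small helper that walks torchvision's `vgg19(...).features` module and replaces each `MaxPool2d` with an `AvgPool2d` of the same geometry (the helper name is mine, not from the writeup):

```python
import torch.nn as nn

def to_avg_pool(features: nn.Sequential) -> nn.Sequential:
    """Replace every MaxPool2d in a VGG feature stack with an AvgPool2d
    that has the same kernel size, stride, and padding."""
    for i, layer in enumerate(features):
        if isinstance(layer, nn.MaxPool2d):
            features[i] = nn.AvgPool2d(kernel_size=layer.kernel_size,
                                       stride=layer.stride,
                                       padding=layer.padding)
    return features

# Usage (weight download omitted here):
#   from torchvision.models import vgg19
#   features = to_avg_pool(vgg19(pretrained=True).features).eval()
```

Freezing the network's parameters (`requires_grad_(False)`) is also needed, since only the pixels of the input image are being optimized.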

Results:

Below are the results I obtained on four of the content and style image combinations presented in the paper, along with 3 other pairs of images of my own choosing. The specific hyperparameters used are shown with the content, style, and output images, along with the corresponding image from the paper. My one bad result is Forest Path: regardless of how I tuned the hyperparameters, I couldn't get the style to transfer without completely overwriting the content of the image. My best guess is that the content image lacks objects with substantial enough structure for the network's filters to identify and preserve. This would explain why geometric objects like buildings are fairly well preserved by these transformations, whereas organic objects tend to be replaced by stylized content.

Composition VII:

Hyperparameters:

Epochs = 800  
Learning Rate = 0.9  
Content Layer = 4  
Content Weight = 3000000000  
Style Weight = 2.2
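The hyperparameters above feed an optimization loop over the pixels of the white-noise image. A hedged sketch of that loop is below; `total_loss` stands in for the weighted sum of content and style losses computed from the VGG features, and Adam is my assumption here (the writeup does not name the optimizer; the original paper used L-BFGS):

```python
import torch

def style_transfer(total_loss, init_image: torch.Tensor,
                   epochs: int = 800, lr: float = 0.9) -> torch.Tensor:
    """Optimize the pixels of `init_image` to minimize `total_loss(image)`.

    `total_loss` is assumed to compute
        content_weight * content_loss + style_weight * style_loss
    from the network's feature maps; it is a placeholder, not the
    project's actual implementation.
    """
    image = init_image.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([image], lr=lr)  # optimizer choice assumed
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = total_loss(image)
        loss.backward()
        optimizer.step()
    return image.detach()
```

Only `image` is passed to the optimizer, so the network weights stay fixed and gradient descent reshapes the image itself.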