Final Project 1: Reimplementing A Neural Algorithm of Artistic Style

Harish Palani (CS 194-26)

A Neural Algorithm of Artistic Style by Gatys et al. introduced the neural style transfer algorithm, leveraging a CNN-based architecture to recreate a base content scene with artistic elements derived from a chosen style image. In this assignment, I reproduced this work, first testing my implementation on custom inputs before benchmarking it against the original results presented in the paper.

I used the following two images as custom inputs while developing the algorithm: a view of Berkeley from Memorial Glade as the base content image and Van Gogh's famous Starry Night as the style image.

[Figure: the two custom inputs, with the Memorial Glade content image alongside the Starry Night style image]
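The exact loader code isn't reproduced in this writeup, but the preprocessing amounts to resizing both images to matching dimensions and converting them to 4-D float tensors. A minimal sketch, assuming torchvision transforms and hypothetical filenames:

import torch
from PIL import Image
import torchvision.transforms as transforms

# Hypothetical loader: resize to a common size and convert to a
# (batch, channels, height, width) float tensor.
def load_image(path, size=512, device="cpu"):
    loader = transforms.Compose([
        transforms.Resize((size, size)),
        transforms.ToTensor(),  # scales pixel values to [0, 1]
    ])
    image = Image.open(path).convert("RGB")
    return loader(image).unsqueeze(0).to(device, torch.float)

content_img = load_image("memorial_glade.jpg")  # placeholder filenames
style_img = load_image("starry_night.jpg")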

The content image itself was used as the starting point for the model's optimization, yielding more consistent results in my testing than the white-noise inputs used in the original paper.
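Concretely, this initialization is just a clone of the content tensor with gradients enabled, rather than a noise sample. A quick sketch of both options (the clone being the one I used), continuing from the loader above:

# Start the optimization from the content image itself...
input_img = content_img.clone().requires_grad_(True)

# ...rather than from white noise, as in the original paper:
# input_img = torch.randn_like(content_img).requires_grad_(True)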

[Figure: Input Image]

The full model architecture used is shown below. Taking guidance from the original paper, I truncated a pre-trained VGG-19 network to yield a more lightweight model suited to the task at hand. I also added a normalization layer inspired by other recent works, using the mean and standard deviation of the ImageNet dataset on which all VGG models are trained ([0.485, 0.456, 0.406] and [0.229, 0.224, 0.225], respectively) to normalize the inputs per standard practice.

Sequential(
  (0): NormalizeVGG()
  (conv_1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (style_loss_1): MSELoss()
  (relu_1): ReLU()
  (conv_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (style_loss_2): MSELoss()
  (relu_2): ReLU()
  (pool_2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv_3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (style_loss_3): MSELoss()
  (relu_3): ReLU()
  (conv_4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (content_loss_4): MSELoss()
  (style_loss_4): MSELoss()
  (relu_4): ReLU()
  (pool_4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv_5): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (style_loss_5): MSELoss()
  (relu_5): ReLU()
)
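The NormalizeVGG and loss layers above are custom modules spliced between the pre-trained VGG-19 convolutions; the printout renders the loss modules as MSELoss, but the common pattern is a transparent module that records its loss on each forward pass. A rough sketch of what these might look like, assuming the standard Gram-matrix formulation of style loss from the paper (class and function names here are illustrative, not my exact code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizeVGG(nn.Module):
    # Normalize inputs with the ImageNet mean/std that VGG was trained on.
    def __init__(self):
        super().__init__()
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(-1, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(-1, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std

def gram_matrix(features):
    # Channel-wise feature correlations, as defined in Gatys et al.
    b, c, h, w = features.size()
    flat = features.view(b * c, h * w)
    return flat @ flat.t() / (b * c * h * w)

class StyleLoss(nn.Module):
    # Records the MSE between the Gram matrix of the current input's
    # features and that of the style image's features at this layer.
    def __init__(self, target_features):
        super().__init__()
        self.target = gram_matrix(target_features).detach()

    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x  # pass features through unchanged

class ContentLoss(nn.Module):
    # Records the MSE between the current features and the content
    # image's features at this layer.
    def __init__(self, target_features):
        super().__init__()
        self.target = target_features.detach()

    def forward(self, x):
        self.loss = F.mse_loss(x, self.target)
        return x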

This yielded the following stylized output for the inputs defined above — pretty strong performance in my own (slightly biased) opinion!

[Figure: Output Image]

To validate this implementation, I benchmarked my model against the stylization results from the original paper by Gatys — shown below for the classic Neckarfront image, with three different styles.

First up was Starry Night once again. While the output isn't necessarily a failure case, it certainly isn't as intense and captivating as the one shared in the original paper, with many of the stylistic effects coming through far more subtly. I believe this can be attributed to my decision to use the content image itself as the starting input, whereas the original paper used white noise. The latter approach is likely far more flexible, unconstrained by the structural rigidity of the initial content image, but it requires far more iterations to yield a reasonably realistic output image.

[Figure: Neckarfront stylized with Starry Night]

With Munch's The Scream set as the style image, we see a far more ambitious output, with the model actively promoting several aspects of the original painting as prominent elements in its stylized re-creation of the scene. One might say it even goes a bit too far, and it may well benefit from a reduced weight on the style image in the loss function. For reference, I kept the relative weights constant throughout all experiments, with the style weight 1e10 times greater than the content weight. This is one set of hyperparameters that may benefit from further tuning on a case-by-case basis, however.
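In code, that weighting is just a pair of scalar multipliers applied when the recorded losses are summed. A minimal sketch of the optimization step, assuming the model built above along with style_losses and content_losses lists of its loss modules, and L-BFGS as the optimizer (which the original paper uses; my exact loop may differ):

style_weight, content_weight = 1e10, 1  # ratio held constant across experiments

optimizer = torch.optim.LBFGS([input_img])

def closure():
    with torch.no_grad():
        input_img.clamp_(0, 1)  # keep pixel values in a valid range
    optimizer.zero_grad()
    model(input_img)  # populates each loss module's .loss attribute
    style_score = style_weight * sum(sl.loss for sl in style_losses)
    content_score = content_weight * sum(cl.loss for cl in content_losses)
    loss = style_score + content_score
    loss.backward()
    return loss

for _ in range(300):  # iteration count is another tunable knob
    optimizer.step(closure)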

[Figure: Neckarfront stylized with The Scream]

This final output, with Turner's The Shipwreck set as the style image, was perhaps the closest of the three to the original examples shared in the paper. Much like the previous outputs, the quality here would likely benefit from additional fine-tuning of the weights and other related hyperparameters, as well as adjustments to the input image or the number of iterations in training, but it remains an unmistakably stunning result considering it's fully computer-generated!

[Figure: Neckarfront stylized with The Shipwreck]