Final Project

David Yi

For the final project, I worked on two projects: Reimplementing "A Neural Algorithm of Artistic Style" and Lightfield Camera.

Reimplementing "A Neural Algorithm of Artistic Style"

In this project, I attempt to reimplement "A Neural Algorithm of Artistic Style" by Gatys et al. The goal of the paper is to combine the "stylistic" aspects of one image with the "content" of another image to compose a new image.

In [9]:
from __future__ import print_function

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from skimage.transform import rescale, resize, downscale_local_mean

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision.transforms as transforms
import torchvision.models as models

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

import copy

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Some of the code in this notebook is adapted from the following tutorial:
# https://pytorch.org/tutorials/advanced/neural_style_tutorial.html

The VGG-19 Convolutional Neural Network

The paper uses the pre-trained VGG-19 convolutional neural network with slight alterations. Specifically, only the convolutional feature layers are used, while the fully connected output layers are discarded. Additionally, the max-pooling layers in the original VGG-19 network are replaced by average-pooling layers. Although I didn't find a big difference between the two types of pooling, average-pooling gives slightly better results.

Hyperparameters:

  • $\alpha/\beta$ (content-style ratio): $1 \times 10^{-6}$
  • Training Steps: 300

Notably, this $\alpha/\beta$ ratio is significantly smaller than the one used in the paper, meaning that I give more weight to style; it corresponds to content_weight=1 and style_weight=1,000,000 in the code below.

The architecture of the original VGG-19 CNN is shown below:
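
The layer stack can also be inspected directly from torchvision's pre-trained model (a quick sketch; the exact printout depends on the torchvision version):

In [ ]:
# Print the convolutional/pooling stack of VGG-19. Only this `features`
# sub-module is used for style transfer; the fully connected `classifier`
# layers are discarded.
print(models.vgg19(pretrained=True).features)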

Defining the Loss Functions

Content Loss

Although the paper uses a randomly generated white-noise image as the initial image, using the content image yields significantly better results and converges more quickly. The content loss between the original and generated images, $\vec{p}$ and $\vec{x}$, at a specified layer $l$ is given by the equation below, where $P^l$ and $F^l$ correspond to the feature responses of the original and generated images at the $l$'th convolutional layer. We specifically evaluate the content loss at the fourth convolutional layer, as done in the paper.

$L_{content}(\vec{p},\vec{x}, l) = \frac{1}{2}\sum_{i,j}(F_{ij}^l - P_{ij}^l)^2$

In [50]:
class ContentLoss(nn.Module):
    
    def __init__(self, target):
        super(ContentLoss, self).__init__()
        # detach the target so it is treated as a fixed constant rather
        # than a tensor to differentiate through
        self.target = target.detach()
    
    def forward(self, input):
        # store the loss and pass the input through unchanged, so this module
        # can be inserted transparently into the model; note that F.mse_loss
        # averages over elements rather than taking the paper's (1/2)*sum, a
        # constant factor that is absorbed into the content weight
        self.loss = F.mse_loss(input, self.target)
        return input

Style Loss

The "style loss" is slightly more complicated as we have to first compute the "Gram" matrix of the convolutional layers for the original and generated images. Effectively, the Gram matrix can be computed as $F^lF^{l^T}$ where $F_l$ corresponds to the convolutional responses of the l'th layer. Finally, we take a weighted sum of the squared deviations between the original style image and the generated image. The equations for to compute the gram matrix, the loss at each layer, and the total style loss is shown below. The style loss is evaluated on the first five convolutional layers.

$G_{ij}^l = \sum_k F_{ik}^lF_{jk}^l$

$E_l = \frac{1}{4N_l^2M_l^2}\sum_{i,j}(G_{ij}^l-A_{ij}^l)^2$

$L_{style}(\vec{a}, \vec{x}) = \sum_{l=0}^L w_lE_l$

In [51]:
def g_matrix(inp):
    # inp is a (batch, channels, height, width) feature map
    x1, x2, x3, x4 = inp.size()
    # flatten each channel's spatial response into one row of F
    features = inp.view(x1*x2, x3*x4)
    # G = F F^T, i.e. G_ij = sum_k F_ik F_jk
    gram = features @ features.T
    # normalize by the number of elements so larger layers don't dominate
    return gram / (x1*x2*x3*x4)
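
As a quick sanity check (illustrative, not part of the original notebook), the Gram matrix of a small tensor matches $G_{ij}^l = \sum_k F_{ik}^lF_{jk}^l$ up to the normalizing constant:

In [ ]:
# Hypothetical example: 1 image, 2 channels, 2x2 spatial dimensions
x = torch.arange(8, dtype=torch.float).view(1, 2, 2, 2)
F_flat = x.view(2, 4)               # rows are the flattened channels
expected = (F_flat @ F_flat.T) / 8  # G = F F^T, normalized
assert torch.allclose(g_matrix(x), expected)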
In [52]:
class StyleLoss(nn.Module):
    
    def __init__(self, target):
        super(StyleLoss, self).__init__()
        self.target = g_matrix(target).detach()
    
    def forward(self, input):
        gram = g_matrix(input)
        self.loss = F.mse_loss(gram, self.target)
        return input
In [53]:
# Normalization values are found here: 
# https://github.com/pytorch/examples/blob/97304e232807082c2e7b54c597615dc0ad8f6173/imagenet/main.py#L197-L198
cnn_norm_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
cnn_norm_std = torch.tensor([0.229, 0.224, 0.225]).to(device)

class Normalize(nn.Module):
    
    def __init__(self, mean, std):
        super(Normalize, self).__init__()
        # reshape to (C, 1, 1) so the stats broadcast over (C, H, W) images
        self.mean = mean.clone().detach().view(-1, 1, 1)
        self.std = std.clone().detach().view(-1, 1, 1)
    
    def forward(self, img):
        return (img - self.mean) / self.std
In [31]:
def get_losses(cnn, norm_mean, norm_std, style_img, content_img):
    
    # Add normalization layer according to standard normalization params for pretrained model
    cnn = copy.deepcopy(cnn)
    norm = Normalize(norm_mean, norm_std).to(device)
    content_losses = []
    style_losses = []
    
    model = nn.Sequential(norm)
    
    # Build custom model from VGG19 model iteratively 
    i = 0
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = f'conv_{i}'
        elif isinstance(layer, nn.ReLU):
            # the loss modules need the stored activations intact, so
            # in-place ReLU results in an autograd error
            name = f'relu_{i}'
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            # Replace max-pool layers with average-pool layers
            name = f'pool_{i}'
            layer = nn.AvgPool2d(kernel_size=2, stride=2, padding=0)
        elif isinstance(layer, nn.BatchNorm2d):
            name = f'bn_{i}'
        
        model.add_module(name, layer)
        
        # Add Content/Style Loss layers that do not track gradients
        # Content Layers
        if name in ['conv_4']:
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module(f'content_loss_{i}', content_loss)
            content_losses.append(content_loss)
        
        # Style Layers
        if name in ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']:
            target_features = model(style_img).detach()
            style_loss = StyleLoss(target_features)
            model.add_module(f'style_loss_{i}', style_loss)
            style_losses.append(style_loss)
        
    return model, style_losses, content_losses
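
One refinement from the linked tutorial that this notebook omits: everything after the last loss module contributes nothing to the gradient, so the model can be truncated to save computation. A minimal sketch (trim_model is a hypothetical helper, not part of the original code):

In [ ]:
# Optional: drop all layers after the final ContentLoss/StyleLoss module,
# since they never affect the losses being optimized
def trim_model(model):
    last = max(i for i, layer in enumerate(model)
               if isinstance(layer, (ContentLoss, StyleLoss)))
    return model[:last + 1]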

Style Transfer

Finally, we execute the style transfer by following the procedure below:

  1. Resize the style image to match the content image (a resizing sketch appears below)
  2. Run roughly 300 optimization steps, where the loss is a weighted sum of the content and style losses and the gradient is taken with respect to the input image (the VGG-19 weights stay frozen)
  3. Update the input image in the direction of the negative gradient

$L_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha L_{content}(\vec{p}, \vec{x}) + \beta L_{style}(\vec{a}, \vec{x})$

As done in the paper, we use a photograph of the Neckarfront in Tübingen as the content image and The Shipwreck of the Minotaur, The Scream, and The Starry Night as the style images. My results are shown on the left alongside the paper's results on the right.
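
Note that image_loader below resizes each image to its own dimensions, so step 1 assumes the style images were resized offline to match the content image. A sketch of doing it in-code (an assumption, not part of the original pipeline):

In [ ]:
# Resize the style image to the content image's dimensions before loading
content = Image.open('images/neckarfront.jpg')
style = Image.open('images/shipwreck_small.jpg')
style = style.resize(content.size)  # PIL sizes are (width, height)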

In [54]:
def run_style_transfer(cnn, norm_mean, norm_std, content_img, style_img, input_img, num_steps=300,
                      style_weight=1_000_000, content_weight=1):
    
    # We use LBFGS for the optimizer, as suggested here: 
    # https://discuss.pytorch.org/t/pytorch-tutorial-for-neural-transfert-of-artistic-style/336/20?u=alexis-jacq
    
    # Optimize the input image to minimize total loss
    model, style_losses, content_losses = get_losses(cnn, norm_mean, norm_std, style_img, content_img)
    optimizer = optim.LBFGS([input_img.requires_grad_()])
    run = [0]
    while run[0] < num_steps:
        
        # need to define closure argument to evaluate training loss/gradient for LBFGS optimizer
        # source: https://pytorch.org/tutorials/advanced/neural_style_tutorial.html
        def closure():
            # clamp the image to the valid [0, 1] pixel range
            input_img.data.clip_(0, 1)
            optimizer.zero_grad()
            # forward pass populates the .loss attribute of each loss module
            model(input_img)
            
            # Compute losses
            style_loss = sum([s.loss for s in style_losses])
            style_loss *= style_weight
            content_loss = content_losses[0].loss
            content_loss *= content_weight
            loss = style_loss + content_loss
            loss.backward()

            run[0] += 1
            if run[0] % 100 == 0:
                print(f'run {run[0]}')
                print(f'Style Loss: {style_loss.item()}, Content Loss: {content_loss.item()}')
            return loss
        
        optimizer.step(closure)
    
    input_img.data.clip_(0, 1)
    return input_img
In [55]:
def image_loader(image_name):
    # note: this resizes each image to its own (H, W); the style and content
    # images are assumed to already share dimensions
    shape = plt.imread(image_name).shape[:2]
    loader = transforms.Compose([
        transforms.Resize(shape),
        transforms.ToTensor()]
    )
    image = Image.open(image_name)
    image = loader(image).unsqueeze(0)
    return image.to(device, torch.float)

def to_image(out):
    # convert a (1, C, H, W) tensor into an (H, W, C) numpy array for plotting
    img = out.clone().cpu().detach().squeeze(0)
    img = np.moveaxis(img.numpy(), 0, 2)
    return img
In [56]:
net = models.vgg19(pretrained=True).features.to(device).eval()
In [77]:
style_img = image_loader("images/shipwreck_small.jpg")
content_img = image_loader("images/neckarfront.jpg")
input_img = content_img.clone()
output = run_style_transfer(net, cnn_norm_mean, cnn_norm_std, 
                            content_img, style_img, input_img,
                            style_weight=1000000, content_weight=1, num_steps=300)

Neckarfront + Shipwreck

In [58]:
fig, ax = plt.subplots(1, 2, figsize=(20, 15))

plt.subplot(1, 2, 1)
plt.imshow(plt.imread('images/shipwreck_small.jpg'))
plt.axis('off');

plt.subplot(1, 2, 2)
plt.imshow(plt.imread('images/neckarfront.jpg'))
plt.axis('off');
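
The stylized result can then be displayed with the to_image helper (a usage sketch; output is the tensor returned by run_style_transfer above):

In [ ]:
plt.figure(figsize=(10, 10))
plt.imshow(to_image(output))
plt.axis('off');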