Final Project

Pre-canned Project 1: A Neural Algorithm of Artistic Style Reimplemented

(This project is similar to my style transfer project in CS182, so I reuse some of the code from that assignment.)

In this project, I reimplement the style transfer technique from "A Neural Algorithm of Artistic Style" (Gatys et al., 2015), which generates an image that combines the content of one image with the style of another. The loss function has two components: a content loss and a style loss. Gradient descent on this loss updates the pixel values of the generated image; the parameters of the pre-trained SqueezeNet model, which serves only as a feature extractor, stay fixed.

Content Representation

The pretrained CNN builds a representation of an image in which object information becomes increasingly explicit along the processing hierarchy. Higher layers of the network capture the high-level content of the image and are relatively insensitive to its exact appearance. The content loss measures the difference between the feature maps of the content source image and those of the generated image at a chosen layer $\ell$.

Let $F^\ell \in \mathbb{R}^{N_\ell \times M_\ell}$ be the feature map of the current (generated) image at layer $\ell$ and $P^\ell \in \mathbb{R}^{N_\ell \times M_\ell}$ be the feature map of the content source image, where $N_\ell$ is the number of filters (channels) in layer $\ell$ and $M_\ell=H_\ell\times W_\ell$ is the number of spatial positions in each filter's feature map.

Then the content loss is given by: $L_c = w_c \times \sum_{i,j} (F_{ij}^{\ell} - P_{ij}^{\ell})^2$, where $w_c$ is the weight of the content term.
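As a rough illustration of the formula above, here is a minimal NumPy sketch. It is a stand-in, not the project's actual code: the function name `content_loss` and its argument names are hypothetical, and the feature maps are assumed to be already extracted and of identical shape.

```python
import numpy as np

def content_loss(content_weight, current_feats, target_feats):
    """Weighted sum of squared differences between two feature maps.

    content_weight: scalar w_c.
    current_feats, target_feats: arrays of the same shape, e.g. (N_l, M_l).
    """
    current = np.asarray(current_feats, dtype=np.float64)
    target = np.asarray(target_feats, dtype=np.float64)
    return content_weight * np.sum((current - target) ** 2)
```

The loss is zero exactly when the two feature maps coincide, and it scales linearly with $w_c$.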

Style Representation

To represent the style of an image, the algorithm uses a feature space built from the correlations between the different filter responses within a layer of the network. These correlations are captured by the Gram matrix $G$.

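Given a feature map $F^\ell$ of shape $(1, C_\ell, M_\ell)$, where the leading 1 is the batch dimension and $C_\ell$ is the number of channels (written $N_\ell$ in the content loss), the Gram matrix has shape $(1, C_\ell, C_\ell)$ and its elements are given by: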

$$G_{ij}^\ell = \sum_k F^{\ell}_{ik} F^{\ell}_{jk}$$
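A minimal NumPy sketch of this computation, assuming the feature map has already been flattened to shape $(B, C_\ell, M_\ell)$ with a leading batch dimension $B$ (1 in the text). The name `gram_matrix` is illustrative only.

```python
import numpy as np

def gram_matrix(features):
    """Batched Gram matrix: G[b, i, j] = sum_k F[b, i, k] * F[b, j, k].

    features: array of shape (B, C, M).
    Returns an array of shape (B, C, C).
    """
    F = np.asarray(features, dtype=np.float64)
    # Multiply each (C, M) slice by its own transpose (M, C).
    return F @ np.swapaxes(F, -1, -2)
```

Each entry is the inner product of two channels' responses over all spatial positions, so the result is symmetric and discards where in the image a response occurred.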

Let $G^\ell$ be the Gram matrix computed from the feature map of the current image and $A^\ell$ the Gram matrix computed from the feature map of the source style image. The style loss for layer $\ell$ is then the squared Euclidean (Frobenius) distance between the two Gram matrices:

$$L_s^\ell = \sum_{i, j} \left(G^\ell_{ij} - A^\ell_{ij}\right)^2$$

The total style loss is the weighted sum of the style losses over a set of layers $\mathcal{L}$, with per-layer weights $w_\ell$:

$$L_s = \sum_{\ell \in \mathcal{L}} w_\ell L_s^\ell$$
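The per-layer and total style losses can be sketched in NumPy as follows. This is an illustration under assumptions, not the project's code: `style_loss` and its arguments are hypothetical names, each feature map is assumed to have shape $(1, C_\ell, M_\ell)$, and the target Gram matrices $A^\ell$ are assumed to be precomputed from the style image.

```python
import numpy as np

def style_loss(current_feats, target_grams, layer_weights):
    """Weighted sum over layers of squared Gram-matrix differences.

    current_feats: list of arrays, one per layer, each of shape (1, C_l, M_l).
    target_grams:  list of arrays A^l, each of shape (1, C_l, C_l).
    layer_weights: list of scalars w_l, one per layer.
    """
    total = 0.0
    for feats, target, weight in zip(current_feats, target_grams, layer_weights):
        F = np.asarray(feats, dtype=np.float64)
        gram = F @ np.swapaxes(F, -1, -2)        # G^l for the current image
        total += weight * np.sum((gram - target) ** 2)
    return total
```

Because the target Gram matrices depend only on the style image, they can be computed once and reused at every optimization step.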

To generate an image with the content of one image and the style of another, we jointly minimize the content loss and the style loss by gradient descent on the pixel values, starting from a white noise image.
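The following toy sketch illustrates the idea of optimizing the image while the feature extractor stays fixed. It makes strong simplifying assumptions that are not part of the project: a fixed linear map `W` (acting like a single 1x1 convolution) stands in for SqueezeNet, only one layer is used for both losses, the gradients are derived by hand instead of by automatic differentiation, and a simple step-halving rule replaces whatever optimizer the real implementation uses. All names are hypothetical.

```python
import numpy as np

def total_loss(x, W, P, A, w_c, w_s):
    """Content loss plus style loss for image x under the fixed extractor W."""
    F = W @ x                      # features of the current image, shape (C, M)
    G = F @ F.T                    # Gram matrix, shape (C, C)
    return w_c * np.sum((F - P) ** 2) + w_s * np.sum((G - A) ** 2)

def loss_grad(x, W, P, A, w_c, w_s):
    """Gradient of total_loss with respect to the image x (W is never updated)."""
    F = W @ x
    G = F @ F.T
    # d/dF of the content term is 2*w_c*(F - P); of the style term, 4*w_s*(G - A) @ F
    # (valid because G and A are symmetric).
    dF = 2.0 * w_c * (F - P) + 4.0 * w_s * (G - A) @ F
    return W.T @ dF                # chain rule through F = W @ x

def stylize(W, P, A, x0, w_c=1.0, w_s=1e-2, steps=200, lr=1e-2):
    """Gradient descent on the pixels of x, starting from x0 (e.g. white noise)."""
    x = x0.copy()
    for _ in range(steps):
        grad = loss_grad(x, W, P, A, w_c, w_s)
        current = total_loss(x, W, P, A, w_c, w_s)
        step = lr
        # Halve the step until the loss does not increase.
        while step > 1e-12 and total_loss(x - step * grad, W, P, A, w_c, w_s) > current:
            step *= 0.5
        if step <= 1e-12:
            break                  # no acceptable step found; stop early
        x = x - step * grad
    return x
```

Here `P` would be the content image's features (`W @ content`) and `A` the style image's Gram matrix; only `x` changes between iterations.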

Pre-canned Project 2: Seam Carving

In this project, I implement a seam carving algorithm that can shrink an image either horizontally or vertically to a given dimension. To measure the importance of each pixel, I use an energy function: the sum of the absolute values of the x and y gradients at that pixel. The lowest-energy vertical seam, a connected path with one pixel per row running from the top of the image to the bottom, is then found using dynamic programming and removed. This process is repeated until the image has shrunk to the desired width. To shrink along the other axis, rotate the image by 90 degrees, apply the same algorithm, and rotate the result back.
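The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the project's code: all function names are hypothetical, `np.gradient` is one possible choice of gradient filter, color images are converted to grayscale by averaging channels before the energy is computed, and the target size is assumed to be at least 2 pixels.

```python
import numpy as np

def energy(gray):
    """Sum of absolute x and y gradients at each pixel of a 2-D array."""
    dy, dx = np.gradient(np.asarray(gray, dtype=np.float64))
    return np.abs(dx) + np.abs(dy)

def find_vertical_seam(e):
    """Column index per row of the minimum-energy 8-connected vertical seam."""
    cost = np.array(e, dtype=np.float64)
    h, w = cost.shape
    for i in range(1, h):
        prev = cost[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))    # cost of upper-left neighbor
        right = np.concatenate((prev[1:], [np.inf]))    # cost of upper-right neighbor
        cost[i] += np.minimum(np.minimum(left, prev), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):                      # backtrack from the bottom row
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam

def remove_vertical_seam(img, seam):
    """Delete one pixel per row from an (H, W) or (H, W, C) image."""
    h, w = img.shape[:2]
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape((h, w - 1) + img.shape[2:])

def shrink_width(img, new_width):
    """Remove lowest-energy vertical seams until the image is new_width wide."""
    img = np.asarray(img, dtype=np.float64)
    while img.shape[1] > new_width:
        gray = img if img.ndim == 2 else img.mean(axis=2)
        img = remove_vertical_seam(img, find_vertical_seam(energy(gray)))
    return img

def shrink_height(img, new_height):
    """Rotate by 90 degrees, shrink the width, then rotate back."""
    rotated = np.rot90(np.asarray(img, dtype=np.float64), k=1)
    return np.rot90(shrink_width(rotated, new_height), k=-1)
```

The energy map is recomputed after every removal because deleting a seam changes the gradients of the pixels next to it.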

Some successful examples

Some unsuccessful examples