CS194 Final Project Fall 2021

Michael Huang

Part A: Reimplement: A Neural Algorithm of Artistic Style

In this project, I reimplemented A Neural Algorithm of Artistic Style by Gatys et al., which transfers the style of a style image onto a content image. I ran my reimplementation on my own style and content images.

Implementation Details

The method uses a pre-trained VGG-19 network, paying close attention to certain layers. In the model printout below, these are layers 0, 5, 8, 19, 21, and 28: layers 0, 5, 8, 19, and 28 were used for the style representation, while layer 21 was used for the content representation.

What is interesting about this implementation is that we optimize only the target image; the VGG weights and the style and content images are never changed!

The style layers were weighted as style_weights = [1e3/n**2 for n in [64, 128, 256, 512, 512]], where the values of n correspond to layers 0, 5, 8, 19, and 28 respectively (the number of feature channels at each style layer).

There are two kinds of losses to calculate: style loss and content loss. Style loss is computed by finding the Gram matrix of each style-layer feature map and comparing it to the Gram matrix of the target's features at the same layer. Content loss is the difference between the target and content features at layer 21. We combine these two losses into a total loss; weighting style loss at $10^6$ and content loss at $1$ worked best. I used an Adam optimizer with lr = 0.003 and trained for 2,000 iterations.
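The loss computation and update step above can be sketched as follows. This is a minimal runnable illustration, not the project's actual code: a single random convolution stands in for the VGG-19 feature layers, and the `gram_matrix` and `extract` names are my own.

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (1, C, H, W) activations; returns the (C, C) matrix of
    # channel-to-channel correlations that encodes style.
    _, c, h, w = features.shape
    flat = features.view(c, h * w)
    return flat @ flat.t()

# Stand-in feature extractor (one fixed random conv) in place of the
# VGG-19 layers, just to make the loop runnable end to end.
torch.manual_seed(0)
weight = torch.randn(8, 3, 3, 3)
extract = lambda img: F.conv2d(img, weight, padding=1)

style_img = torch.rand(1, 3, 32, 32)
content_img = torch.rand(1, 3, 32, 32)
target = content_img.clone().requires_grad_(True)  # only the image is optimized

style_gram = gram_matrix(extract(style_img)).detach()
content_feat = extract(content_img).detach()

# Weighted combination of the two losses, minimized with Adam at lr = 0.003.
optimizer = torch.optim.Adam([target], lr=0.003)
for _ in range(100):
    feat = extract(target)
    style_loss = torch.mean((gram_matrix(feat) - style_gram) ** 2)
    content_loss = torch.mean((feat - content_feat) ** 2)
    loss = 1e6 * style_loss + 1.0 * content_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the real implementation each style layer contributes its own Gram-matrix term, scaled by the style_weights above, but the structure of the loop is the same.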

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace=True)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace=True)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace=True)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace=True)
  (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (17): ReLU(inplace=True)
  (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (20): ReLU(inplace=True)
  (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (22): ReLU(inplace=True)
  (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (24): ReLU(inplace=True)
  (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (26): ReLU(inplace=True)
  (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (29): ReLU(inplace=True)
  (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (31): ReLU(inplace=True)
  (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (33): ReLU(inplace=True)
  (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (35): ReLU(inplace=True)
  (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
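Pulling activations out of the Sequential model above at the chosen indices can be done by walking it layer by layer during the forward pass. A sketch, assuming a Sequential model like the one printed above (the `get_features` helper and its defaults are my naming, not the original code):

```python
import torch
import torch.nn as nn

def get_features(image, model, layers=('0', '5', '8', '19', '21', '28')):
    # Run the image through each submodule of the Sequential model in order,
    # saving the activation whenever the layer index is one we care about:
    # '21' gives the content representation, the rest give style representations.
    features = {}
    x = image
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[name] = x
    return features

# Demo on a stand-in model with the same number of layers as VGG-19's features.
model = nn.Sequential(*[nn.Identity() for _ in range(37)])
feats = get_features(torch.zeros(1, 3, 8, 8), model)
```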
[Images: style image, content image, combined result]

During training, this is how the target progressed.

[Images: target at the start of training, midway, and final]

Part B: Poor Man's Augmented Reality

In this augmented reality project, I projected a 3D cube into a video and kept it fixed in place throughout the video.

Setup

Using a Rubik's Cube as a box with a pre-marked grid of points, I captured a video of myself moving around the cube.

Keypoints with known 3D world coordinates

I started by marking points in the first frame of the video (using plt.ginput) and recording their known 3D world coordinates on the cube. Then, using OpenCV's MedianFlow tracker, I tracked these points throughout the video. I originally selected more points, but some were not tracked well, so I opted for fewer points.

[Image: marked keypoints and axes on the Rubik's Cube]

Calibrating the Camera

Using the 2D image coordinates of the marked points and their corresponding 3D world coordinates, I used least squares to fit the camera projection matrix, which maps homogeneous 4-vectors of real-world coordinates to homogeneous 3-vectors of image coordinates. This is very similar to fitting a homography in project 4, but with an extra dimension. I perform this fit for each frame of the video.
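The least-squares fit can be written as the standard direct linear calibration: fixing the bottom-right entry of the 3×4 matrix to 1 leaves 11 unknowns, and each 3D-to-2D correspondence contributes two linear equations. A sketch under those assumptions (the function name is mine):

```python
import numpy as np

def fit_projection_matrix(world_pts, image_pts):
    # world_pts: (N, 3) 3D coordinates; image_pts: (N, 2) pixel coordinates.
    # Solve for the 3x4 camera matrix P with P[2, 3] fixed to 1, so that
    # [u, v, 1]^T is proportional to P @ [X, Y, Z, 1]^T.
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # Each correspondence gives one equation for u and one for v.
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.extend([u, v])
    p, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(p, 1).reshape(3, 4)
```

At least six correspondences are needed for the 11 unknowns; using more makes the fit robust to small tracking errors.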

Projecting a cube in the Scene

Using the fitted camera projection matrix, I can map the 3D vertices of a cube, [[0,0,2],[0,1,2],[1,1,2],[1,0,2],[0,0,3],[0,1,3],[1,1,3],[1,0,3]], to 2D points in each frame. I use a draw function to render the cube on each frame, and combine the frames into the final video.