Daniel Edrisian's CS 194 Final Project


Part 1: Augmented Reality

In this project, I attempted to re-create the "Poor Man's Augmented Reality" video, in which a box is tracked and a synthetic object is added on top of it in a 3D scene.

Steps

The Box: First, I created a white box by wrapping a shoebox in white, grid-patterned wrapping paper. Then I marked the intersections of the grid with black dots. These dots serve as keypoints for tracking the box as it rotates in the scene. I measured the spacing between the dots, which was about an inch.

Tracking: Then I tracked the keypoints with an existing tracker, MedianFlow, from the cv2 (OpenCV) library. The tracker reports each point's 2D image coordinates in every frame, and I display them on the video.

Calibration: Next, I calibrated the camera by fitting a camera projection matrix that maps 4-dimensional homogeneous real-world coordinates to 3-dimensional homogeneous image coordinates, that is, 3D points in the scene to 2D points in the frame. For calibration, I used cv2.calibrateCamera.
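To make the idea of fitting a projection matrix concrete, here is a minimal sketch of a plain least-squares fit with NumPy. It is not what cv2.calibrateCamera does internally (that function also estimates intrinsics and lens distortion), and the names fit_projection_matrix and project are hypothetical. It assumes at least six non-coplanar 3D points with known 2D image positions, and it fixes the bottom-right entry of the matrix to 1.

```python
import numpy as np


def fit_projection_matrix(world_pts, image_pts):
    """Least-squares fit of a 3x4 projection matrix with P[2, 3] fixed to 1.

    world_pts: N x 3 real-world coordinates (e.g. in inches on the box).
    image_pts: N x 2 pixel coordinates of the same points.
    """
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # u = (p11 X + p12 Y + p13 Z + p14) / (p31 X + p32 Y + p33 Z + 1)
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        rhs.append(u)
        # v = (p21 X + p22 Y + p23 Z + p24) / (p31 X + p32 Y + p33 Z + 1)
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        rhs.append(v)
    params, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(params, 1.0).reshape(3, 4)


def project(P, world_pts):
    """Project N x 3 world points to N x 2 pixel coordinates."""
    world_pts = np.asarray(world_pts, dtype=float)
    homog = np.hstack([world_pts, np.ones((len(world_pts), 1))])
    proj = homog @ P.T
    # Divide by the third homogeneous coordinate to get pixel positions.
    return proj[:, :2] / proj[:, 2:3]
```

With tracked keypoints, such a fit would be repeated for every frame, since the box moves relative to the camera.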

Projecting Onto the Scene: After getting the camera projection matrix, I overlaid a cube onto the box by passing the projected points to the draw function. The video is read frame by frame, and the cube is drawn on each frame.
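The projection step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's actual draw function: the helper names (project_point, cube_corners, cube_segments) are hypothetical, and P is any 3x4 projection matrix given as nested lists. The output is a list of 2D line segments that a drawing routine (for example cv2.line) could render on each frame.

```python
def project_point(P, point):
    """Apply a 3x4 projection matrix to one 3D point and return (u, v)."""
    X, Y, Z = point
    homog = (X, Y, Z, 1.0)
    u, v, w = (sum(p * c for p, c in zip(row, homog)) for row in P)
    return (u / w, v / w)


def cube_corners(origin, size):
    """Eight corners of an axis-aligned cube; index = 4*dz + 2*dy + dx."""
    ox, oy, oz = origin
    return [(ox + dx * size, oy + dy * size, oz + dz * size)
            for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]


# Two corners share an edge when their indices differ in exactly one bit,
# i.e. they differ along exactly one axis. This yields the 12 cube edges.
CUBE_EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8)
              if bin(i ^ j).count("1") == 1]


def cube_segments(P, origin, size):
    """Project the cube and return its 12 edges as pairs of 2D points."""
    pts = [project_point(P, c) for c in cube_corners(origin, size)]
    return [(pts[i], pts[j]) for i, j in CUBE_EDGES]
```

Placing the cube with origin on the top face of the box and size equal to a whole number of grid cells keeps it aligned with the marked keypoints.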

The result is... beautiful.

Final Output

Source video Overlaid Box

Part 2: A Neural Algorithm of Artistic Style

The aim of this project was to implement the research paper by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge: A Neural Algorithm of Artistic Style.

Working and experimentation:

For this project we use the VGG19 convolutional network. VGG19 is a variant of the VGG model with 19 weight layers (16 convolutional layers and 3 fully connected layers), plus 5 max-pooling layers and 1 softmax layer. As in the paper, the fully connected layers are not used. Another alteration to the architecture is that we replaced the max-pooling layers with average pooling to get more visually appealing results.
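The layer count can be checked with a small sketch. The configuration list below follows the standard VGG19 layout (numbers are the output channels of each convolution, "M" marks a pooling layer). The names VGG19_CFG and feature_layers are hypothetical, and the layer naming is just one common convention, not necessarily the one used in this project.

```python
# Standard VGG19 feature-extractor layout: channel counts and pooling markers.
VGG19_CFG = [64, 64, "M",
             128, 128, "M",
             256, 256, 256, 256, "M",
             512, 512, 512, 512, "M",
             512, 512, 512, 512, "M"]

NUM_CONV = sum(1 for v in VGG19_CFG if v != "M")  # 16 convolutional layers
NUM_POOL = VGG19_CFG.count("M")                   # 5 pooling layers
NUM_FC = 3                                        # fully connected head (unused here)


def feature_layers(cfg, pooling="avg"):
    """List the feature-extractor layers, without the fully connected head.

    pooling="avg" reflects the swap from max pooling to average pooling.
    """
    layers, block, idx = [], 1, 1
    for v in cfg:
        if v == "M":
            layers.append(f"{pooling}pool{block}")
            block += 1
            idx = 1
        else:
            layers.append(f"conv{block}_{idx} ({v} channels)")
            idx += 1
    return layers


for name in feature_layers(VGG19_CFG):
    print(name)
```

The 16 convolutional layers plus the 3 fully connected layers give the 19 in the model's name; pooling and softmax layers have no weights and are not counted.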

The main crux of the project is the style reconstruction part. Given the outputs of each layer in the network, we reconstruct the original image from the features of different layers. Experimenting with the CNN shows that the lower layers capture the low-level features of the images. So, if we want a more visually appealing artistic image, we should use the features of the higher layers.

While running the optimization, I realized that the results can't be perfect when the content comes from one image and the style is drastically different. So, based on the methods in the paper, I created two loss functions, one for content and one for style, and the purpose of the optimization is to minimize them. If we give more weight to the style image, the content of the image will barely exist by the end. On the other hand, if we give more weight to the content, the image doesn’t show much of an artistic style. So, it is basically a trade-off between the two, and it took me a lot of time to find the right parameters for my model.
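The two losses and their weighted sum can be sketched as follows. This is a minimal NumPy sketch using the formulas from the paper (squared error on feature maps for content, squared error on Gram matrices for style), not the code used for this project. The function names, the equal per-layer weighting, and the default weights are assumptions made for illustration; in practice the feature maps would come from the VGG19 layers.

```python
import numpy as np


def gram_matrix(features):
    """features: (channels, height, width) -> (channels, channels) Gram matrix."""
    c = features.shape[0]
    flat = features.reshape(c, -1)
    # Entry (i, j) is the inner product of feature maps i and j.
    return flat @ flat.T


def content_loss(gen, content):
    """Half the squared error between generated and content feature maps."""
    return 0.5 * np.sum((gen - content) ** 2)


def style_layer_loss(gen, style):
    """Squared Gram-matrix error for one layer, normalized as in the paper."""
    c, h, w = gen.shape
    diff = gram_matrix(gen) - gram_matrix(style)
    return np.sum(diff ** 2) / (4.0 * c ** 2 * (h * w) ** 2)


def total_loss(gen_content, target_content, gen_styles, target_styles,
               content_weight=1.0, style_weight=1e6):
    """Weighted sum of content loss and style loss averaged over layers."""
    style = sum(style_layer_loss(g, s)
                for g, s in zip(gen_styles, target_styles))
    style /= len(gen_styles)  # assumption: equal weight for every style layer
    content = content_loss(gen_content, target_content)
    return content_weight * content + style_weight * style
```

Only the ratio of style_weight to content_weight matters for where the minimum lies, which is why the trade-off is usually stated as a ratio.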

Results

These are the input/style images that I used for this project:

Content Style

The losses are controlled by the style weight, the content weight, and the number of epochs. With a style-to-content weight ratio of 1000000:1 and 200 epochs, I get the following results and losses:

Loss Output

Increasing the number of epochs reduces the losses even further, and the image looks something like:

Loss Output

But just increasing the epochs doesn’t change the relative weights between style and content, so we can change the ratio to something like style:content = 1000:1, giving the following results:

Loss Output

As we can see, assigning a smaller weight to the style produces a weaker artistic effect in the output image. So, after a lot of experimentation, the best trade-off ratio for this image is the one I used initially, 1000000:1. For the next example, a style-to-content ratio of 1000:1 works better.

LSD Cat

This is your brain on catnip

Style Content Output

Conclusion

I had loads of fun with the project, even more so than the AR one. It took me a lot of time and effort, but the results are worth it. I hope you enjoyed the picture of my cat tripping on catnip. Feel free to email me at edrisian@berkeley.edu for more cat pictures.