CS 194-26 Final Projects

Sarthak Arora

Project One: Neural Style Transfer

In this project, the goal was to transfer the style of one image onto another while keeping the content of the second image. This was done by defining two loss functions, one for style and one for content, and minimizing a weighted sum of the two via backpropagation on the output image. We used VGG-19 as the network of choice. The content loss is the MSE between the target and input feature maps at conv4_2. The style loss is the MSE between the Gram matrices of the target features and the current features, computed at conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1. The overall loss is a weighted sum of the two, where the two weights are hyperparameters to tune; style needed a much higher weight. The L-BFGS optimizer was used.
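As a rough illustration, here is a minimal PyTorch-style sketch of the two losses. The function names, normalization, and weight values are illustrative assumptions, not the exact code used; the feature maps are assumed to be already extracted from VGG-19 at the layers above.

```python
# A minimal sketch of the style/content losses, assuming PyTorch tensors
# of shape (batch, channels, height, width) extracted from VGG-19.
import torch
import torch.nn.functional as F

def gram_matrix(features):
    """Gram matrix of a (b, c, h, w) feature map, normalized by size."""
    b, c, h, w = features.size()
    flat = features.view(b * c, h * w)
    # Normalizing keeps the loss scale stable across layers.
    return flat @ flat.t() / (b * c * h * w)

def content_loss(input_feats, target_feats):
    # MSE between raw feature maps (e.g. at conv4_2).
    return F.mse_loss(input_feats, target_feats)

def style_loss(input_feats, target_feats):
    # MSE between Gram matrices (e.g. at conv1_1 ... conv5_1).
    return F.mse_loss(gram_matrix(input_feats), gram_matrix(target_feats))

def total_loss(content_pairs, style_pairs,
               content_weight=1.0, style_weight=1e6):
    # Weighted sum; the style weight is much larger in practice.
    loss = content_weight * sum(content_loss(i, t) for i, t in content_pairs)
    loss += style_weight * sum(style_loss(i, t) for i, t in style_pairs)
    return loss
```

With L-BFGS, this total loss would be minimized inside a closure that repeatedly re-renders the output image through the network.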

These were the style images that were used:

Picasso Style
Van Gogh Style
Scream Style

These were the content images that were used:

Sarthak (Me)
Dancing Girl
Doe
Mona Lisa

Here are some of the results. They are pretty good overall, but for style images with little to no textural pattern (such as The Scream), the style transfer partially fails.

Results: each content image rendered in each of the three styles (12 images).

Project One Learning

This was one of my favorite projects, and I learnt a lot about feature maps and how we can define our own loss functions to achieve different objectives. It was also nice to get a taste of using predefined architectures and isolating certain layers to achieve certain tasks.

Project Two: Lightfield Camera

In this project, we carried out depth refocusing and aperture adjustment to recreate the effect of a lightfield camera. We work with a 17 x 17 grid of pictures and mimic lightfield camera images with varying focal length and aperture size, using pictures taken by ordinary cameras from different positions.

For depth refocusing, we average the photos after shifting each one relative to the center camera. Our camera grid is 17 x 17, making the center (8, 8). Thus we shift each image by (i - 8, j - 8) * d, where d is a parameter that controls the depth at which the result is focused. A sketch of this shift-and-average step is given after the results below:

Chess Original
Chess Depth Refocused
Lego Original
Lego Depth Refocused
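Here is one possible implementation of the shift-and-average step. The `images` dict (mapping grid coordinates (i, j) to H x W x 3 arrays), the loading code, and the interpolation settings are all assumptions for the sketch:

```python
# A minimal sketch of depth refocusing over a 17 x 17 camera grid.
import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus(images, grid_size=17, d=0.5):
    """Average all sub-aperture views, shifted toward the center view.

    Varying d refocuses the synthesized image at different depths.
    """
    center = grid_size // 2  # (8, 8) for a 17 x 17 grid
    acc = None
    for (i, j), img in images.items():
        img = img.astype(np.float64)  # avoid uint8 overflow when summing
        dy, dx = (i - center) * d, (j - center) * d
        shifted = nd_shift(img, (dy, dx, 0), order=1, mode='nearest')
        acc = shifted if acc is None else acc + shifted
    return acc / len(images)
```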

For aperture adjustment, we know that averaging many images mimics a large aperture, while averaging only a few images mimics a small one. We used a hyperparameter for the radius within the camera grid, signifying the aperture, and only performed shifting and averaging on images that fell within this varying aperture/radius. Here are GIFs showing the results:

Chess Aperture Adjustment
Lego Aperture Adjustment
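A sketch of the aperture adjustment, reusing the same shift-and-average idea; the circular-aperture test and the `radius` parameter are assumptions about how one might select the contributing views:

```python
# A minimal sketch of aperture adjustment: only views within `radius`
# of the center camera contribute to the average.
import numpy as np
from scipy.ndimage import shift as nd_shift

def adjust_aperture(images, grid_size=17, radius=4, d=0.0):
    """Average only the views inside the synthetic aperture.

    A larger radius includes more views and mimics a larger aperture.
    """
    center = grid_size // 2
    acc, count = None, 0
    for (i, j), img in images.items():
        if (i - center) ** 2 + (j - center) ** 2 > radius ** 2:
            continue  # view lies outside the synthetic aperture
        img = img.astype(np.float64)
        dy, dx = (i - center) * d, (j - center) * d
        shifted = nd_shift(img, (dy, dx, 0), order=1, mode='nearest')
        acc = shifted if acc is None else acc + shifted
        count += 1
    return acc / count
```

Sweeping `radius` from small to large and saving each averaged frame produces the aperture-adjustment GIFs shown above.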

Bells and Whistles

I decided to do manual data collection: I captured 25 pictures of my mug in a 5 x 5 grid and ran the algorithm on them. Below are the results. The photos turn out extremely blurry, most likely because the hand-held captures do not form an accurate, evenly spaced grid.

Mug Depth Refocusing
Mug Aperture Adjustment

Project Two Learning

This project was really cool; I didn't expect it would be so easy to adjust the aperture and depth of images. The obvious disadvantage of such techniques is that you need many images to get good results, which may not always be feasible.