In this project, the goal was to transfer the style of one image onto another while keeping the content of the second image. This was done by defining a loss function for both the style and the content of our output image and minimizing both losses via backpropagation. We used VGG-19 as the network of choice. For our content loss, we take the MSE between the target and input feature maps; for our style loss, we take the MSE between the Gram matrix of our target features and that of our current features. Layers conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 were the layers of interest for the style loss, and conv4_2 for the content loss. The overall loss was a weighted sum of the two losses, where the two weights were hyperparameters to tune; style needed a much higher weight. The L-BFGS optimizer was used.
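The losses described above can be sketched in NumPy. This is a minimal illustration of the Gram-matrix style loss and MSE content loss only (not the full VGG-19 pipeline); the function names and the example weights `alpha` and `beta` are my own choices for illustration.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: a (C, C) matrix of
    channel-wise correlations, normalized by the number of elements."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def content_loss(target_feat, input_feat):
    """MSE between feature maps at the content layer (conv4_2 above)."""
    return np.mean((target_feat - input_feat) ** 2)

def style_loss(target_feat, input_feat):
    """MSE between Gram matrices at one style layer (conv1_1 ... conv5_1)."""
    return np.mean((gram_matrix(target_feat) - gram_matrix(input_feat)) ** 2)

def total_loss(content_t, content_i, style_ts, style_is, alpha=1.0, beta=1e6):
    """Weighted sum of the two losses; beta >> alpha, since the style
    term needs a much higher weight. style_ts/style_is are lists of
    feature maps, one per style layer."""
    style = sum(style_loss(t, i) for t, i in zip(style_ts, style_is))
    return alpha * content_loss(content_t, content_i) + beta * style / len(style_ts)
```

During optimization, the gradients of this total loss with respect to the output image pixels (computed by autograd in practice) drive the L-BFGS updates.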
These were the style images that were used:
These were the content images that were used:
Here are some of the results. We see these are pretty good, but that for style images with little to no textural pattern (such as The Scream), the style transfer partially fails.
This was one of my favorite projects, and I learnt a lot about feature maps and how we can define our own loss functions to achieve different objectives. It was also nice to get a taste of using pre-defined architectures and isolating certain layers to achieve certain tasks.
In this project, we had to carry out depth refocusing and aperture adjustment to recreate the effects of a light field camera. We work with a 17 x 17 grid of pictures taken by a normal camera from different positions and use them to mimic light field camera images with varying focal depth and aperture size.
For depth refocusing, we average copies of the photos that are shifted toward the center camera by an amount proportional to their offset in the grid. Our camera grid is 17 x 17, making the center (8, 8). Thus we shifted each image at grid position (i, j) by (i - 8, j - 8) * c, where c is a changing parameter that accounts for focusing at different depths. The results can be seen here below:
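The shift-and-average step can be sketched as follows. This is a simplified version assuming integer pixel shifts (via `np.roll`; sub-pixel interpolation would be more accurate) and an `images` dict keyed by grid position, both of which are my own conventions for illustration.

```python
import numpy as np

def refocus(images, c):
    """images: dict mapping grid position (i, j) -> HxW(xC) array from the
    17 x 17 grid. Shift each view by its offset from the center (8, 8)
    scaled by c, then average; varying c refocuses at different depths."""
    acc = None
    for (i, j), img in images.items():
        dy = int(round((i - 8) * c))
        dx = int(round((j - 8) * c))
        # roll the image by (dy, dx); a real implementation might use
        # sub-pixel interpolation instead of integer rolls
        shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        acc = shifted if acc is None else acc + shifted
    return acc / len(images)
```

With c = 0 this is a plain average of all views, which focuses on the scene plane where the cameras were aligned; sweeping c moves the focal plane nearer or farther.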
For aperture adjustment, we know that averaging many images will create a large aperture, while averaging only a few images will create a small aperture. We had a hyperparameter signifying a radius within our camera grid, representing the aperture, and only performed shifting and averaging on images that fell within this varying aperture/radius. Here are GIFs showing the results:
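A minimal sketch of the aperture selection step, under the same assumed `images` dict convention as before (positions keyed by (i, j) in the 17 x 17 grid); the function name and the circular-radius criterion are illustrative choices:

```python
import numpy as np

def adjust_aperture(images, radius):
    """Average only the views within `radius` grid cells of the center
    (8, 8). A small radius mimics a small aperture (deep depth of field);
    a large radius mimics a large aperture (shallow depth of field)."""
    selected = [img for (i, j), img in images.items()
                if (i - 8) ** 2 + (j - 8) ** 2 <= radius ** 2]
    return np.mean(selected, axis=0)
```

Sweeping the radius from 0 to 8 and rendering each result gives the aperture GIFs; combining this selection with the depth-refocusing shift gives both effects at once.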
I decided to do manual data collection: I captured 25 pictures of my mug in a 5 x 5 grid. I ran the algorithm on these images, and below are the results. We see that the photos turn out extremely blurry. This is likely because my hand-held captures did not approximate a regular camera grid accurately enough.
This project was really cool, and I didn't expect it would be so easy to adjust the aperture and focal depth of images. The obvious disadvantage of such techniques is that you need many images to get good results, which may not always be feasible.