Final Project

CS 194-26   Shivam Singhal    December 10, 2021




For the final project of this amazing class, we were given the choice of recreating the results from a variety of academic papers in the field of computer vision. I chose to work on Gradient Domain Fusion, which is based on this Perez et al. 2003 paper, and Eulerian Video Magnification, which is based on this Wu et al. 2012 paper.



Project 1: Gradient Domain Fusion


The purpose of this project was to seamlessly blend an object or texture from a source image into a target image by working in the gradient domain. This technique is more effective than the blending approach we used in Project 2, or naively cropping out an object and pasting it on a different background, because of how the human visual system works: we tend to be more sensitive to contrast than to absolute intensity. The assignment was split into two parts: working with a toy example to get our feet wet with gradient domain processing, and implementing Poisson blending.

Part 1: Toy Problem

In this part of the project, I computed the x and y gradients from an input image and used these values, along with pixel intensity information, to reconstruct the image. To make sure this part was implemented properly, I checked that the input and output images were identical within a small tolerance, and as we can see below, the task was successfully accomplished.
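A minimal sketch of how this reconstruction can be posed as a sparse least-squares problem is shown below. It assumes a grayscale float image and pins the overall brightness with a single corner-pixel constraint; the function and variable names are my own rather than the exact code I used.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def reconstruct_from_gradients(im):
    h, w = im.shape
    idx = np.arange(h * w).reshape(h, w)   # maps (row, col) -> variable index

    rows, cols, data, b = [], [], [], []
    eq = 0
    # x-gradient constraints: v(y, x+1) - v(y, x) = s(y, x+1) - s(y, x)
    for y in range(h):
        for x in range(w - 1):
            rows += [eq, eq]; cols += [idx[y, x + 1], idx[y, x]]; data += [1.0, -1.0]
            b.append(im[y, x + 1] - im[y, x]); eq += 1
    # y-gradient constraints: v(y+1, x) - v(y, x) = s(y+1, x) - s(y, x)
    for y in range(h - 1):
        for x in range(w):
            rows += [eq, eq]; cols += [idx[y + 1, x], idx[y, x]]; data += [1.0, -1.0]
            b.append(im[y + 1, x] - im[y, x]); eq += 1
    # one intensity constraint pins the overall brightness of the solution
    rows.append(eq); cols.append(idx[0, 0]); data.append(1.0)
    b.append(im[0, 0]); eq += 1

    A = sp.csr_matrix((data, (rows, cols)), shape=(eq, h * w))
    v = lsqr(A, np.asarray(b))[0]            # least-squares solution
    return v.reshape(h, w)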

Original Image of Woody
Reconstructed Woody Picture

Part 2: Poisson Blending

Now that we have verified our method works, we can move on to the real magic! The steps of the algorithm are briefly outlined below:

Prior to running the algorithm, a mask for the source images needed to be manually generated. This mask indicated the shape of the object we were interested in inserting, as well as where in the target image the object of interest would be placed. In order to accomplish this, I used starter code from this GitHub repository, as suggested on Piazza.

Preprocessing of Source, Mask, and Target Images:
	The mask and source images were first padded to be the same size as the target image. We had previously done something similar in the homography project; however, since this wasn't the focus of this assignment, I used some starter code.
	For ease of computation, I referred to Brown's iteration of this project. From their starter package, I got a series of sample images, as well as a function that aligns the source and target images and returns the row and column offsets used for the alignment.

Solving for the Blending Constraints:
	To compute the desired image from the source, mask, and target images, I set up a series of linear equations, which I then solved using least squares. Each equation constrains a gradient of the output to match the corresponding gradient of the source image; at the boundary of the mask, the known target pixel values are moved into the right-hand-side vector.
	This objective was optimized with respect to pixel values for each of the color channels individually.

Creating the Blended Image:
	Using the result of least squares for each of the channels, I constructed the output image (a sketch of this per-channel setup and solve is given below).
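The sketch below illustrates the per-channel setup, assuming aligned float images src and tgt of equal size and a boolean mask that does not touch the image border; the names and structure are mine, not the exact submitted code.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def poisson_blend_channel(src, tgt, mask):
    # assumes the mask does not touch the image border
    h, w = mask.shape
    var = -np.ones((h, w), dtype=int)
    var[mask] = np.arange(mask.sum())          # one unknown per masked pixel

    rows, cols, data, b = [], [], [], []
    eq = 0
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            grad = src[y, x] - src[ny, nx]     # source gradient to match
            rows.append(eq); cols.append(var[y, x]); data.append(1.0)
            if mask[ny, nx]:                   # both endpoints are unknowns
                rows.append(eq); cols.append(var[ny, nx]); data.append(-1.0)
                b.append(grad)
            else:                              # boundary: neighbor comes from target
                b.append(grad + tgt[ny, nx])
            eq += 1

    A = sp.csr_matrix((data, (rows, cols)), shape=(eq, int(mask.sum())))
    v = lsqr(A, np.asarray(b))[0]

    out = tgt.copy()
    out[mask] = np.clip(v, 0, 1)               # paste solved values into the target
    return out

Outside the mask, the output simply keeps the target's pixels; only the masked region is solved for, once per color channel.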
Below are some of the results I got from this algorithm:
A bear
Dead body
A bear attack!
An angry guy
The Mona Lisa
The angry Mona Lisa

Cool jet
Beautiful landscape
Fly away!
A cool car
Green terrain
A cool car on green terrain

For this last picture, we can see that the results aren't as clean as they are for the other images. This might simply be because the color of the source image is dissimilar to that of the target image: they are vastly different shades of green, which is why the wheels of the car almost seem to be melting into the ground.


Project 2: Eulerian Video Magnification


The purpose of this project was to highlight small temporal image variations that might be difficult to see with the naked eye. Below is a brief description of the steps followed:

Laplacian Pyramid:
	We built a Laplacian pyramid for each video frame in order to decompose it into different spatial frequency bands.
	To create the Laplacian pyramid, I started by building a Gaussian pyramid, with the original frame as the first layer and successive Gaussian blurring and downsampling producing the subsequent layers.
	Each layer of the Laplacian pyramid was then constructed by upsampling the next (coarser) Gaussian layer to the size of the current layer and subtracting it from that layer.
	The final layer of the Laplacian pyramid was simply the last layer of the Gaussian pyramid. (A sketch of this construction is given below.)
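The sketch below shows one way to build this pyramid with OpenCV's pyrDown/pyrUp, assuming float32 frames; the level count and names are illustrative rather than my exact code.

import cv2

def laplacian_pyramid(frame, levels=5):
    gaussian = [frame]
    for _ in range(levels - 1):
        gaussian.append(cv2.pyrDown(gaussian[-1]))     # blur + downsample

    laplacian = []
    for i in range(levels - 1):
        up = cv2.pyrUp(gaussian[i + 1],
                       dstsize=(gaussian[i].shape[1], gaussian[i].shape[0]))
        laplacian.append(gaussian[i] - up)             # band-pass detail layer
    laplacian.append(gaussian[-1])                     # coarsest residual
    return laplacian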

Temporal Filtering:
	The time series of each pixel was bandpass filtered to keep only a desired frequency range.
	To amplify small differences in motion, a Laplacian pyramid was first built, and a Butterworth bandpass filter was applied.
	To amplify small differences in color, a Gaussian pyramid was built, and an ideal temporal bandpass filter was applied using a discrete Fourier transform (sketched below).
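The ideal filter can be sketched as follows, assuming the frames have already been stacked into a (frames, height, width, channels) float array; the parameter names are my own.

import numpy as np

def ideal_bandpass(video, low, high, fps):
    fft = np.fft.fft(video, axis=0)                    # per-pixel time-series FFT
    freqs = np.fft.fftfreq(video.shape[0], d=1.0 / fps)
    keep = (np.abs(freqs) >= low) & (np.abs(freqs) <= high)
    fft[~keep] = 0                                     # zero everything outside the band
    return np.real(np.fft.ifft(fft, axis=0))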

Amplification and Image Reconstruction:
	We amplify the filtered frequency bands by a specified amplification factor, which makes the subtle changes more apparent.
	The pyramid constructed for each frame is then collapsed to reconstruct the video: the smallest layer is repeatedly upsampled and added to the next-larger layer (see the sketch below).
	These steps are repeated for all color channels. To avoid color artifacts, the video was first converted into YIQ space and then transformed back into RGB space.
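A simplified sketch of the amplification and collapse step for a single frame is given below, assuming filtered is the temporally filtered pyramid for that frame; it applies one amplification factor to every level, whereas the full pipeline also limits amplification with a cutoff value.

import cv2

def amplify_and_collapse(pyramid, filtered, alpha):
    # add the amplified band-passed signal back into each level
    boosted = [layer + alpha * f for layer, f in zip(pyramid, filtered)]

    # collapse from coarsest to finest by upsampling and summing
    out = boosted[-1]
    for layer in reversed(boosted[:-1]):
        out = cv2.pyrUp(out, dstsize=(layer.shape[1], layer.shape[0])) + layer
    return out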

Below are some of the results I got from this algorithm:

Click me to see video of Ideal Filter on Face!
Click me to see video of Butterworth Filter on Face!
The frames of the video were split into 5 pyramid layers, and frequencies between 0.83 and 1 Hz were kept when the bandpass filter was applied. An amplification factor of 100 was used to clearly see the man's pulse, and a cutoff value of 1000 was used to limit the frequencies. Since the color magnification was more effective in showcasing differences between the frames of the video, I chose to only apply the ideal temporal filter to the baby2 video.

Click me to see video of Ideal Filter on Baby2!
I again used 5 pyramid layers for each of the video frames, and I kept frequencies between 2.33 and 2.67 Hz using the bandpass filter. An amplification factor of 150 was applied, and a cutoff value of 600 was used to limit the values that pixel frequencies can take on.

Click me to see video of Ideal Filter on a Subway video!
This video was also one used by the authors of the original paper. When amplified, we can clearly see the motion of the subway car because of the flashing lights. I used 5 pyramid layers to decompose each of the frames in this video as well, and I considered the strength of the signal at a variety of frequencies to choose parameters. An amplification factor of 60 was used, along with a cutoff value of 90. Additionally, only frequencies between 3.6 and 6.2 Hz were kept by the bandpass filter.
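For reference, the parameter choices described above can be gathered in one place; the magnify() call in the comment is hypothetical and only illustrates how these values would be fed into the pipeline.

PARAMS = {
    "face":   dict(levels=5, low=0.83, high=1.0,  alpha=100, cutoff=1000),
    "baby2":  dict(levels=5, low=2.33, high=2.67, alpha=150, cutoff=600),
    "subway": dict(levels=5, low=3.6,  high=6.2,  alpha=60,  cutoff=90),
}
# e.g. magnify("subway.mp4", **PARAMS["subway"])  # hypothetical entry point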

It was pretty difficult to find good amplification values for the videos. Before looking at the values chosen by the authors of the publication, I tried playing around with different numbers to get the most dramatic, yet visually pleasing, amplification of the motion and color. It was also difficult to find the right frequency range in which to apply the bandpass filter.