Project 3: Frequencies and Gradients

Part 1: Frequency Domain

1.1 Unsharp Masking

This utility sharpens photos. Most utilities implemented in this project rely heavily on Gaussian blur, since Gaussian blur effectively removes high frequencies. To implement a custom Gaussian blur filter, I made a kernel using OpenCV's getGaussianKernel function and then convolved it with the image using scipy.signal.fftconvolve, which uses the fast Fourier transform for efficient convolution.
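As a rough illustration of the blur described above, here is a minimal sketch. It builds the kernel with NumPy rather than OpenCV to stay dependency-light, and the function names, the reflect padding, and the default size and sigma are assumptions for illustration, not the project's actual code.

```python
import numpy as np
from scipy.signal import fftconvolve


def gaussian_kernel_2d(ksize, sigma):
    """Build a normalized ksize x ksize Gaussian kernel from a 1D profile."""
    ax = np.arange(ksize) - (ksize - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    # A 2D Gaussian is separable, so the outer product of the 1D kernel
    # with itself gives the 2D kernel.
    return np.outer(g, g)


def gaussian_blur(image, ksize=11, sigma=1.0):
    """Blur a 2D grayscale image with an odd-sized kernel.

    Edges are reflect-padded (an assumption) so the output keeps the input shape.
    """
    pad = ksize // 2
    padded = np.pad(image, pad, mode="reflect")
    return fftconvolve(padded, gaussian_kernel_2d(ksize, sigma), mode="valid")
```

For a color image, the same function could be applied to each channel separately.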


[Figure: Custom Gaussian Kernel]

Take the original image F and its Gaussian-blurred version G, produced with an 11x11 kernel and a sigma of 1:


[Figure: Edge Details (F - G)]
[Figure: Edge Addition (F + alpha * (F - G))]

Adding the edge detail back, scaled by a parameter alpha, as F + alpha * (F - G) (similar to Laplace) produces the sharpening effect often seen in editing software like Photoshop.
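The formula F + alpha * (F - G) can be sketched in a few lines. This version swaps in scipy.ndimage.gaussian_filter for the custom blur, assumes a grayscale image with values in [0, 1], and clips the result back to that range; the function name and defaults are illustrative only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def unsharp_mask(image, sigma=1.0, alpha=1.0):
    """Sharpen a grayscale image in [0, 1] via F + alpha * (F - G)."""
    image = np.asarray(image, dtype=np.float64)
    # truncate=5.0 with sigma=1 gives a radius of 5, i.e. an 11-wide kernel.
    blurred = gaussian_filter(image, sigma=sigma, truncate=5.0)
    detail = image - blurred  # the edge detail, F - G
    # Clipping is an assumption: it keeps the result displayable.
    return np.clip(image + alpha * detail, 0.0, 1.0)
```

A larger alpha exaggerates edges more strongly, and alpha = 0 returns the input unchanged.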

1.2 Hybrid Images

I was sitting in Yali's Cafe having a cup of nitro brew due to sleep deprivation, and so I decided to show my mixed emotions:


[Figure: Happy (with Fourier frequency)]
[Figure: Mad (with Fourier frequency)]

Since the pictures are much the same, their frequency spectra are much the same. Using the techniques described by Oliva et al., these pictures can be combined with a really cool implication:

  1. Take one picture and apply a low-pass filter (make low frequencies dominate). In this case, Gaussian blur.
  2. Take the other and apply a high-pass filter. In this case, the original image - Gaussian blur.
  3. Average the two together.
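The three steps above can be sketched as follows. This is a minimal version that assumes aligned grayscale float images of the same shape and uses scipy.ndimage.gaussian_filter in place of the custom blur; the function name and the two separate sigma parameters are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def hybrid_image(low_src, high_src, sigma_low, sigma_high):
    """Combine the low frequencies of one image with the high frequencies of another."""
    low_src = np.asarray(low_src, dtype=np.float64)
    high_src = np.asarray(high_src, dtype=np.float64)
    # Step 1: low-pass filter (Gaussian blur).
    low = gaussian_filter(low_src, sigma_low)
    # Step 2: high-pass filter (original minus its Gaussian blur).
    high = high_src - gaussian_filter(high_src, sigma_high)
    # Step 3: average the two.
    return (low + high) / 2.0
```

Note that the high-pass image is centered around zero, so a plain average tends to come out darker than either input.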

I found that the best sigma (cutoff) values for Step 1 and Step 2 were equal to each other. Intuitively this makes sense because the Gaussian I'm applying is then the same for both filters. Below is the effect as sigma changes from 0 to 19.


[Figure: Happy (low-pass)]
[Figure: Mad (high-pass)]
[Figure: Average]

The final implication is that when viewed up close, my face looks mad, but when viewed from afar, my face looks happy. The GIF of the average demonstrates this effect as sigma changes: viewed from a distance, there is little change. Here is the final picture at a sigma of 12, which I thought was the optimal sigma:


[Figure: the state of Berkeley]

The resulting images are darker because I just average the raw Laplace with the Gaussian. However, normalizing the Laplace adds enough light back in for the picture to have fairly "normal" lighting again. All images after this use the normalized Laplace.

The order of low frequency vs. high frequency matters a lot. A friend of mine, Carissa Tinoco, volunteered to have her face meshed with the Teletubbies sun... to a somewhat creepy effect:


[Figure: Low]
[Figure: High]
[Figure: I will have nightmares...]
[Figure: Low]
[Figure: High]
[Figure: Convincing enough!]

Aligning two faces also seems to work fairly well. Seen below is a hybridized version of these two pictures of Yasmine Frigui and Mumu Lin.


[Figure: Low]
[Figure: High]
[Figure: Hybrid]

1.3 Gaussian and Laplacian Stacks

It is possible to take one of these hybrid photos and use the same powerful Gaussian methods to filter back towards the original low- and high-frequency photos. Here is an example using a 5-level stack that illustrates this process:


[Figure: Gaussian (left), Laplace (right)]

This process is also illustrated by Salvador Dali's Lincoln in Dalivision, which, after multiple applications of the Gauss and Laplace functions, separates into Gala viewing the ocean and a portrait of Abraham Lincoln.


[Figure: Dali Decomposition]

Note that the Laplace images are actually in reverse order. That is, the first Laplace level is the original image minus the last Gaussian level. As a little experiment, I applied this same sequence to the art of Yung Jake, which uses emojis to produce facial images. The results were unfortunately too big to include on this page but are available on my website.

1.4 Multiresolution Blending

Before I explain the gist of this part, I'd like to highlight that although we could've achieved the same effect in Photoshop, this technique is powerful. Take this image, in which I either duplicate or remove Jennifer from the picture after taking two pictures at Nefeli Cafe.


[Figure: Two Jens]
[Figure: No Jens]

Our goal is to make an architectural improvement to Stanford, using the Campanile.


[Figure: Hoover]
[Figure: Sather]

Let's define Gauss(i) as the result of the ith application of the Gaussian filter to an image. The algorithm in question requires the Laplace function, which takes the form Laplace(i) = Gauss(i - 1) - Gauss(i). We'll also use an altered version of the Laplacian stack, which defines the last level as just Gauss(last). This ensures that the sum of all the levels in the Laplacian stack is the original image. Here are all the images involved in this process.
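The definitions above translate fairly directly into code. The sketch below assumes Gauss(0) is the unfiltered image (the write-up does not say so explicitly) and uses scipy.ndimage.gaussian_filter as the blur; function names, level count, and sigma are placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def gaussian_stack(image, levels, sigma):
    """Return [Gauss(0), ..., Gauss(levels)], assuming Gauss(0) is the input itself."""
    stack = [np.asarray(image, dtype=np.float64)]
    for _ in range(levels):
        # Each level is one more application of the same Gaussian filter.
        stack.append(gaussian_filter(stack[-1], sigma))
    return stack


def laplacian_stack(image, levels, sigma):
    """Return Laplace(1..levels) followed by Gauss(levels) as the final entry."""
    g = gaussian_stack(image, levels, sigma)
    laplace = [g[i - 1] - g[i] for i in range(1, levels + 1)]
    # The altered last level: keeping Gauss(last) makes the stack sum to the input.
    laplace.append(g[-1])
    return laplace
```

The sum telescopes: every Gauss(i) for i >= 1 appears once with each sign, leaving only Gauss(0), the original image.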


[Figure: Campanile (Top, Laplace), Hoover (Middle, Laplace), Campanile Mask (Bottom, Gauss)]

The result is the sum over all levels of these images, where at each level every pixel value is weighted by the mask: mask * campanile + (1 - mask) * hoover.
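A minimal sketch of that weighted sum, assuming the two Laplacian stacks and the Gaussian stack of the mask have already been computed and have one entry per level. The function name and argument names are hypothetical, and how the mask levels are paired with the Laplace levels is left to the caller.

```python
import numpy as np


def blend_stacks(lap_a, lap_b, mask_stack):
    """Sum mask * a + (1 - mask) * b over all stack levels."""
    if not (len(lap_a) == len(lap_b) == len(mask_stack)):
        raise ValueError("all three stacks need the same number of levels")
    out = np.zeros_like(lap_a[0], dtype=np.float64)
    for level_a, level_b, mask in zip(lap_a, lap_b, mask_stack):
        # The blurred mask decides, per pixel, how much of each image to keep.
        out += mask * level_a + (1.0 - mask) * level_b
    return out
```

Because the coarser mask levels are blurrier, low frequencies blend over a wide band while fine detail switches over sharply at the seam.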


[Figure: The Real Tower]

We see a major architectural improvement on Stanford's previously measly tower.

Part 2: Gradient Domain

While the previous part focused on the intensity, or frequency, of light, this part focuses on the change, or gradient, of light. To blend two images together, my goal here is to find just the right values for the pixels in the hole created by the mask so that the original gradient is preserved. Let's denote s as the data we want to insert, t as the image we want to insert the data over, and v as the intensity values to solve for (i.e. the solution to the set of linear equations). We sum over all i in the hole and over all j among the four neighboring cells of i. Instead of calculating Laplacians, I set up a "double" least squares problem with the following equation:

argmin_v(sum(((v_i - v_j) - (s_i - s_j))**2) + sum(((v_i - t_j) - (s_i - s_j))**2))

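The first sum covers neighbors j that are also inside the hole; the second covers neighbors outside it, where the value is fixed to the target pixel t_j. This takes care of the boundary edge case, since we're comparing to the target image in the second part of the equation. The claim here is that solving this equation should find the optimal values that remove the rough boundary caused by stitching the source into the target image.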

2.1 Toy Problem

As a sanity check, our goal in this part is to apply the algorithm from a small source image back onto itself. The algorithm should give back the same source image, since intuitively the optimal solution for any 2D data matrix matched against its own gradients is the original image.

In order to produce this mapping, I make sparse matrices by initializing the following matrices to 0:

I use scipy.sparse.linalg.lsqr to solve this system of equations, which produces the following output:
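To make the sanity check concrete, here is one possible formulation, not necessarily the one used for the output below: one equation per horizontal and vertical neighbor pair asking the solved gradient to match the source gradient, plus a single anchor equation that pins the top-left pixel. The function name, the anchor choice, and the solver tolerances are all assumptions.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr


def reconstruct_from_gradients(s):
    """Rebuild a 2D image from its x/y gradients plus one anchored pixel."""
    h, w = s.shape
    idx = np.arange(h * w).reshape(h, w)  # pixel -> unknown index
    n_eq = h * (w - 1) + (h - 1) * w + 1
    A = lil_matrix((n_eq, h * w))
    b = np.zeros(n_eq)
    e = 0
    for y in range(h):  # horizontal gradients
        for x in range(w - 1):
            A[e, idx[y, x + 1]] = 1
            A[e, idx[y, x]] = -1
            b[e] = s[y, x + 1] - s[y, x]
            e += 1
    for y in range(h - 1):  # vertical gradients
        for x in range(w):
            A[e, idx[y + 1, x]] = 1
            A[e, idx[y, x]] = -1
            b[e] = s[y + 1, x] - s[y, x]
            e += 1
    # Anchor: gradients alone only fix the image up to a constant offset.
    A[e, idx[0, 0]] = 1
    b[e] = s[0, 0]
    v = lsqr(A.tocsr(), b, atol=1e-12, btol=1e-12, iter_lim=10000)[0]
    return v.reshape(h, w)
```

If the output matches the input up to solver tolerance, the gradient equations and the indexing are wired up correctly.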


[Figure: Input]
[Figure: Output]

Now that we've confirmed this method can work, I'm going to move on to using masks to apply this to any two possibly distinct pictures.

2.2 Poisson Blending

First off, I emphasized eliminating redundant work and reducing complexity to just the number of pixels in the mask (the hole, so to speak). Therefore, I limit the number of equations to N * 2 + boundaries, where N is the number of pixels in the hole and boundaries is a special value that represents all pixels on the "edge" of the hole.

The key to making this work is in the boundary pixels. I used the lil_matrix (list-of-lists) structure to make editing the sparse matrix easier in Python; it is also compatible with scipy.sparse.linalg.lsqr, the speed-up method that reduced runtime by roughly 100-fold. In this case, my sparse system contains boundary values "plucked" from the target image. The idea is that, praying to the gods of matrices, the boundary equations will propagate the gradient of the target towards the center of the source.
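Here is a hedged sketch of how such a system could be assembled, with unknowns only for the hole pixels and boundary equations that pull in known target values. It assumes aligned grayscale float images, a boolean mask that does not touch the image border, and emits each in-hole neighbor pair once; the function name and solver settings are illustrative, not the project's code.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr


def poisson_blend(s, t, mask):
    """Solve for hole pixels whose gradients match s and whose boundary matches t."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    n = len(ys)
    index = -np.ones(mask.shape, dtype=int)
    index[ys, xs] = np.arange(n)  # hole pixel -> unknown index

    equations = []  # (unknown i, unknown j or -1, right-hand side)
    for y, x in zip(ys, xs):
        i = index[y, x]
        for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            ny, nx = y + dy, x + dx
            grad = s[y, x] - s[ny, nx]
            if mask[ny, nx]:
                # In-hole pair: v_i - v_j = s_i - s_j, emitted once per pair.
                if dy + dx > 0:
                    equations.append((i, index[ny, nx], grad))
            else:
                # Boundary pair: v_i - t_j = s_i - s_j, with t_j known.
                equations.append((i, -1, grad + t[ny, nx]))

    A = lil_matrix((len(equations), n))
    b = np.zeros(len(equations))
    for row, (i, j, rhs) in enumerate(equations):
        A[row, i] = 1
        if j >= 0:
            A[row, j] = -1
        b[row] = rhs

    v = lsqr(A.tocsr(), b, atol=1e-12, btol=1e-12, iter_lim=10000)[0]
    out = np.array(t, dtype=np.float64)
    out[ys, xs] = v
    return out
```

A quick way to see the boundary equations at work: if the source is just the target plus a constant brightness offset, the gradients agree everywhere, so the solve returns the target and the offset disappears.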

I align the faces and produce the mask with Photoshop (i.e. I was too lazy to reinvent the wheel here). The result, on faces especially, works creepily well:


[Figure: Daenerys Targaryen]
[Figure: Mumu Lin]
[Figure: Mumu Targaryen (one version)]
[Figure: Daenerys Lin (another version)]

While the first alignment is more faithful to the environment lighting, the second alignment is more faithful to the true structure of Mumu's face. I noticed that in both, the cutoff was very distinct on the border of the inserted face. Up to this point I had been using binary values for the mask. To achieve a more natural paste effect, I applied a Gaussian blur to the mask and changed the insertion code to interpolate using the mask intensity:

tgt(x, y) = mask(x, y) * v(N) + (1 - mask(x, y)) * tgt(x, y) where v(N) is the Nth solution to the hole equation.
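The interpolation above could look roughly like this, assuming the solved values have already been written into a full-size image and that the mask is blurred with scipy.ndimage.gaussian_filter. The function name and the sigma default are placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def feathered_paste(tgt, solved, mask, sigma=2.0):
    """Blend solved pixels into tgt using a Gaussian-blurred (soft) mask."""
    # Blur the binary mask so its edge ramps smoothly from 1 down to 0.
    soft = gaussian_filter(np.asarray(mask, dtype=np.float64), sigma)
    soft = np.clip(soft, 0.0, 1.0)  # guard against tiny floating-point overshoot
    # tgt = mask * v + (1 - mask) * tgt, applied per pixel.
    return soft * solved + (1.0 - soft) * tgt
```

Deep inside the hole the soft mask is still 1, so the solved values are kept as-is; only a band around the border is interpolated.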

This idea was incredibly effective and captures the best of both worlds...


[Figure: The True Khaleesi]

On the topic of face changes, taking the earlier pictures of Yasmine and Mumu results in a weird propagation effect where the darkness in the hair moves towards the center. It clearly illustrates the way saturation is affected by the boundary.


[Figure: Yasmine]
[Figure: Mumu]
[Figure: Mumine]

Poisson blending also seemed to work quite well in creating a vintage effect when swapping Taylor Swift out for a friend.


[Figure: Taylor]
[Figure: So-Hyun]
[Figure: Vintage So-Hyun]

Finally... perhaps the failure of the lot but certainly one of the most educational... We see what happens when the boundaries of the image do not quite have uniform saturation. This is perhaps where Laplace shines as a (still) useful algorithm when dealing with odd edge cases (literally).


[Figure: Laplacian (reproduced for convenience)]
[Figure: Poisson (with Gauss blend / interpolate)]

The blue from the sky seems to have infected the Campanile. This goes to show that one method is not necessarily superior to the other. When it comes to facial expressions and other images that have uniform boundaries, Poisson rules, but when it comes to irregular boundaries, or when complete blending to match surrounding lighting isn't really a necessity, you might prefer Laplace.

A few other interesting erroneous effects that manifested in Sather tower...


[Figure: GhostPanile (inverted)]
[Figure: Segfault Tower (numpy memory problem???)]