Project 3: Frequencies and Gradients

Jose Chavez

cs194-26-adu

Images can be manipulated by altering the frequencies and gradients of their pixel intensities. Frequencies and gradients shape how a person perceives changes in intensity: regions with sharp intensity changes draw the eye more than regions where intensities vary smoothly. In this project, we experiment with some basic and complex image operations that alter the frequencies and gradients of images.

Part 1: Frequency Domain

One of the most basic operations that can alter the frequencies of an image is blurring. Image blurring is a process in which a small matrix slides over the pixels of an image, and each new pixel intensity is computed as a weighted sum of the neighboring pixels, with the weights given by the matrix. Let's call this smaller matrix the kernel. For this project, I will be using Gaussian kernels, which are often used for blurring images. The Gaussian kernels in this project are made using OpenCV's methods.

cv2.getGaussianKernel(size, sigma)

cv2.filter2D(image, -1, kernel)

(Note that cv2.getGaussianKernel returns a 1D kernel; taking its outer product with itself gives the 2D kernel, and the -1 in cv2.filter2D keeps the output depth the same as the input's.)
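As a rough illustration, here is a NumPy/SciPy sketch of what those two OpenCV calls do (the kernel construction and the convolution). The function names and the toy impulse image are my own, not part of the project code:

```python
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel_1d(size, sigma):
    # Roughly what cv2.getGaussianKernel does: a normalized 1D Gaussian.
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, size=5, sigma=5.0):
    # Outer product of the 1D kernel with itself gives the 2D kernel,
    # which is then convolved with the image (cv2.filter2D's role).
    k1d = gaussian_kernel_1d(size, sigma)
    kernel = np.outer(k1d, k1d)
    return convolve(image, kernel, mode="nearest")

img = np.zeros((9, 9))
img[4, 4] = 1.0  # a single bright pixel
blurred = gaussian_blur(img, size=5, sigma=1.0)
```

Blurring the impulse spreads its intensity over the neighborhood while the total intensity is preserved, which is exactly the low-pass behavior described above.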

Part 1.1: Unsharp Masking

Funnily enough, we can actually "sharpen" an image using Gaussian blurring. We know that blurring smooths out high frequencies, so subtracting the blurred image from the original should give us back those removed high frequencies; this is the high-pass of the image. Using the equation below, we can "sharpen" an image with a specified alpha value.

image + alpha*(image - image_blurred)
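A minimal sketch of this unsharp masking equation, assuming a float-valued image and using scipy.ndimage.gaussian_filter in place of the explicit 5x5 kernel:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image, sigma=5.0, alpha=5.0):
    # image + alpha * (image - blurred): boost the high frequencies
    # that the Gaussian blur removed.
    blurred = gaussian_filter(image, sigma)
    return image + alpha * (image - blurred)

# A step edge gains "overshoot" on both sides after sharpening,
# which is what makes edges look crisper.
edge = np.zeros((1, 20))
edge[0, 10:] = 1.0
sharp = unsharp_mask(edge, sigma=2.0, alpha=1.0)
```

The overshoot (values above 1 and below 0 around the edge) is the visual signature of unsharp masking; in practice the result is clipped back to the valid intensity range.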

The image below is a blurry picture of my dog Max. Using the equation above, I sharpened it with a 5x5 Gaussian kernel with a sigma value of 5.

Max

alpha = 5
alpha = 10
alpha = 25
alpha = 50

Part 1.2: Hybrid Images

Blurring and generating high-pass images also allows us to blend two images to form a hybrid image. Here, a high-passed image is overlaid on a heavily low-passed image, and the result appears to change as the viewing distance changes, due to how our eyes interpret frequencies at different distances. We can specify "cut-off" frequencies that control when the image begins to change as we take a step back. The "cut-off" frequency, for these examples, is defined as the "frequency for which the amplitude gain of the filter is 1/2," as mentioned in the paper "Hybrid Images" by Oliva, Torralba, and Schyns.
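A sketch of the hybrid construction, under the assumption that each cut-off frequency is realized by a Gaussian sigma (larger sigma = lower cut-off); the exact mapping from the paper's gain-1/2 definition to a sigma value is glossed over here, and the random test images are placeholders:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hybrid_image(im_far, im_near, sigma_low, sigma_high):
    # The image seen from far away keeps only its low frequencies;
    # the image seen up close keeps only its high frequencies.
    low = gaussian_filter(im_far, sigma_low)
    high = im_near - gaussian_filter(im_near, sigma_high)
    return low + high

a = np.random.default_rng(0).random((32, 32))
b = np.random.default_rng(1).random((32, 32))
h = hybrid_image(a, b, sigma_low=4.0, sigma_high=2.0)
```

Because the high-pass term has roughly zero mean, the hybrid's overall brightness comes from the low-passed image, which is what dominates at a distance.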

Below is an example of an image of Pope Francis and one of me blended together. The cut-off frequency for Pope Francis was 48, while the cut-off frequency for me was 16. This gap was chosen because, according to "Hybrid Images," images with little cut-off overlap have an "unambiguous interpretation." I will experiment with these values further on.

Pope Francis
Jose
Hybrid in gray scale

The two images were aligned by the eyes. Notice that when you take a step back, you see Pope Francis' smile, but with my facial structure! Below are the log magnitudes of the Fourier transforms of the two input images, the filtered images, and the hybrid image.


FFT of Pope Francis
FFT of Jose
FFT of low-passed Pope Francis
FFT of high-passed Jose
FFT of hybrid image

Below are more examples! The cut-off for the shark was 60, while the cut-off for Max was 10. For the Jose/eagle hybrid, the cut-off for Jose was 100 while the cut-off for the eagle was 50.

Shark
Max
Happy shark with dog eyes!
Jose
Max
Jose with eagle face

Below is a failed example, in which only one of the two blended images is ever seen. This could be because the two images were too similar. Even with cut-offs of 60 and 27 respectively, calm Jose dominates, with only small traces of mad Jose up close.

Calm Jose
Mad Jose
Can only see calm Jose

Part 1.3: Gaussian and Laplacian Stacks

One way to examine the frequencies of hybrid images is to use Gaussian and Laplacian stacks. At a glance, these sound like pyramids. However, with stacks, images are never downsampled, only blurred. A Gaussian stack is essentially an array of one image recursively blurred by the same Gaussian kernel. A Laplacian stack is a stack of band-pass images. To obtain a band-pass image, we subtract one Gaussian level from the less blurred level before it.
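The two stacks can be sketched as follows. For simplicity this version blurs the original image directly with increasing sigmas rather than re-blurring the previous level (the writeup's recursive scheme composes sigmas, so the effective values differ slightly); the function name and the random test image are my own:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_stacks(image, sigmas=(1, 2, 4, 8, 16)):
    # Gaussian stack: the same image blurred more and more, never
    # downsampled. Laplacian stack: differences between consecutive
    # Gaussian levels, i.e. band-pass images.
    gaussian = [gaussian_filter(image, s) for s in sigmas]
    laplacian = [g0 - g1 for g0, g1 in zip(gaussian[:-1], gaussian[1:])]
    return gaussian, laplacian

img = np.random.default_rng(0).random((16, 16))
g_stack, l_stack = build_stacks(img)

# The differences telescope: summing every Laplacian level plus the
# most blurred Gaussian level recovers the least blurred level.
recon = sum(l_stack) + g_stack[-1]
```

That telescoping sum is what makes Laplacian stacks useful for blending later on: the bands can be modified per level and then summed back into an image.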

In lecture, we saw the hybrid image painting by Salvador Dali of "Lincoln and Gala." From a distance, the image looks like Abraham Lincoln. Up close, we see a naked woman. To better examine this effect, below are the Gaussian and Laplacian stacks. For the Gaussian blurring, I used a 45x45 kernel, starting with a sigma of 1. The recursive blurring applies Gaussian kernels with sigmas of 1, 2, 4, 8, and 16, in that order.

Sigma = 1
Sigma = 2
Sigma = 4
Sigma = 8
Sigma = 16

If we take a look at the third Gaussian level, that image is what one would see from a distance: the outline of Abraham Lincoln's face. Let's look at the stacks for my image with Pope Francis.

Sigma = 1
Sigma = 2
Sigma = 4
Sigma = 8
Sigma = 16

It's at the fourth Laplacian level where we see the outline of the hybrid image.

Part 1.4: Multiresolution Blending

Other than examining the hybrid image, Gaussian and Laplacian stacks are useful for another blending process: multiresolution blending. As the name suggests, we combine different levels of resolution to smoothly blend one image with another via image spline. An image spline is a smooth seam joining the two images.

The algorithm for this blending has four parts. (1) Construct Laplacian stacks LA and LB for the two images A and B. (2) Construct a Gaussian stack GR for our mask, a binary image specifying the region to take from each image. (3) Form a combined stack LS from LA and LB, using the levels of GR as weights.

LS[i] = GR[i] * LA[i] + (1 - GR[i]) * LB[i]

(4) Finally, obtain the splined image S by summing the levels of LS. Below is my favorite result, using a mask that covers the left half of the image. I also included the Laplacian stacks.
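The four steps above can be sketched as follows, for grayscale images and with direct blurs at increasing sigmas standing in for the recursive stack (a simplification); the constant test images and half-mask are placeholders of my own:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multires_blend(A, B, mask, sigmas=(1, 2, 4)):
    # (1)-(2): Gaussian stacks for A, B, and the mask; Laplacian
    # stacks for A and B as differences of consecutive levels.
    gA = [A] + [gaussian_filter(A, s) for s in sigmas]
    gB = [B] + [gaussian_filter(B, s) for s in sigmas]
    gM = [mask] + [gaussian_filter(mask, s) for s in sigmas]
    lA = [a0 - a1 for a0, a1 in zip(gA[:-1], gA[1:])]
    lB = [b0 - b1 for b0, b1 in zip(gB[:-1], gB[1:])]
    # (3)-(4): blend each band with the blurred mask as weights,
    # then sum, starting from the blended coarsest Gaussian level.
    out = gM[-1] * gA[-1] + (1 - gM[-1]) * gB[-1]
    for m, la, lb in zip(gM[:-1], lA, lB):
        out += m * la + (1 - m) * lb
    return out

A = np.ones((16, 16))
B = np.zeros((16, 16))
mask = np.zeros((16, 16))
mask[:, :8] = 1.0  # take the left half from A
blend = multires_blend(A, B, mask)
```

Because the mask is blurred at every level, the seam between the two images is soft at low frequencies and tight at high frequencies, which is exactly what hides it from the eye.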


Earth
Neptune
Mask
Blended image

Below is a result using two different faces.


Mad Jose
Fine Jose
Mask
Blended image

We can also blend images with irregular masks. Below are two examples.


Fine Jose
Mad Jose
Mask of mouth
Blended image

Monitor
SF skyline
Monitor mask
Skyline inside monitor

Part 2: Gradient Domain Fusion

We saw that we can blend two images together using Laplacian and Gaussian stacks. While this works for some images, it wouldn't work well for images with completely different backgrounds and gradients. Hence, we introduce gradient-domain processing, another method for smoothly blending one image into another. The trick to a smooth blend lies in tricking the eyes into perceiving no change between the pixels of the background image and the pixels of the blended region. We therefore solve for pixel intensities that minimize the difference between the gradients of the two images, which makes gradient-domain processing a least squares problem, represented by the following function.

v = argmin_v  sum_{i in S, j in N_i, j in S} ((v_i - v_j) - (s_i - s_j))^2  +  sum_{i in S, j in N_i, j not in S} ((v_i - t_j) - (s_i - s_j))^2

Poisson blending equation

The first sum on the right-hand side deals with pairs of neighboring pixels that both lie inside the region to be blended into the background, "t". The second sum deals with "v_i", a pixel inside the region to be blended, whose neighbor "t_j" lies in the background image. The pixel intensities inside the original image to be blended are "s_i" and "s_j", which we call source pixels. We are solving for the "v" values such that the whole sum is minimized, keeping in mind the gradients in both the source image and the target image.

Part 2.1: Toy Problem

Let's solve the inner gradients on a small example. To do this efficiently, we construct a sparse matrix A and a vector B, then solve

Av = B

A is a matrix whose number of rows is the number of pixels in the original image times two, plus one. We multiply by two because all of the neighbor gradients can be covered by taking just the right neighbor and the bottom neighbor of each pixel; the extra row pins down one pixel's intensity so the solution is unique. "v" contains all of the values to solve for, and if we are only matching the inner gradients, the correct result is the original image, since it has exactly those gradients. If we are going to subtract v_i - v_j as the above equation suggests, A will be constructed to have a -1 and a 1 in the correct locations of that row. Finally, the corresponding entry of B will be s_i - s_j.

Once A and B are constructed, I use

scipy.sparse.linalg.lsqr

as my least squares solver. Below are the results for the toy image.

Original toy image
Toy image reconstructed
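The A/B construction described above can be sketched like this. The helper name and the tiny 3x4 "toy" image are my own, and for readability this version uses Python loops rather than the vectorized construction discussed later:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

def reconstruct_from_gradients(s):
    # One equation per right-neighbor gradient, one per bottom-neighbor
    # gradient, plus one equation pinning v[0,0] = s[0,0] (the "+1" row).
    h, w = s.shape
    rows, cols, data, b = [], [], [], []
    eq = 0
    idx = lambda y, x: y * w + x  # flatten a 2D pixel index
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # right-neighbor gradient
                rows += [eq, eq]; cols += [idx(y, x + 1), idx(y, x)]
                data += [1.0, -1.0]; b.append(s[y, x + 1] - s[y, x]); eq += 1
            if y + 1 < h:  # bottom-neighbor gradient
                rows += [eq, eq]; cols += [idx(y + 1, x), idx(y, x)]
                data += [1.0, -1.0]; b.append(s[y + 1, x] - s[y, x]); eq += 1
    rows.append(eq); cols.append(0); data.append(1.0); b.append(s[0, 0]); eq += 1
    A = csr_matrix((data, (rows, cols)), shape=(eq, h * w))
    v = lsqr(A, np.array(b))[0]
    return v.reshape(h, w)

toy = np.arange(12, dtype=float).reshape(3, 4)
recon = reconstruct_from_gradients(toy)
```

Since the system is consistent (the original image satisfies every gradient equation and the pinned corner), the least squares solution recovers the toy image up to numerical error.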

Part 2.2: Poisson Blending

In order to blend an image into another, all of the gradients, including those at the boundaries, must be matched. To do this, we use Poisson blending, which solves for pixels using both the source image and the target (background) image. Recall the full Poisson equation above: we saw its first half in the toy problem, but this time we also consider the second half, which focuses on target pixels.

Thus, the trick to this part is how to include boundary pixels in the sparse matrix A that we saw in the toy problem. The tactic I used involves rolling the mask up and down each axis, giving me access to the pixels just outside the mask. For example, if I roll my mask up by one and bitwise OR it with the original, I obtain the mask with its upper boundary extended by one pixel. To get just the upper boundary, I subtract the original mask. I now have masks selecting the source and target pixels just outside the region, which makes it much easier to construct the A matrix and B vector we saw earlier. In sum, a lot of vectorization and mask rolling helped me complete my sparse matrix.
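The mask-rolling trick can be sketched as follows, here computing the full one-pixel outer boundary in all four directions at once. The helper name is my own, and it assumes the mask does not touch the image border, since np.roll wraps around:

```python
import numpy as np

def outer_boundary(mask):
    # Roll the mask one step in each direction and OR the shifts
    # together; AND-ing with the inverted original mask leaves only
    # the ring of pixels just outside it.
    grown = mask.copy()
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        grown |= np.roll(mask, shift, axis=axis)
    return grown & ~mask

mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
ring = outer_boundary(mask)  # the four 4-neighbors of the center pixel
```

Indexing the source and target images with this ring mask is what lets the B-vector construction for the boundary terms be fully vectorized.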

However, feeding my full A into the least squares solver caused me to run out of memory. This was because A was a dense NumPy array, which requires a lot of memory. That worked earlier on the small toy example, but now I actually had to construct a sparse matrix. I used the following constructor:

scipy.sparse.csr_matrix((data, (rows, cols)), shape)

Using this constructor, I had to alter my implementation from placing everything into A directly to building two large arrays of row and column indices plus a final data array, where

A[x, y] = z  ==>  A[rows[i], cols[i]] = data[i]
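A tiny self-contained example of this triplet-style construction (the index and data values here are arbitrary placeholders):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Instead of writing A[x, y] = z into a dense array, collect the
# coordinates and values in three flat arrays and hand them to the
# sparse constructor in one shot.
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 2, 1, 2])
data = np.array([1.0, -1.0, 2.0, 3.0])
A = csr_matrix((data, (rows, cols)), shape=(3, 3))

dense = A.toarray()  # only for inspection; the solver uses A directly
```

Only the nonzero entries are ever stored, which is why this scales to one row per gradient equation without exhausting memory.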

Using this approach greatly reduced the runtime of the least squares solver! Moreover, in the B vector, a boundary pixel being solved for, call it v_inside, gets a 1 in A but s_inside - s_outside + target_outside in B.

v_inside = s_inside - s_outside + target_outside

This is a derived version of the second half of the Poisson equation. Having the boundaries of the masks allowed me to vectorize a lot of the code for B.

My favorite result came from blending the shark seen earlier into a picture from scuba diving. The water in the original shark picture is very dark compared to the light water in the scuba photo, but the final Poisson blend adjusted the gradients very well, to the point where it doesn't look like the shark was ever in different water. In fact, even the gradients on the skin of the shark look more realistic in the brighter lighting. In short, it really looks like I took a picture of my brother swimming under a great white shark!


Scuba diving
Shark
Naive blending (copy and pasted shark image values)
Poisson Blending

Below are some more experiments with Poisson blending. I noticed that Poisson blending can actually alter the internal colors slightly in an effort to match the colors of the target image. This is seen with Max on the golf course. The image of the diver at the pier blended better, with the gradients of the pier water translating well into the water around the diver.


Max sitting on grass
Playing golf
Naive copy of values, reveals difference in grass
Poisson Blending: Max sitting on golf course.

Diver in water
Overlooking pier
Naive copy of values, reveals difference in water
Poisson Blending: Diver in pier

Below is an example of a failed image. I placed the golfer seen earlier into a different picture of the golf course, overlooking a cliff. The lower half of the golfer is pasted on the grass while the upper half of his body is pasted over the trees in the background, which have very different intensities. As a result, the golfer's legs turn out really bright while his upper body looks dark. The Poisson blending is trying to match the difference between the foreground and background intensities, but since the golfer actually belongs in the middle ground, the resulting gradients do not look correct.

Failed image, in which the internal gradients changed drastically in order to fit the background intensity.

Earlier we saw that we can use Laplacian stacks to blend images. How would the same image look with Poisson blending? To see the difference, I chose the skyline-inside-monitor blend, this time Poisson blending the skyline into the monitor. Below are the differences.


Monitor
SF skyline
Laplacian stack blendings
Poisson Blending of skyline inside the monitor

The Laplacian result does look prettier and brighter, as if the monitor were a window into the SF skyline. But the Poisson blended image looks more realistic, as if the SF skyline were the desktop picture of the monitor itself. This isn't to say that the Laplacian result couldn't also pass for a desktop picture, but it looks too good compared to the rest of the monitor photo. If you look at the bottom left-hand corner of the Poisson blended image, the skyline picture blends with the dark gradients of the monitor. The Poisson blended image also matches the low, warm lighting of the original monitor picture. In sum, Poisson blending works best for producing realistic, matching gradients that respect the world inside the image, while Laplacian blending looks more stylish to the eye. For more stylized images, Laplacian blending would work better, but for more serious blending, Poisson wins.