CS194 proj3, Nikhil Patel

Warmup

An unsharp mask is a technique used to artificially sharpen an image's edges. It functions by blurring the image and then subtracting the blurred image from the original, producing an image of "edges" that looks similar to this:

We compute one of these "unsharp masks" for each of the three color channels, using a Gaussian filter of kernlen=5 and sigma=1 for the blur, and then add each mask back to its channel of the original. Stacking the three channels produces the sharpened image.

It is evident that the lines on the tiger on the right (i.e. the sharpened image) are much clearer, and the fur and snow textures look grainier. The watermark and copyright notice have also become much more prominent, which makes sense given that their edges were already very well-defined in the original image (and the unsharp mask simply made the edges even more well-defined).
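Roughly, the whole warmup pipeline fits in a few lines; here is a sketch (the `alpha` strength knob and the filename are just for illustration, not necessarily part of my actual code):

```python
import cv2
import numpy as np

def unsharp_sharpen(img, kernlen=5, sigma=1.0, alpha=1.0):
    """Sharpen by adding the "unsharp mask" (original - blurred) back in.

    cv2.GaussianBlur filters each color channel independently, so this
    computes one mask per channel in a single call. `alpha` (illustrative)
    controls the sharpening strength.
    """
    img = img.astype(np.float64)
    blurred = cv2.GaussianBlur(img, (kernlen, kernlen), sigma)
    mask = img - blurred                      # the "edges" image
    return np.clip(img + alpha * mask, 0, 255).astype(np.uint8)

sharpened = unsharp_sharpen(cv2.imread("tiger.jpg"))  # illustrative filename
```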

Hybrid Images

Let's say we want to combine the following two images:

First, we must select some points on the images to align them (I chose the eyes). Then we can take the low frequencies from one image and combine them with the high frequencies from the other, which means that up close this image will look like a cat, but further away it will look more like a human. At longer distances, only the lower frequencies are visible (i.e. fine details are lost), so the "blurred" image (blurred using kernlen=31 and sigma=0.05) reads as human. Our final image looks something like this:

Without any cropping, you can see the top of the cat image (it produced a sharp line) at an angle -- the same angle that the cat image was rotated to in order to align the eyes.
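Roughly, the construction looks like this sketch, assuming the two images have already been aligned and cropped to the same shape (function and file names are for illustration):

```python
import cv2
import numpy as np

def hybrid(im_high, im_low, kernlen=31, sigma=7.0):
    """Combine the high frequencies of im_high with the low frequencies
    of im_low. Both images must be pre-aligned and the same shape."""
    im_high = im_high.astype(np.float64)
    im_low = im_low.astype(np.float64)
    low_pass = cv2.GaussianBlur(im_low, (kernlen, kernlen), sigma)
    high_pass = im_high - cv2.GaussianBlur(im_high, (kernlen, kernlen), sigma)
    return np.clip(low_pass + high_pass, 0, 255).astype(np.uint8)

# e.g. cat details over a human base (illustrative filenames)
result = hybrid(cv2.imread("cat_aligned.jpg"), cv2.imread("human_aligned.jpg"))
```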

I liked the idea of combining humans and animals, and so I decided to combine my girlfriend's face with a corgi, her favorite dog (using kernlen=31 and sigma=7):

This produced a fun combination which she thought was creepy:

I decided to do the same for my baby cousin, who seemed to enjoy it more (using kernlen=31 and sigma=7):

I also tried to combine Derek with Spongebob, which turned out fairly poorly (using kernlen=31 and sigma=7):

I realized too late that this was because I was attempting to blend a cartoon (which is very high-contrast and does not lend itself to blending) with a photograph.


Let's conduct frequency analysis on one of these hybrids by comparing the Fourier transforms of the two input images with those of the filtered images, and then looking at the hybrid image itself. For reference, here is the image set we will consider:

Let's look at the two input images and their Fourier transforms (in the same order as above):

As we can see, both spectra concentrate most of their energy near the origin (the low frequencies), as is typical of natural photographs.

After we filter these images, we get the following:

Clearly, the filters have carved away frequency bands: the outer (high-frequency) bands are erased in the low-passed image, while the region near the origin is suppressed in the high-passed one. When we combine these images to produce our hybrid, we get the following frequency graph:

If you look carefully, you can see that the area around the origin is slightly brighter, similar to the frequency graph for the corgi.
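For reference, frequency graphs like these are typically computed as the log magnitude of the centered 2D FFT of the grayscale image; a minimal sketch:

```python
import numpy as np

def log_magnitude_spectrum(gray):
    """Log-magnitude of the centered 2D FFT of a grayscale image.
    Bright pixels near the center correspond to low frequencies."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return np.log(np.abs(spectrum) + 1e-8)   # epsilon avoids log(0)
```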

Gaussian and Laplacian Stacks

Let's consider this image of Lincoln:

(Note: I'll use "pyramid" here for simplicity, but in reality these are image stacks, because there is no downsampling and all of our levels have the same dimensions.) To construct a Gaussian pyramid (with 4 levels), we iteratively apply a Gaussian filter (kernlen=9, sigma=1). Our image gets blurrier and blurrier, like so:

Then, to construct a Laplacian pyramid, we subtract each (i+1)th Gaussian from the (i)th, giving us a stack that looks like:


Let's now apply this to the hybrid image of my cousin and the corgi (kernlen=9, sigma=1):

As we can see, as our Gaussian pyramid progresses and we see lower and lower frequencies, the low-frequency image of my cousin shows through. On the other hand, as we view higher frequencies in the Laplacian pyramid, the corgi is much more evident.
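Both stacks take only a few lines; here is a minimal sketch assuming OpenCV (function names are for illustration):

```python
import cv2
import numpy as np

def gaussian_stack(img, levels=4, kernlen=9, sigma=1.0):
    """Repeatedly blur; no downsampling, so every level keeps the
    original dimensions (a stack, not a true pyramid)."""
    stack = [img.astype(np.float64)]
    for _ in range(levels - 1):
        stack.append(cv2.GaussianBlur(stack[-1], (kernlen, kernlen), sigma))
    return stack

def laplacian_stack(img, levels=4, kernlen=9, sigma=1.0):
    """Level i is G_i - G_(i+1); the last Gaussian is kept as the
    low-frequency residual."""
    g = gaussian_stack(img, levels, kernlen, sigma)
    return [g[i] - g[i + 1] for i in range(levels - 1)] + [g[-1]]
```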

Multi-resolution Blending

As described in this paper, it is possible to take two images and blend them together using Gaussian & Laplacian pyramids. The high-level overview is that we blend the image at each level of the pyramid (or in our case, stack), which ensures that we blend the low, mid, and high frequencies separately from each other, and then combine them at the end to produce the blended image.
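Roughly, the procedure looks like this sketch, assuming a float mask in [0, 1] with the same shape as the images (names are for illustration):

```python
import cv2
import numpy as np

def multires_blend(im1, im2, mask, levels=4, kernlen=7, sigma=1.0):
    """Blend im1 and im2 level by level: Laplacian stacks of the
    images, weighted by a Gaussian stack of the mask, then summed.
    `mask` must be float in [0, 1] with the same shape as the images."""
    def gstack(img):
        s = [img.astype(np.float64)]
        for _ in range(levels - 1):
            s.append(cv2.GaussianBlur(s[-1], (kernlen, kernlen), sigma))
        return s

    def lstack(img):
        g = gstack(img)
        return [g[i] - g[i + 1] for i in range(levels - 1)] + [g[-1]]

    out = sum(m * a + (1 - m) * b
              for m, a, b in zip(gstack(mask), lstack(im1), lstack(im2)))
    return np.clip(out, 0, 255).astype(np.uint8)
```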

The classic example in the paper is the "orapple", a blending of these two images:

We apply a Gaussian pyramid to our mask (simply a binary mask taking the left half of the apple), which looks like this: (minus the top and left borders -- those are only there to visually show the image bounds, otherwise the white would blend into this page's background)

Blending with Gaussians of kernlen=7, sigma=1 produces the orapple:


We can apply this to other images. I found an image of a lady with two different expressions and decided to try combining them:

(kernlen=7, sigma=1)

Using the same binary half mask as before, this image was produced:

Unfortunately, the images are slightly misaligned -- although her nose-halves appear mostly aligned, her right eye is shifted upwards because she's happy, her mouth is misaligned due to the difference in expressions, and her shirt also seems slightly out of place.

To remedy this, I decided to try a different mask: use the top third of her face from the left and the bottom third from the right:

The intention was: if you cover the bottom part of the image, she looks like she's smiling (or content) -- and if you cover the top part, she looks distressed. Except, I don't think she smiles with her eyes very much, because she just looks distressed (with her eyes moved further up than they normally are).


I wanted to add an eyeball to our fruit (because I'm "a bit weird", apparently), so I first tried this cartoon-y eye with this simple mask (kernlen=7, sigma=1):

Because the eyeball had such high-contrast edges (in addition to being, well, a generally high-contrast image, since it's a cartoon), the blending had almost no effect. So, I decided to choose a different image -- one that was more natural -- along with a mask I made in Preview:

Combining this with our fruit, we get some frankly disturbing images:

Gradient Domain Fusion

Introduction & Toy Problem

We want to be able to produce more complex seamless blends than we could with multi-resolution blending. We start with a toy problem, whose goal is simply to recreate an image from its x- and y-gradients via a linear system. Taking all ((x,y), (x+1,y)) pixel pairs gives us every x-gradient in the image; similarly, all ((x,y), (x,y+1)) pairs give us every y-gradient. We can then build a matrix whose unknowns are the pixel intensities and whose equations require each x- and y-gradient to match the original's, plus one equation pinning a single pixel's intensity (without it, the solution is only determined up to a constant offset). Solving this system with least-squares gives zero error, and we recover our original image:

Even though our image is only 300x300, the least-squares optimization problem still takes a while to solve. Our A matrix has shape (26181, 13090) -- one column per pixel variable, and one row per x- and y-gradient constraint plus the single intensity constraint (2·13090 + 1 = 26181) -- and our b vector is 26181 elements tall, so naturally the system solver has to compute for a bit.
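A solver for this can be sketched with a sparse matrix (a dense (26181, 13090) array would be enormous). Note this sketch skips gradients at the image border, so its row count comes out slightly below 2·N + 1, and it pins the top-left pixel; the actual code may organize the system differently:

```python
import numpy as np
import scipy.sparse
from scipy.sparse.linalg import lsqr

def reconstruct(img):
    """Recover a grayscale image from its x/y gradients plus one
    pinned intensity, via sparse least squares."""
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)      # variable index per pixel
    rows, cols, vals, b = [], [], [], []

    def add_grad(i, j, target):               # equation: v_i - v_j = target
        e = len(b)
        rows.extend([e, e]); cols.extend([i, j]); vals.extend([1.0, -1.0])
        b.append(target)

    img = img.astype(np.float64)
    for y in range(h):                        # x-gradients
        for x in range(w - 1):
            add_grad(idx[y, x + 1], idx[y, x], img[y, x + 1] - img[y, x])
    for y in range(h - 1):                    # y-gradients
        for x in range(w):
            add_grad(idx[y + 1, x], idx[y, x], img[y + 1, x] - img[y, x])

    # one intensity constraint pins the overall brightness
    e = len(b)
    rows.append(e); cols.append(idx[0, 0]); vals.append(1.0)
    b.append(img[0, 0])

    A = scipy.sparse.csr_matrix((vals, (rows, cols)), shape=(len(b), h * w))
    return lsqr(A, np.array(b))[0].reshape(h, w)
```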

Poisson Blending

Now, we want to perform the difficult task of blending one image into another. Similar to the toy problem described above, we want to ensure that the gradients inside the blended region match the source's -- i.e. the small object we are blending into the background should maintain its form/structure so it remains a recognizable object. However, we also want to ensure that its background and colors match the target image's background and colors so the blend looks natural. Thus, we have to add additional boundary constraints to our linear system. When we perform least-squares to find an image that minimizes error, the error will be "spread out" over the entire object's region, allowing us to blend the image seamlessly while maintaining the structure/form of the object being blended.
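Concretely, the system can be set up like the following sketch (one equation per pixel-neighbor pair, solved per channel with sparse least squares); it assumes the source image and 2D mask have already been shifted to the right offset within the target frame, and is one standard formulation rather than my exact code:

```python
import numpy as np
import scipy.sparse
from scipy.sparse.linalg import lsqr

def poisson_blend(source, target, mask):
    """For each masked pixel, require its gradient to every 4-neighbor
    to match the source's gradient; neighbors outside the mask supply
    the target's intensity as a boundary constraint."""
    h, w = mask.shape
    inside = np.flatnonzero(mask)             # flat indices of masked pixels
    var = np.full(h * w, -1, dtype=int)       # pixel index -> unknown index
    var[inside] = np.arange(len(inside))

    out = target.astype(np.float64).copy()
    for c in range(target.shape[2]):          # solve each channel separately
        s = source[..., c].astype(np.float64).ravel()
        t = target[..., c].astype(np.float64).ravel()
        rows, cols, vals, b = [], [], [], []
        for k, p in enumerate(inside):
            y, x = divmod(p, w)
            nbrs = []
            if y > 0: nbrs.append(p - w)
            if y < h - 1: nbrs.append(p + w)
            if x > 0: nbrs.append(p - 1)
            if x < w - 1: nbrs.append(p + 1)
            for q in nbrs:
                e = len(b)
                rows.append(e); cols.append(k); vals.append(1.0)
                if var[q] >= 0:               # neighbor is also an unknown
                    rows.append(e); cols.append(var[q]); vals.append(-1.0)
                    b.append(s[p] - s[q])
                else:                         # neighbor is a fixed target pixel
                    b.append(s[p] - s[q] + t[q])
        A = scipy.sparse.csr_matrix((vals, (rows, cols)),
                                    shape=(len(b), len(inside)))
        v = lsqr(A, np.array(b))[0]
        out[..., c].flat[inside] = v          # .flat writes through the view
    return np.clip(out, 0, 255).astype(np.uint8)
```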

I like planes, so I decided to blend a plane into the sky, like so (using an x, y offset of 200, 200):

Pretty good! When I zoomed in, there were no color fidelity issues or border blending problems. The entire image of the plane was slightly recolored in order to match the blue of the surrounding sky -- if you look carefully, the plane in the blended image on the right has a slightly more yellowish tint than the original for this exact reason.


I really liked that eye thing for some reason, so I decided to revisit it and generate some more eye-related images to see how different images would cause different color blending and distortions. Using the same source object made it easy to analyze and compare the results.

Because the skin around the hand is lighter than the skin around the giant eye, the entire eye is lightened.


We can see the same effect as in the last image set, only slightly more obvious since his skin is even lighter than that of the hand above.


Let's compare this to the multi-resolution blending performed in part 1.4. If we insert the image straight in (using the same mask as in 1.4), we get:

After we perform blending, the straight and jagged edges go away and are replaced with smooth gradients:

However, the colors are a bit off, especially in the eye area. Because of the surrounding image color (red and orange, respectively), the color gradients are messed up since the skin in the original eye drawing was... well, skin-colored. So, the multiresolution blending produced a better output image, with higher fidelity in the eyeball area. In general, multiresolution blending will handle color differences better, as the colors on the outer parts of the eye will slowly transition, whereas in Poisson blending, the colors for the entire eye become a bit messed up in order to compensate for the color transition along the border pixels. For images where the border and the target outline are similar colors, Poisson blending will work better as it is more sophisticated and allows for greater flexibility to "match" the surrounding image. (For comparison, let's revisit the multiresolution blended fruit results:)


I also got a sense of which kinds of images perform well (and which don't). For example, this bunny blended poorly because of the highly textured grass surrounding it: