CS 294-26: Intro to Computer Vision and Computational Photography, Fall 2022

Project 2: Fun with Filters and Frequencies

Katherine Song (cs-194-26-acj)



Overview

In this project, we explore the use of simple image filters -- namely high- and low-pass frequency filters -- to achieve various effects. Such effects include basic photo enhancement through blurring and/or sharpening. By applying different combinations of filters to different images and also creating "stacks" of filtered images, we also create interesting new "hybrid" or "multiresolution-blended" images.

Part 1: Fun with Filters

Part 1.1: Finite Difference Operator

cameraman.png
Convolving the cameraman image from the assignment description (above) with finite difference operators D_x and D_y results in the following partial derivatives in x and y. The partial derivative in x reveals the vertical edges in the image, whereas the partial derivative in y reveals the horizontal edges.
cameraman.png partial derivatives
The gradient magnitude image is simply generated by taking the square-root of the sum of squared partial derivatives in x and y: sqrt((df/dx)^2 + (df/dy)^2). The result reveals "edge strength" in both the vertical and horizontal directions:
cameraman.png gradient magnitude
A binarized edge image is created by setting each pixel of the gradient magnitude image to 1 if it exceeds a chosen threshold and to 0 otherwise. As seen below, setting the threshold too low results in so many supposed edges that features of the image are lost to noise. When the threshold is too high, we detect too few of the edges, and features are lost:
Binarized cameraman w/ threshold 0.01 (too low)
Binarized cameraman w/ threshold 0.9 (too high)
The threshold that I settled on was 0.08. At this threshold, I could make out most details of the background buildings and the man's glove (these were lost when the threshold was too high), and I could also make out details of the man's face and camera (these were lost in too much noise when the threshold was too low):
cameraman.png binarized image, threshold = 0.08 (good to me)
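A minimal sketch of this pipeline, assuming a grayscale image scaled to [0, 1] (the variable names and the use of scipy.signal.convolve2d are illustrative, not necessarily the exact code used):

import numpy as np
import skimage.io as skio
from scipy.signal import convolve2d

# Finite difference operators
D_x = np.array([[1, -1]])
D_y = np.array([[1], [-1]])

im = skio.imread("cameraman.png", as_gray=True).astype(float)

# Partial derivatives in x and y
df_dx = convolve2d(im, D_x, mode="same")
df_dy = convolve2d(im, D_y, mode="same")

# Gradient magnitude: sqrt((df/dx)^2 + (df/dy)^2)
grad_mag = np.sqrt(df_dx**2 + df_dy**2)

# Binarize: 1 where the gradient magnitude exceeds the threshold, 0 elsewhere
edges = (grad_mag > 0.08).astype(float)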

Part 1.2: Derivative of Gaussian Filter

I created a blurred version of the original image by convolving it with a 5x5 Gaussian (built by taking a 5x1 Gaussian kernel with sigma=1 from cv2.getGaussianKernel() and computing its outer product with its transpose). Repeating the procedure from Part 1.1, I got the results below. Compared to the results with just the difference operator, the resulting binarized edge image has cleaner, more continuous edges (less "hatching" than the result from Part 1.1) and fewer "false" edges. I was even able to lower the threshold to 0.03 to obtain more "real" edges (especially for the background buildings) without creating false ones in the man's face and on the grass. This is unsurprising because, as we learned in class, the Gaussian filter is an anti-aliasing filter that removes much of the noise in the original image.
Blurred cameraman.png partial derivatives
Blurred cameraman.png gradient magnitude
Blurred cameraman.png, binarized with threshold 0.03
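Continuing the sketch from Part 1.1 (reusing im, D_x, and D_y), the Gaussian can be built and applied before the difference operators roughly as follows:

import cv2
import numpy as np
from scipy.signal import convolve2d

# 5x1 Gaussian from OpenCV; outer product with its transpose gives the 5x5 2D Gaussian
g1d = cv2.getGaussianKernel(5, 1)
g2d = g1d @ g1d.T

# Blur first, then take finite differences of the blurred image
blurred = convolve2d(im, g2d, mode="same")
blurred_dx = convolve2d(blurred, D_x, mode="same")
blurred_dy = convolve2d(blurred, D_y, mode="same")

grad_mag_blurred = np.sqrt(blurred_dx**2 + blurred_dy**2)
edges_blurred = (grad_mag_blurred > 0.03).astype(float)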
Instead of convolving the image with a Gaussian and then convolving the potentially large result with the finite difference operators, we can save time (more relevant for larger images) by first convolving the Gaussian and finite difference filters with one another (these are much smaller and thus cheaper to convolve) to form derivative-of-Gaussian (DoG) filters, and then applying those to the original image (which might be large). The results of doing so are below.
cameraman.png, Gaussian + finite difference combo filter
The results of using the single combo filter are identical to the results of convolving the image with the Gaussian and then convolving the result with the finite difference filter (as checked with the np.array_equal method). One technicality to note is that when convolving the filters together first, I needed to ensure that scipy.signal.convolve2d's mode input argument was set to "full" instead of "same," since "same" would effectively result in clipping the filters too early.
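A sketch of the combined approach and the consistency check, reusing im, g2d, blurred_dx, and blurred_dy from the sketches above:

from scipy.signal import convolve2d
import numpy as np

# Convolve the small filters together first ("full" so nothing is clipped too early),
# then apply the resulting DoG filters to the image in a single pass
dog_x = convolve2d(g2d, D_x, mode="full")
dog_y = convolve2d(g2d, D_y, mode="full")

dx_combo = convolve2d(im, dog_x, mode="same")
dy_combo = convolve2d(im, dog_y, mode="same")

# Should match the blur-then-differentiate results (up to boundary handling)
print(np.allclose(dx_combo, blurred_dx), np.allclose(dy_combo, blurred_dy))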

Part 2: Fun with Frequencies

Part 2.1: Image "Sharpening"

High frequencies in an image are obtained by subtracting a blurred image (i.e. an image convolved with a Gaussian) from the original. By adding the high frequencies (or some scaled version of them; here I simply use a factor of 1) back to the original image, we obtain a sharpened image. To sharpen the given image of the Taj Mahal (taj.jpeg), I chose a kernel size of 7 with sigma = 3. I empirically found these values to be the ones that yielded a sharpened image that looked pleasantly sharp without weird artifacts.

taj.jpeg. A sharpened image is achieved by adding high frequencies back to the original.
The sharpening operation can be combined into a single convolution operator -- the "unsharp mask filter" -- built from a "do-nothing" impulse kernel (all zeros except a 1 at the center): the filter is the impulse plus the sharpening factor times (impulse minus Gaussian), which with a factor of 1 is twice the impulse minus the Gaussian. The results obtained are below and are equal to those above:
taj.jpeg with a single convolution with the unsharp mask filter (impulse minus Gaussian)
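A minimal sketch of both routes (kernel size 7 and sigma 3 as above; the function name and alpha parameter are illustrative):

import cv2
import numpy as np
from scipy.signal import convolve2d

def unsharp(im, ksize=7, sigma=3, alpha=1.0):
    """Sharpen a single-channel image by adding back its high frequencies."""
    g1d = cv2.getGaussianKernel(ksize, sigma)
    gauss = g1d @ g1d.T

    # Route 1: blur, subtract to isolate the high frequencies, add them back
    low = convolve2d(im, gauss, mode="same")
    sharpened = im + alpha * (im - low)

    # Route 2: a single unsharp mask kernel, (1 + alpha)*impulse - alpha*Gaussian
    impulse = np.zeros((ksize, ksize))
    impulse[ksize // 2, ksize // 2] = 1
    unsharp_kernel = (1 + alpha) * impulse - alpha * gauss
    sharpened_single = convolve2d(im, unsharp_kernel, mode="same")

    return np.clip(sharpened, 0, 1), np.clip(sharpened_single, 0, 1)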
I applied the unsharp mask filter to a photo that I took of the summit register on top of Mt. Shasta that happened to be blurred when I took it. The sharpening filter improves image quality and accentuates a lot of the edges that were fuzzy in the original image. The same kernel size (7) and sigma (3) I used for taj.jpeg also worked for this image:
shasta.jpeg with an applied unsharp mask filter (impulse minus Gaussian)
I applied the Gaussian blur (again with kernel size 7 and sigma 3) to an already-sharp image that I took of a bird and then applied the respective unsharp mask filter to the blurred image. Interestingly, the original image was not re-obtained. The re-sharpened image was still slightly blurred. This is because once the Gaussian blur filter was applied, the high frequency components were effectively lost. They cannot be reliably retrieved again simply by applying a sharpening filter. The subsequent sharpening filter will only find the high frequency components of the blurred image, which are not all the high frequency components from the original image, so the resulting "sharpened" image is not as sharp as the original.
sharpbird.jpeg, blurred then re-sharpened

Part 2.2: Hybrid Images

Hybrid images can be created by combining the high frequency components of one image (visible at close range) with the low frequency components of another image (high frequency components are less visible far away, so the low frequency image dominates). For my hybrid image creation function, I used a regular 2D Gaussian as a low-pass filter and an impulse minus another 2D Gaussian as a high-pass filter. When combining images, best results were obtained when the high-pass filter was applied to the image with more high frequencies, or at least the image where the high frequencies "define" the content of that image better.

In the Derek and Nutmeg example, the Nutmeg image had more important high frequency components because of all the fur, so I applied the low-pass filter to Derek and the high-pass filter to Nutmeg. After filtering, the FFT intensities for Derek are concentrated near the origin/axes and fade off as frequency increases, as expected. The filtered FFT for Nutmeg, on the other hand, is bright even on the outside areas of the plot. For this example, I chose to use a kernel size of 41 with sigma 21 for Derek (larger numbers led to too much blurring, and even from far away, it was difficult to distinguish Derek; smaller numbers led to too much Derek close up) and a kernel size of 31 with sigma 7 for Nutmeg.
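A sketch of the hybrid function described above, assuming aligned grayscale images (the helper gaussian_2d and the function names are illustrative):

import cv2
import numpy as np
from scipy.signal import convolve2d

def gaussian_2d(ksize, sigma):
    """2D Gaussian kernel from the outer product of a 1D kernel with itself."""
    g = cv2.getGaussianKernel(ksize, sigma)
    return g @ g.T

def hybrid_image(im_low, im_high, ksize_low, sigma_low, ksize_high, sigma_high):
    """Combine the low frequencies of im_low with the high frequencies of im_high."""
    low_pass = convolve2d(im_low, gaussian_2d(ksize_low, sigma_low), mode="same")
    high_pass = im_high - convolve2d(im_high, gaussian_2d(ksize_high, sigma_high), mode="same")
    return np.clip(low_pass + high_pass, 0, 1)

# e.g. hybrid = hybrid_image(derek, nutmeg, 41, 21, 31, 7)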
Steps to creating a Derek and Nutmeg hybrid image
And here is the hybrid result, with a smaller version provided to simulate looking at the image from far away:
Here is a hybrid between my cat's face and a lion's face, which I found online [1]. I used a Gaussian with kernel size 41 and sigma 21 for both. My cat needed a lot of blurring to "counteract" the distraction of her high-contrast black-and-white face.
Steps to creating a cat-lion hybrid image
And here is the result:
Cat-lion
One of my favorite results is a hybrid image between one of my cats (high-pass) and a bread loaf [2] (low-pass). Below are the intermediate steps and frequency analysis via log magnitude of the Fourier transform. I used a kernel size of 9 with sigma 3 for the bread and kernel size of 15 with sigma 5 for the cat. My cat had a lot of high frequencies, and the bread loaf had a lot of low frequencies, so I think this one worked out pretty well:
Steps to creating a cat-bread hybrid image
And the result:
Best Thing Since Sliced Bread
Interestingly, when I reversed the above images, applying a high-pass filter to the bread and a low-pass to my cat, the algorithm did not yield very good results, as seen below. Even at close range, the image just looks like a blurry version of my cat with some weird unrecognizable texture overlaid on him (color doesn't help either, because both are brown). The algorithm doesn't seem to work very well when you apply a high-pass filter to an image with primarily low frequency components and/or a low-pass filter to an image with primarily high frequency components, because you lose pertinent information (e.g. for the bread, a high-pass filter leaves you with a weird over-textured lump that isn't very bread-like, in my opinion). The hybrid image is just some confusing mix of both:
An unappealing hybrid image

Bells and Whistles

To use color to slightly enhance my results, I replaced the single 2D convolution with separate convolutions in the R, G, and B channels. I had to modify the align_images function slightly -- namely, the skimage.transform.rescale function would drop a color channel on some images, so I had to set the multichannel parameter to True to ensure that this would work for all images. I felt that in general, it was more effective when color was applied to the high-frequency image and not the low-frequency one. This kind of made sense to me, as cones, unlike rods, are the photoreceptors sensitive to fine detail and color; adding color to the high-pass image made it more clear at close range, but far away, the color along with the high-frequency components "faded" as cones become less dominant, making the low-pass image more clear. A short sketch of the per-channel filtering follows, and below that are the colorized hybrid images:
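A minimal sketch, assuming hybrid_image from the sketch above operates on a single channel (names are illustrative):

import numpy as np

def hybrid_image_color(im_low, im_high, ksize_low, sigma_low, ksize_high, sigma_high):
    """Apply the single-channel hybrid function to R, G, and B separately."""
    channels = [
        hybrid_image(im_low[..., c], im_high[..., c],
                     ksize_low, sigma_low, ksize_high, sigma_high)
        for c in range(3)
    ]
    return np.dstack(channels)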

Derek-Nutmeg, colorized
"Best Thing Since Sliced Bread," colorized
Cat-lion, colorized

Part 2.3: Gaussian and Laplacian Stacks

To prepare for multi-resolution blending between two images (as described by Burt and Adelson), I first created Gaussian and Laplacian stacks for each image. I created these stacks one level at a time. The first image in a Gaussian stack is the original image. For any following image at level i in the Gaussian stack, we apply a Gaussian blur filter to the prior image (at level i-1) in the stack. This is effectively applying an increasingly large Gaussian to the original image at each level. Each level of the Laplacian stack is the difference between consecutive Gaussian levels -- the less-blurred image minus the more-blurred one -- and so captures the detail removed by that round of blurring. The end of the Laplacian stack should contain the most blurry image, so the last image in the Gaussian stack is appended to the end of the Laplacian stack.
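A sketch of the stack construction (gaussian_2d as in the hybrid sketch from Part 2.2; the function names and the no-downsampling choice follow the description above):

from scipy.signal import convolve2d

def gaussian_stack(im, depth, ksize, sigma):
    """Each level is the previous level blurred again; no downsampling."""
    stack = [im]
    g = gaussian_2d(ksize, sigma)
    for _ in range(depth - 1):
        stack.append(convolve2d(stack[-1], g, mode="same"))
    return stack

def laplacian_stack(g_stack):
    """Differences of consecutive Gaussian levels; the last level is the most blurred image."""
    return [g_stack[i] - g_stack[i + 1] for i in range(len(g_stack) - 1)] + [g_stack[-1]]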

For the oraple, I used a Gaussian with kernel size 11 and sigma 5 for both the orange and the apple. The Gaussian and Laplacian stack images at levels 0, 2, and 4 for a stack depth of 5 are below:

Apple
Orange
Using these stacks, we can reproduce Figure 3.42 in Szeliski (2nd edition), page 167. Section 2.4 will subsequently discuss how a mask with a vertical seam is applied to separate out the "Apple" and "Orange" contributions. The first row is the Laplacian stack at level 0, the second is the Laplacian stack at level 2, and the third is the Laplacian stack at level 4. The fourth row contains the sum of the contents at every level of the Laplacian stack for each image.
Szeliski Figure 3.42 reproduction

Part 2.4: Multiresolution Blending (a.k.a. the oraple!)

To apply multi-resolution blending to create the oraple, I first created a mask with a vertical seam down the middle:
"Step" mask

I created stacks, as described in the previous section, for the mask itself to avoid having a clear seam between the orange and the apple. Then, I created a stack to make the blended image. One image's contribution at level i is that image's Laplacian stack at level i multiplied by the mask's Gaussian stack at level i. The other image's contribution is that image's Laplacian stack at level i multiplied by the inverted version of the mask's Gaussian stack at level i (i.e. 1 minus the mask's Gaussian stack at level i). Each level i of the blended stack is the sum of these two contributions. That is,

stack_blend[i] = (1-stack_gaussian_mask[i])*stack_laplacian_image1[i] + stack_gaussian_mask[i]*stack_laplacian_image2[i]

Finally, the blended image itself is composed by adding all the levels of the blended stack (this is also seen in the lower right of the Szeliski Figure 3.42 reproduction above).
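A sketch of the whole blend, assuming the stack functions from Part 2.3 and a mask with values in [0, 1]:

import numpy as np

def blend(im1, im2, mask, depth=5, ksize=11, sigma=5):
    """Multiresolution blend: mask = 1 selects im2, mask = 0 selects im1."""
    lap1 = laplacian_stack(gaussian_stack(im1, depth, ksize, sigma))
    lap2 = laplacian_stack(gaussian_stack(im2, depth, ksize, sigma))
    gm = gaussian_stack(mask, depth, ksize, sigma)

    blend_stack = [(1 - gm[i]) * lap1[i] + gm[i] * lap2[i] for i in range(depth)]
    return np.clip(sum(blend_stack), 0, 1)

# Step mask with a vertical seam down the middle (illustrative):
# mask = np.zeros_like(apple); mask[:, mask.shape[1] // 2:] = 1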

Oraple

I then made a desert beach blended image. I found a picture of a desert from the California Department of Fish and Wildlife website [3] and a picture of Champagne Beach from Harper's Bazaar [4]:

Desert
Beach

A little bit of pre-processing in Photoshop/Lightroom was needed so that the images had similar levels of saturation and exposure (the beach was originally over-saturated, and the desert was a little over-exposed). I manually made my mask by roughly painting over the saguaro and some desert flora in white and then filling in the rest of the image in black.

Desert Beach mask

The intermediate results in the stacks (levels 0, 2, and 4 in the first 3 rows and the added results in the last row) are shown below:

Desert Beach stacks

And finally, the result:

Desert Beach
The next image I made is one I call "Funny Business on the Glade." I used an image of Memorial Glade from the Daily Cal [5] and a photo I took in Tanzania of a male ostrich chasing a female ostrich, with some other exotic onlookers:
Tanzania savannah
Memorial Glade
I created a mask by roughly painting over the animals and parts of the savannah in black and filling in the rest in white:
"Funny Business" mask

The intermediate results in the stacks (levels 0, 2, and 4 in the first 3 rows and the added results in the last row) are shown below:

"Funny Business" stacks

And finally, the result:

"Funny Business on the Glade"
This one turned out to be a little tricky, because the two images' skylines and exposures were originally completely different. There was a decent amount of fiddling in Photoshop to get the two images aligned and re-colored enough to look more or less natural.

Bells and Whistles

To enable colorized blended images, I modified the stack-making function in my code to store colored images at each level of the stack. The colored images in the Gaussian stack comprised R, G, and B channels each separately convolved with a Gaussian; the Laplacian stack was calculated as usual (image in level i of the Gaussian stack subtracted from the image in the prior level). For displaying the intermediate results (ONLY for display, not for calculating the blend), I normalized the images, because the intensity levels in the Laplacian stack made the intermediate images extremely dark and hard to see. A short sketch of the per-channel stacks follows, and below that are the colorized stacks and blended images:
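A minimal sketch, assuming gaussian_stack from Part 2.3 operates on a single channel (names illustrative):

import numpy as np

def gaussian_stack_color(im, depth, ksize, sigma):
    """Build single-channel stacks for R, G, and B, then restack them level by level."""
    per_channel = [gaussian_stack(im[..., c], depth, ksize, sigma) for c in range(3)]
    return [np.dstack([per_channel[c][i] for c in range(3)]) for i in range(depth)]

def normalize_for_display(im):
    """Stretch a (possibly negative) Laplacian level into [0, 1] for display only."""
    return (im - im.min()) / (im.max() - im.min() + 1e-8)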
Colorized Oraple stacks
Colorized Oraple
Colorized Desert Beach stacks
Colorized Desert Beach
Colorized "Funny Business" stacks
Colorized "Funny Business on the Glade"

Takeaways

One interesting thing I learned in this assignment is that our eyes' sensitivity to high and low frequencies in an image varies by viewing distance, and we can take advantage of that to create a single image that looks very different depending on where you stand to look at it. It was also interesting to me to see how a simple blur filter was the first step to so many different effects, including edge detection, sharpening, hybrid image creation, and multi-resolution blending. That being said, I'd say the most important thing I learned in this assignment is that there's still a lot of human involvement (at least based on what I personally know) in creating an interesting, "successful" hybrid or blended image. Even after tuning my code to work well on the example images, I couldn't simply run the code with the same parameters blindly on whatever images I wanted. I had to select images that would be compatible, do some manual pre-processing to recolor and crop them, and then, for multi-resolution blending, manually make a mask to blend the parts that I actually wanted to blend.

Image sources

[1] https://wallpaperaccess.com/lion-head
[2] https://decoratedtreats.com/homemade-white-bread-loaf.html
[3] https://wildlife.ca.gov/Data/CNDDB/News/taxon-of-the-week-the-saguaro-cactus
[4] https://www.harpersbazaar.com/culture/travel-dining/g3820/worlds-best-beaches/
[5] https://www.dailycal.org/2020/10/14/uc-berkeley-officials-plan-to-launch-program-for-in-person-instructional-activities/