Experiments with filters and frequencies for image processing

This project is a walkthrough of some forays into image straightening, image sharpening, building hybrid images, and image blending, utilizing fundamental filters and image operations.

Jazz Singh // September 2020


1. Image Straightening

A common issue, especially when taking photos with a smartphone, is that the captured images can end up unintentionally slightly rotated. The goal of this section is to find an effective way to straighten these images.

One way to accomplish this is to notice that, because of gravity, straightened images tend to contain more horizontal and vertical edges than slightly rotated ones (though this assumption fails for images dominated by curved edges). So, if there were some way to determine the orientation of edges in the original image, it would be possible to rotate the image until far more edges are oriented horizontally and vertically than otherwise, indicating the image is straight. This is a bare-bones outline of the algorithm I will utilize for image straightening.

1.1. Detecting Edges

Executing on the above, I used two simple filters to detect edges in the x and y directions: the finite difference operators [1 -1] and [1 -1] transposed. These filters approximate derivatives in the discrete pixel setting, so they can help find the regions of highest change within the image, which correspond to edges.

Convolving an image with each of these filters yields two partial derivative images. I took the L2 norm over these to obtain gradient magnitudes; high gradient magnitudes should correspond to edges, so I binarized the magnitudes with a suitable threshold (removing as much noise as possible, like messy particles in the ocean, in the process). The images below illustrate each of these steps.
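As a concrete sketch of this step, here's roughly how the finite-difference edge detection could look in NumPy/SciPy (assuming a grayscale float image in [0, 1]; the function name and threshold value are illustrative, not necessarily the exact ones I used):

```python
import numpy as np
from scipy.signal import convolve2d

def edge_map(im, threshold=0.1):
    """Binarized edge map from finite-difference gradient magnitudes.

    im: 2D grayscale array with values in [0, 1].
    """
    dx = np.array([[1, -1]])   # finite difference operator in x
    dy = dx.T                  # its transpose: finite difference in y
    gx = convolve2d(im, dx, mode="same", boundary="symm")
    gy = convolve2d(im, dy, mode="same", boundary="symm")
    mag = np.sqrt(gx**2 + gy**2)   # L2 norm of the two partials
    return mag, mag > threshold    # gradient magnitudes, binarized edges
```

The choice of threshold trades off missed edges against leftover noise, which is exactly the tension the next subsection addresses.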

1.2. Reducing Noise in Edge Detections with a Low-Pass Filter

Unfortunately, the thresholding step removed lots of real edges, because regions of high gradient magnitude weren't only caused by real edges, but also by considerable noise. I resolved this issue by low-pass filtering the original image before detecting edges, to get rid of a lot of the high-frequency noise.

I used Gaussian blur for low-pass filtering; after some hyperparameter tuning, I found a standard deviation of 2 and kernel size of 12 (by 12) worked best for me. For efficiency, I constructed a filter that accomplishes both Gaussian blurring and edge detections by taking the derivative of the Gaussian filter, so that one convolution operation does both steps.
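Sketching this in code: assuming a separable Gaussian built from the parameters above, the combined derivative-of-Gaussian filters could be constructed like this (function names are illustrative):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(size=12, sigma=2.0):
    """Separable 2D Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    return np.outer(g, g)

def derivative_of_gaussian(size=12, sigma=2.0):
    """Fold the finite difference operators into the Gaussian, so a single
    convolution both blurs and takes a partial derivative."""
    G = gaussian_kernel_2d(size, sigma)
    dx = np.array([[1, -1]])
    dog_x = convolve2d(G, dx)     # derivative of Gaussian w.r.t. x
    dog_y = convolve2d(G, dx.T)   # derivative of Gaussian w.r.t. y
    return dog_x, dog_y
```

By associativity of convolution, convolving an image with dog_x gives the same result as blurring first and then applying [1 -1], which is why the one-filter shortcut works.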

Below is the result of convolving the image with this filter, finding the gradient magnitudes, and binarizing them with a suitable threshold (similarly to 1.1). Notice, many more of the real edges are preserved -- including some edges of the background buildings. The edges are indeed thicker due to the Gaussian blur step, but this doesn't seem to cause problems for the purposes of image straightening in the images I sampled.

1.3. Straightening Images (Using Gradient Angles)

Putting the above pieces together, here's the full image straightening algorithm. First, I formulate a reasonable window of proposed rotation angles. For each proposal angle:

  • rotate the image by it (keep the image shape constant);
  • crop out noise on the sides of the image caused by the rotation;
  • apply a low-pass filter (Gaussian blur) to help with noise of the kind seen in (1.1);
  • detect edges with respect to x and y in the blurred image using the finite difference operators;
  • compute the orientation of the gradient at each edge pixel by taking the inverse tangent of the quotient of the partials with respect to y and x;
  • and count up the number of horizontal and vertical edges (i.e. edges with angle in radians close to 0, pi/2, or -pi/2).

Finally, pick the angle yielding the highest number of horizontal and vertical edges. I rotated the original image by this obtained best angle to straighten it, then I cropped a bit off the edges in a post-processing step.

Below are results of this image straightening algorithm on 4 different images. The last example is a failure case, to illustrate an instance where this method can be improved. I've shown the original image, a histogram of gradient orientation angles, and the straightened image.


2. Image Sharpening

The goal of this section is to use existing information in images to "sharpen" them. I do so by subtracting the output of a low-pass filter from the original image, multiplying this result by a hyperparameter, and adding it to the original image. I essentially "emphasize" the highest frequencies in the image.

I used a Gaussian filter with standard deviation 2 and size 12 (same as in section 1) as the low-pass filter, and after experimentation I set hyperparameter alpha to 0.5. To make the process more efficient, I combined the operations into a single unsharp mask filter. I sharpened each RGB color channel separately and stacked the channels back into a sharpened image.
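As a sketch, the whole sharpening step amounts to just a few lines (here using scipy's gaussian_filter in place of an explicit 12-by-12 kernel; the function name and signature are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sharpen(im, sigma=2.0, alpha=0.5):
    """Unsharp masking: im + alpha * (im - blur(im)), per color channel.

    im: H x W x 3 float array with values in [0, 1].
    """
    out = np.empty_like(im, dtype=float)
    for c in range(im.shape[2]):
        channel = im[..., c].astype(float)
        high_freq = channel - gaussian_filter(channel, sigma)  # what the blur removed
        out[..., c] = channel + alpha * high_freq              # add it back, scaled
    return np.clip(out, 0.0, 1.0)   # keep pixel values in a valid range
```

Note that uniform regions are untouched (their high-frequency component is zero); only areas near existing detail change, which is why this counts as emphasizing rather than adding information.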

Below are the results on some images:

It's important to note that no new information is being added to the image here. To illustrate this more clearly, this is what happens after applying "sharpening" to a blurred image:


3. Hybrid Images

The goal of this section is to form an optical illusion from two images -- if you look at the image from a close distance, it should look like one thing, but if you look at it from afar (or alternatively squint your eyes), it should look like something else. An example of the kind of effect desired is this painting by Salvador Dali.

In the first part of this section, I describe how I implemented image hybridization; in the second, I use the Fourier domain and Gaussian and Laplacian stacks to help illustrate how the method works.

3.1. Building the Hybrid Images

The essential intuition is the following: the image should contain the low frequencies of the first image and the high frequencies of the second, overlaid seamlessly, causing different interpretations of the image at different distances.

To execute on this, I aligned the two images using two reference points, low-pass filtered one image, high-pass filtered the other, and combined them with a weighted average. For the low-pass filter, I used a Gaussian filter; for the high-pass filter, I subtracted the Gaussian-filtered image from the original. The standard deviations of the Gaussian filters and the weights of the weighted average are tunable hyperparameters; most of the time, a standard deviation of 4 worked best for the Gaussian used to construct the low-pass image, and 8 worked best for the high-pass image's Gaussian. Notice that this leaves some overlap in the frequency ranges of the two images, but this small overlap appeared to lead to the best results for the images I examined.
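A minimal sketch of the combination step, assuming the two images are already aligned, grayscale, and the same shape (gaussian_filter stands in for my fixed-size kernels, and the defaults mirror the hyperparameters above):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hybrid(im1, im2, sigma_low=4.0, sigma_high=8.0, w=0.5):
    """Low frequencies of im1 plus high frequencies of im2, weighted by w."""
    im1 = im1.astype(float)
    im2 = im2.astype(float)
    low = gaussian_filter(im1, sigma_low)          # low-pass: keep coarse structure
    high = im2 - gaussian_filter(im2, sigma_high)  # high-pass: keep fine detail
    return np.clip(w * low + (1 - w) * high, 0.0, 1.0)
```

Since the high-pass component of any flat region is zero, the far-away (squinting) view is dominated by im1 while im2 contributes only edges and texture up close.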

Here are the results of each stage of this process for a pair of images of the same person, one where he's smiling, and one where he has a neutral expression. If you look at the result from a relatively close distance, his expression seems fairly neutral; but as you go farther away from the screen (or alternatively squint), his expression seems happier. Note that I aligned the images as a pre-processing step, and I cropped the result as a post-processing step.

Here's one more sample result. As you go closer to the screen, the older man's expression feels more and more awkward and pained. As you go farther (or alternatively, if you squint a lot), his expression seems more like that of a happy normal older man.

This final example is a failure case of the hybridization algorithm. Although some of the effect still works, it's not perfect especially due to the difficulty in aligning the images -- the facial structures (e.g. distance between eyes, shape of ears, etc.) of cats and dogs are too different.

3.2. Analyzing the Hybrid Images

Below, I illustrate the hybrid images in two different ways.

Firstly, I included frequencies in the Fourier domain of each step in the Guy Neutral / Guy Smiling example.

Secondly, I used Laplacian and Gaussian stacks to illustrate even more clearly how the hybridization process works. To build the Gaussian stack, I blurred (Gaussian filter) the image over and over for a predefined number of levels, progressively removing more and more of the higher frequencies. To build the Laplacian stack, I took the differences of adjacent images in the corresponding Gaussian stack, ultimately separating the original image out into the bands of frequencies that compose it.
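The two stacks can be sketched directly from that description (level count and sigma are the tunables; note that keeping the final low-pass image as the last Laplacian level makes the stack sum back to the original image exactly):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_stack(im, levels=10, sigma=2.0):
    """Repeatedly blur without downsampling (a stack, not a pyramid)."""
    stack = [im.astype(float)]
    for _ in range(levels - 1):
        stack.append(gaussian_filter(stack[-1], sigma))
    return stack

def laplacian_stack(im, levels=10, sigma=2.0):
    """Differences of adjacent Gaussian-stack levels, plus the final low-pass level."""
    g = gaussian_stack(im, levels, sigma)
    return [g[i] - g[i + 1] for i in range(levels - 1)] + [g[-1]]
```

Each Laplacian level isolates one band of frequencies, and the telescoping sum of all levels recovers the original image, which is what makes these stacks useful for the analysis below.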

Here are the Gaussian and Laplacian stacks for the Salvador Dali painting linked above. Note that I used 10 levels for all my examples, but I only display 4 of them (the 0th, 3rd, 6th, and 9th images) for brevity.

Below are the Gaussian and Laplacian stacks for the Guy Neutral / Guy Smiling hybrid image.


4. Multiresolution Blending

The goal of this section is to blend images together as seamlessly as possible.

To implement this, I used a modified form of the algorithm for image blending with arbitrary masks described in this paper. For each frequency band in the two images, I interpolated between the images with weights defined by a blurred mask (the mask essentially marks the region where I want to shift from one image to the other). More concretely, I constructed a mask array, a Gaussian stack for the mask, and a Laplacian stack and a Gaussian stack for each of the two input images. At each level, I took an average of the two Laplacian images weighted by the corresponding level of the mask's Gaussian stack, taking care to also blend the last image in the Gaussian stack of each image (the low-pass residual). Finally, I combined all the levels to generate a blended image.

Some implementation details: after some experimentation, I ended up using standard deviations of 2 for Gaussians in the Laplacian stacks and Gaussian stacks corresponding to the images, and I ended up using a standard deviation of 10 for the Gaussian filter enacting Gaussian blur on the mask. Note that I aligned and cropped both images as a pre-processing step.
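Putting the pieces together, here's a sketch of the blend. It reuses the stack constructions from section 3.2, and the defaults mirror the hyperparameters above; the helper names are otherwise illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_stack(im, levels, sigma):
    """Repeatedly blur without downsampling."""
    stack = [im.astype(float)]
    for _ in range(levels - 1):
        stack.append(gaussian_filter(stack[-1], sigma))
    return stack

def laplacian_stack(im, levels, sigma):
    """Band-pass levels plus the final low-pass residual."""
    g = gaussian_stack(im, levels, sigma)
    return [g[i] - g[i + 1] for i in range(levels - 1)] + [g[-1]]

def blend(im1, im2, mask, levels=10, sigma=2.0, mask_sigma=10.0):
    """Per-band interpolation between im1 and im2, weighted by a blurred mask.

    mask: 1 where im1 should show through, 0 where im2 should.
    """
    l1 = laplacian_stack(im1, levels, sigma)
    l2 = laplacian_stack(im2, levels, sigma)
    gm = gaussian_stack(mask.astype(float), levels, mask_sigma)
    # Weighted average at each level, then sum the levels back together.
    return sum(m * a + (1 - m) * b for m, a, b in zip(gm, l1, l2))
```

Because the mask is blurred more aggressively at coarser levels, low frequencies transition over a wide region while high frequencies transition over a narrow one, which is exactly what hides the seam.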

Below, I've blended images of a woman and a man to change the man's haircut. I used a rectangular mask that preserves as much of the woman's straight hair and hairline as possible, while preserving most of the man's facial features. Though there are a few visual artifacts around the man's eyebrows, the haircut overall looks surprisingly natural for such a simple technique.

To illustrate the above example a bit more, below is a visualization of how masked images at two levels of the Laplacian stacks form the combined image at that level.

Here's one more example of a multiresolution image blend (I used an elliptical mask for this one).