CS 194-26: Project 2

In this project we explore (and play with) image frequencies. We implement the Gaussian filter and use it as the foundation for more advanced applications such as edge detection, sharpening, and image blending. Real applications of these concepts can be found in photo processing software such as Photoshop, and in built-in features of some prosumer cameras and smartphones.

Part 1.1: Finite Difference Operator

We are interested in detecting the edges of an image. In principle, edge detection relies on detecting sudden changes of contrast or brightness from one area of the image to another. To achieve this, we use the finite difference operator to generate two derivative images, one with respect to the x-axis and one with respect to the y-axis.

Target image.

Cameraman

First, we convolve the original image with the finite difference operator in the x dimension: [1, -1], then we convolve the image with the finite difference operator in the y dimension: [[1], [-1]].
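The two convolutions can be sketched as follows. This is a minimal illustration, assuming NumPy and SciPy are available; the function name `partial_derivatives`, the toy `ramp` image, and the choice of `mode="same"` with a mirrored boundary are assumptions made here, not details taken from the project.

```python
import numpy as np
from scipy.signal import convolve2d

# Finite difference operators as 2D kernels:
# a row vector for the x direction, a column vector for the y direction.
D_X = np.array([[1.0, -1.0]])
D_Y = np.array([[1.0], [-1.0]])


def partial_derivatives(image):
    """Return (dx, dy): the image convolved with D_X and with D_Y."""
    image = np.asarray(image, dtype=float)
    # mode="same" keeps the output the same size as the input;
    # boundary="symm" mirrors the border so it does not appear as a fake edge.
    dx = convolve2d(image, D_X, mode="same", boundary="symm")
    dy = convolve2d(image, D_Y, mode="same", boundary="symm")
    return dx, dy


# Toy input: brightness grows by 1 per column and is constant along each column.
ramp = np.tile(np.arange(6.0), (4, 1))
dx, dy = partial_derivatives(ramp)
```

On this ramp, `dx` is 1 in the interior and `dy` is 0 everywhere, since the brightness only changes horizontally.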

Differential and vector forms.

Finite difference operator for X axis

This reveals edges along the x-axis.

Partial Derivative in x-direction.

Differential and vector forms.

Finite difference operator for y-axis

This reveals edges along the y-axis.

Partial Derivative in y-direction

Then, we compute the gradient magnitude by using a simple adaptation of the Pythagorean Theorem where we input our two partial derivatives from the previous step.

Visual representation.

Gradients of an image

Pythagorean Theorem.

Gradient Magnitude

Finally, we binarize the gradient magnitude image by picking an appropriate threshold. This reduces noise and isolates the edges. The final edge image is shown below on the right-hand side.

The edges are dim but visible.

Gradient magnitude

The edges are visible.

Gradient magnitude binarized
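The gradient magnitude and the binarization step can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function names, the toy derivative arrays, and the threshold value of 0.5 are all hypothetical, since the threshold used for the images above is not stated.

```python
import numpy as np


def gradient_magnitude(dx, dy):
    """Per-pixel magnitude sqrt(dx^2 + dy^2) of the two partial derivatives."""
    return np.sqrt(dx ** 2 + dy ** 2)


def binarize(magnitude, threshold):
    """Set pixels above the threshold to 1 and all others to 0."""
    return (magnitude > threshold).astype(float)


# Toy partial derivatives (hypothetical values, not taken from a real image).
dx = np.array([[3.0, 0.0], [0.0, 0.1]])
dy = np.array([[4.0, 0.0], [1.0, 0.1]])

mag = gradient_magnitude(dx, dy)      # top-left entry is 5.0 (a 3-4-5 triangle)
edges = binarize(mag, threshold=0.5)  # weak responses such as 0.14 are dropped
```

In practice the threshold is usually tuned by eye per image: too low and noise survives, too high and faint edges disappear.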

Part 1.2: Derivative of Gaussian (DoG) Filter

Although our previous technique for detecting edges worked to some extent, the final edge image still has some noise, visible as tiny white dots and lines. Edges are located where there is a sharp, sudden change in contrast (high frequencies), but images can be “busy”: areas with lots of subtle detail can fool our naive algorithm into treating them as edges. This grainy fuzziness can be attenuated by smoothing the image first. One way to smooth an image is to apply a Gaussian blur, which acts as a low-pass filter: it suppresses fine-grained noise while strong boundaries and edges remain detectable. To generate a Gaussian filter we use the formula shown below, where x is the distance from the origin along the horizontal axis, y is the distance from the origin along the vertical axis, and σ (sigma) is the standard deviation of the Gaussian distribution. When applied in two dimensions, this formula produces a surface whose contours are concentric circles with a Gaussian distribution from the center point, as shown in the following diagram:

In two dimensions, the formula of a gaussian is the product of two one-dimensional Gaussian functions.

Gaussian formula and 2D representation
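A discrete Gaussian kernel can be built as the outer product of two 1D Gaussians, which follows from the separability noted above. This is a minimal sketch assuming NumPy; the function names are hypothetical, and the kernel is normalized so its entries sum to 1 rather than using the continuous 1/(2πσ²) constant.

```python
import numpy as np


def gaussian_kernel_1d(ksize, sigma):
    """Sample exp(-x^2 / (2 sigma^2)) at ksize points centered on 0, normalized to sum to 1."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def gaussian_kernel_2d(ksize, sigma):
    """2D Gaussian kernel as the outer product of two 1D Gaussian kernels."""
    g = gaussian_kernel_1d(ksize, sigma)
    return np.outer(g, g)


kernel = gaussian_kernel_2d(ksize=5, sigma=1.0)
```

Normalizing the sum to 1 keeps the overall brightness of the image unchanged after blurring.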

The values from this distribution are used to build a convolution matrix which is applied to the target image. This will generate a blurred image where each pixel's value is set to a weighted average of that pixel's neighborhood. Then, we convolve the blurred image with the finite difference operators just as in Part 1.1 to get the two partial derivatives, which then are used to get the gradient magnitude. The results of this multi-convolution process are shown in the images below:

Target image.

Original Cameraman image

Gaussian filter was applied.

Blurred Cameraman image

Edges are now much clearer, with less noise.

Gradient magnitude binarized from the blurred Cameraman image

What differences do we see compared to the previous part? The edges are now much more pronounced and there is visibly less noise than with our naive approach in Part 1.1. We can achieve this result with a single convolution over the image instead of two by creating a Derivative of Gaussian (DoG) filter. To do this, we convolve the Gaussian kernel we generated above with the finite difference operators D_x and D_y to get our DoG filters.

Finite difference operators D_x and D_y.

Derivative of Gaussian (DoG) Filter

Finite difference operator for x-axis.

DoG x-axis

Finite difference operator for y-axis.

DoG y-axis

Finally, we convolve our original image with our newly created Derivative of Gaussian (DoG) filters, and we see that the result is the same as when we first applied the Gaussian filter to the original image and then took the derivatives, while avoiding an additional convolution over the full image. Note that convolving the small Gaussian kernel with our finite difference operators is computationally much cheaper than applying a Gaussian blur to the whole target image first.

Target image.

Original Cameraman image

Edges are now much clearer, with less noise.

DoG binarized from the original Cameraman image
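The equivalence between the two-step and the one-step approach follows from the associativity of convolution, and it can be checked numerically. The sketch below assumes NumPy and SciPy; the helper names, the random test image, and the kernel size and σ are illustrative choices, not the project's settings. It uses the default full, zero-padded convolution, where the two results agree up to floating-point error.

```python
import numpy as np
from scipy.signal import convolve2d

D_X = np.array([[1.0, -1.0]])
D_Y = np.array([[1.0], [-1.0]])


def gaussian_kernel_2d(ksize, sigma):
    """Normalized 2D Gaussian kernel built from a 1D Gaussian."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)


def dog_filters(gaussian):
    """Convolve the Gaussian kernel with D_X and D_Y to get the two DoG filters."""
    return convolve2d(gaussian, D_X), convolve2d(gaussian, D_Y)


rng = np.random.default_rng(0)
image = rng.random((32, 32))          # stand-in for a grayscale image
G = gaussian_kernel_2d(7, 1.5)
dog_x, dog_y = dog_filters(G)

# Two convolutions over the image: blur first, then differentiate.
two_step = convolve2d(convolve2d(image, G), D_X)
# One convolution over the image with the precomputed DoG filter.
one_step = convolve2d(image, dog_x)
```

With `mode="same"` and a mirrored boundary the two results can differ slightly near the image border, because the border is extended at different stages.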

Part 1.3: Image Straightening

In Part 1.2 we explored how the gradient magnitude can be used to detect edges. In this section we explore how the gradient direction can be used for image straightening. Instead of computing the magnitude of the two axial gradient components, we use them to compute an angle that gives the direction of contrast change at a given location of an image. To compute the angles, we use the following formula:

The gradient angle is defined as the direction of maximum change.

Gradient angle.
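For reference, the standard definition of the gradient direction is given below, writing f for the image (a symbol assumed here) and using the two partial derivatives from Part 1.1. In code, a two-argument arctangent is normally used so that the angle covers the full range from -180° to 180°.

```latex
\theta = \tan^{-1}\!\left( \frac{\partial f / \partial y}{\partial f / \partial x} \right)
```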

Since most pictures have a horizon line (ground, waterline, etc.) and structures parallel or perpendicular to it (buildings, trees, people, etc.), we can assume that a straightened image will have most edge directions at 0°, ±90°, or ±180° with respect to the standard Cartesian plane (where the x-axis is the floor level). To automatically straighten an image, we rotate it over a range of candidate angles and compute the gradient directions at each one. We then choose the rotation that produces the most vertical and horizontal edges. Below we show some examples with the original image, the straightened version, and a graph of the number of vertical and horizontal edges per rotation angle.

Target Image. Original NYC image	Target Image rotated Rotated NYC image by -10°	Horizontal and vertical edges are defined to have angles: 0°, ± 90°, ± 180° Vertical and horizontal edges per angle
Target Image. Original trees image	Target Image rotated Rotated trees image by 12°	Horizontal and vertical edges are defined to have angles: 0°, ± 90°, ± 180° Vertical and horizontal edges per angle
Target Image. Original Golden Gate image	Target Image rotated Rotated Golden Gate image by 4°	Horizontal and vertical edges are defined to have angles: 0°, ± 90°, ± 180° Vertical and horizontal edges per angle
Target Image. Original whale image	Target Image rotated Rotated whale image by 4°	Horizontal and vertical edges are defined to have angles: 0°, ± 90°, ± 180° Vertical and horizontal edges per angle

Target Image.

Original Facade image

Target Image rotated

Rotated Facade image by -3°

Horizontal and vertical edges are defined to have angles: 0°, ± 90°, ± 180°

Vertical and horizontal edges per angle
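The straightening search described above can be sketched as follows. This is an illustrative version under several assumptions that are not taken from the project: it uses `scipy.ndimage.rotate` for the rotation, smooths with `scipy.ndimage.gaussian_filter` before differentiating, keeps only pixels whose gradient magnitude exceeds a relative threshold, crops the border to ignore the empty corners that rotation creates, and scores each candidate by the fraction of edge pixels within a small tolerance of 0°, ±90°, or ±180°. All function names and default values are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate
from scipy.signal import convolve2d

D_X = np.array([[1.0, -1.0]])
D_Y = np.array([[1.0], [-1.0]])


def straightness_score(image, sigma=2.0, tol_deg=3.0, rel_threshold=0.3, margin=0.15):
    """Fraction of strong-edge pixels whose gradient angle is near 0, +-90 or +-180 degrees."""
    smooth = gaussian_filter(np.asarray(image, dtype=float), sigma)
    dx = convolve2d(smooth, D_X, mode="same", boundary="symm")
    dy = convolve2d(smooth, D_Y, mode="same", boundary="symm")
    # Crop the border so that corners left empty by the rotation are ignored.
    h, w = smooth.shape
    mh, mw = int(h * margin), int(w * margin)
    dx = dx[mh:h - mh, mw:w - mw]
    dy = dy[mh:h - mh, mw:w - mw]
    mag = np.hypot(dx, dy)
    if mag.max() == 0:
        return 0.0  # no edges at all
    strong = mag > rel_threshold * mag.max()
    angles = np.degrees(np.arctan2(dy[strong], dx[strong]))
    # Distance in degrees to the nearest multiple of 90.
    deviation = np.abs((angles + 45.0) % 90.0 - 45.0)
    return float(np.mean(deviation <= tol_deg))


def best_rotation(image, candidate_angles):
    """Try each candidate angle (degrees) and return the best one plus all scores."""
    candidates = list(candidate_angles)
    scores = [
        straightness_score(rotate(image, a, reshape=False, order=1))
        for a in candidates
    ]
    return candidates[int(np.argmax(scores))], scores


# Synthetic check: tilt an axis-aligned square by 10 degrees, then recover it.
square = np.zeros((101, 101))
square[35:66, 35:66] = 1.0
tilted = rotate(square, 10, reshape=False, order=1)
angle, scores = best_rotation(tilted, range(-15, 16, 5))
```

Plotting `scores` against the candidate angles gives a curve comparable to the "edges per angle" graphs shown above.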

Failure case: Although this algorithm works for many images, there are some images where most lines run in diagonal directions, or where the numbers of vertical and horizontal edges are so close that the algorithm cannot tell whether a given right angle corresponds to the horizontal or the vertical axis. As a result, some images can be rotated in the wrong direction. In the following example, the buildings and the street produce a similar number of vertical and horizontal edges in two configurations; the wrong one is shown below.

Target image.

Original Lady image

Target Image rotated.

Rotated Lady image by -45°

Horizontal and vertical edges are defined to have angles: 0°, ± 90°, ± 180°

Vertical and horizontal edges per angle

We can fix this by reducing the range of angle rotations or by setting a bias towards a particular direction of rotation. In this case we provide a counterclockwise rotation bias to get the correct angle.

Target Image rotated.

Rotated Lady image by 44°

Part 2.1: Image "Sharpening"

In Part 1 of this project, we got our feet wet with frequencies. Now that we know what a Gaussian filter is, we can further exploit its capabilities to develop additional image manipulation applications. In this section we explore how to construct a high-frequency filter from a Gaussian filter, which we then use to make images sharper. Image sharpening refers to the process of accentuating strong edges (high frequencies) and some small details so that they become more visible to the human eye. The fovea in our eyes is a tiny pit located in the macula of the retina where visual detail is resolved. In other words, it is the area of the eye in charge of decoding high frequencies (details).

The fovea is the central region of the eye.

Areas of human retina (Physics Department, Georgia State University)

To enhance the details of an image, we extract its high frequencies and add them back scaled by a constant factor. To get the high frequencies of an image, we first get its low frequencies by applying a Gaussian blur to the original image, and then we subtract the resulting blurred image from the original. We can combine these steps into a single convolution operation called the Unsharp Mask Filter, whose equation is shown below. The Gaussian kernel is expressed as g, the original image as f, the multiplicative factor as α (alpha), and the unit impulse matrix as e.

Note: The unit impulse matrix is a matrix where the central cell is 1 and the rest is all zeros.

Unsharp Mask Filter
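Written with the symbols defined above (and with * denoting convolution), the two-step description and the single-convolution form are related as follows. This is the standard unsharp masking identity, stated here on the assumption that it matches the equation pictured.

```latex
f + \alpha \, (f - f * g) \;=\; f * \bigl( (1 + \alpha)\, e - \alpha\, g \bigr)
```

The identity holds because f * e = f and convolution is linear in the kernel.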

By setting the right size and standard deviation for our gaussian kernel, and the right multiplicative factor α (alpha), we can make any image appear sharper. Below we show a Taj Mahal image where the gaussian kernel is of size 35 and σ = 20 at different α values. We also show some additional examples.

Target image. Original image		High frequencies are boosted. Sharpened image with α = 0.1		Image starts to show undesired effects at high contrast areas. Sharpened image with α = 0.7		Image is completely overtaken by the high frequencies. Sharpened image with α = 1.5
Target image. Original image		High frequencies are boosted. Sharpened image ksize = 35, σ = 3, α = 0.7		Target image. Original image		High frequencies are boosted. Sharpened image ksize = 10, σ = 1.5, α = 2.3
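The single-convolution form of the unsharp mask can be sketched and checked against the two-step version. This is a minimal illustration assuming NumPy and SciPy; the function names and the small test values are hypothetical, and the kernel size is required to be odd here so that the unit impulse has a well-defined center (an assumption of this sketch).

```python
import numpy as np
from scipy.signal import convolve2d


def gaussian_kernel_2d(ksize, sigma):
    """Normalized 2D Gaussian kernel built from a 1D Gaussian."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)


def unsharp_mask_kernel(ksize, sigma, alpha):
    """Build (1 + alpha) * e - alpha * g, where e is the unit impulse."""
    if ksize % 2 == 0:
        raise ValueError("ksize must be odd so the unit impulse has a center")
    g = gaussian_kernel_2d(ksize, sigma)
    e = np.zeros((ksize, ksize))
    e[ksize // 2, ksize // 2] = 1.0
    return (1.0 + alpha) * e - alpha * g


def sharpen(image, ksize, sigma, alpha):
    """Sharpen a grayscale image with one convolution (output is not clipped)."""
    kernel = unsharp_mask_kernel(ksize, sigma, alpha)
    return convolve2d(image, kernel, mode="same", boundary="symm")


rng = np.random.default_rng(1)
image = rng.random((16, 16))          # stand-in for a grayscale image in [0, 1]
ksize, sigma, alpha = 5, 1.0, 0.7

one_step = sharpen(image, ksize, sigma, alpha)

# Reference: blur, subtract to get the high frequencies, add them back scaled.
blurred = convolve2d(image, gaussian_kernel_2d(ksize, sigma), mode="same", boundary="symm")
two_step = image + alpha * (image - blurred)
```

For display, the result is typically clipped back to the valid range, for example with `np.clip(one_step, 0, 1)`, since boosting high frequencies can overshoot.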

It is important to note that this process is not a true sharpening technique, as no new information is provided. Unlike the image stacking techniques frequently used by astronomers and astrophotographers, where additional images provide new data to enhance picture quality, our algorithm only enhances the high frequencies already available in the image, which may not produce the desired result and might even increase noise. To show this, we blurred an otherwise sharp image and used it as input to our algorithm. The result, as you can see below, is not very appealing when compared to the original image.

Target Image.

Original image

Gaussian blur applied to target image

Blurred image
ksize = 20, σ = 1.9

The result is not very appealing. This is normal due to the lack of new information.

Sharpened image from blurred version
ksize = 20, σ = 1.1, α = 6.5

Part 2.2: Hybrid Images

Now that we know how to extract low and high frequencies from images, we can have a little fun! In this section we create hybrid images, some of which appear to morph when seen from different distances. The process to create these images is quite straightforward; the challenge is finding images that match well together and finding the right Gaussian kernel size and standard deviation σ (sigma) for each. The examples below show how we can morph between different subjects, and some of them even appear to be a different subject from a distance.

We extract the low frequency from this image. Dereck		Hybrid Image Hybrid Image ksize = 30, Lσ = 4.55, Hσ = 13.66		We extract the high frequency from this image. Nutmeg the cat
We extract the low frequency from this image. Dog		Hybrid Image Cougar-like hybrid Image ksize = 20, Lσ = 12, Hσ = 17		We extract the high frequency from this image. Nutmeg the cat
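The construction of a hybrid image can be sketched as the sum of a low-pass filtered image and a high-pass filtered image. This is a minimal illustration for aligned grayscale images of equal size; it swaps in `scipy.ndimage.gaussian_filter` for the explicit kernel-plus-convolution approach, so only the two σ values appear and there is no ksize parameter. The function names and the test values are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def low_pass(image, sigma):
    """Keep the low frequencies: a Gaussian blur of the image."""
    return gaussian_filter(np.asarray(image, dtype=float), sigma)


def high_pass(image, sigma):
    """Keep the high frequencies: the image minus its Gaussian blur."""
    image = np.asarray(image, dtype=float)
    return image - gaussian_filter(image, sigma)


def hybrid_image(low_source, high_source, sigma_low, sigma_high):
    """Low frequencies of one image plus high frequencies of another (2D arrays)."""
    return low_pass(low_source, sigma_low) + high_pass(high_source, sigma_high)


rng = np.random.default_rng(2)
a = rng.random((24, 24))              # stand-ins for two aligned grayscale images
b = rng.random((24, 24))
hybrid = hybrid_image(a, b, sigma_low=3.0, sigma_high=5.0)
```

A quick sanity check on the decomposition: using the same image and the same σ for both inputs returns the original image, because the low and high parts sum back to it.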

Bells & whistles:

In order to improve the morphing effect, we create colorful hybrid images where we retain the color of the image whose low-frequency component is desired. The high frequency from the second image remains the same regardless of whether the image is colored or in grayscale. For the following image, we illustrate the process through frequency analysis. We show the log magnitude of the Fourier transform of the two input images, the filtered images, and the hybrid image.

Target image 1. Tiger		Log magnitude of the Fourier transform for this image. Fourier transform for Tiger		Target image 2. Wolf		Log magnitude of the Fourier transform for this image. Fourier transform for Wolf
The color for tiger was preserved. Tiger's low frequencies ksize = 30, Lσ = 2.55		Log magnitude of the Fourier transform for this image. Fourier transform for Tiger's low frequencies		The edges for wolf were preserved. Wolf's high frequencies ksize = 30, Hσ = 13.66		Log magnitude of the Fourier transform for this image. Fourier transform for Wolf's high frequencies

Final result

Morph between a tiger and a wolf.

Tiger-Wolf Hybrid

Log magnitude of the Fourier transform for this image.

Fourier transform for Tiger-Wolf Hybrid
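The log-magnitude spectra shown above can be computed with NumPy's FFT routines. This is a minimal sketch for a grayscale image; the function name and the small `eps` added to avoid taking the logarithm of zero are assumptions of this example.

```python
import numpy as np


def log_magnitude_spectrum(gray, eps=1e-8):
    """Log magnitude of the 2D Fourier transform, with the zero frequency centered."""
    spectrum = np.fft.fftshift(np.fft.fft2(np.asarray(gray, dtype=float)))
    return np.log(np.abs(spectrum) + eps)


# A constant image has all of its energy at the zero (DC) frequency,
# which fftshift places at the center of the spectrum.
flat = np.ones((8, 8))
spec = log_magnitude_spectrum(flat)
peak = np.unravel_index(np.argmax(spec), spec.shape)
```

In such a plot, a low-pass filtered image concentrates its energy near the center, while a high-pass filtered image has a weak center and relatively more energy away from it.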

Now we present some interesting face-morphing images, which hopefully appear different depending on the viewing distance.

We extract the low frequency from this image. Barack Obama	Hybrid Image Barack Putin ksize = 30, Lσ = 4.5, Hσ = 19	We extract the high frequency from this image. Vladimir Putin
We extract the low frequency from this image. Donald Trump	Hybrid Image Vladimir Trump ksize = 30, Lσ = 4.25, Hσ = 3.66	We extract the high frequency from this image. Vladimir Putin
We extract the low frequency from this image. Pieter Abbeel	Hybrid Image Pieter Sahai ksize = 30, Lσ = 3.55, Hσ = 8.66	We extract the high frequency from this image. Anant Sahai

We can simulate a “farther away” view by scaling down the hybrid images. We can observe that the low frequencies remain visible while the high frequencies are difficult to discern.

Dereck Nutmeg

Dog Nutmeg

Barack Putin

Vladimir Trump

Pieter Sahai

Failure case: If the images to be blended do not share a pretty similar composition, the results won’t be as pleasing as the ones shown above. Blending incompatible images might produce unpleasant and even disturbing results! An example of this is shown below.

We extract the low frequency from this image.

Josh Hug

Hybrid Image

Josh Hilfinger
ksize = 30, Lσ = 2.55, Hσ = 13.6

We extract the high frequency from this image.

Paul Hilfinger

Part 2.3: Gaussian and Laplacian Stacks

Before we continue exploring more frequency manipulation applications, we need to build two additional tools that will allow us to implement yet another cool blending application. These tools are called Gaussian and Laplacian stacks. Unlike in a pyramid, the images in a stack are never downsampled, so every level has the same dimensions as the original image. To create a Gaussian stack, we apply the Gaussian filter at each level, so each new layer becomes blurrier (we remove more of the high frequencies). To create a Laplacian stack, we follow the same process as for a Gaussian stack, but instead of keeping the low frequencies we keep what the blur removed at each level (the difference between consecutive Gaussian levels), so all that remains are the edges and details in that frequency band. A nice direct application of these stacks is to split the low- and high-frequency components of a given image. We show some examples for some hybrid images, including some that we created in the previous part.

Hybrid image Lincoln in Dalivision by Salvador Dalí		Stack of low frequencies Gaussian Stack
Hybrid image Lincoln in Dalivision by Salvador Dalí		Stack of high frequencies. Laplacian Stack
Hybrid image Dog Nutmeg		Stack of low frequencies Gaussian Stack
Hybrid image Dog Nutmeg		Stack of high frequencies. Laplacian Stack
Hybrid image Vladimir Trump		Stack of low frequencies Gaussian Stack
Hybrid image Vladimir Trump		Stack of high frequencies. Laplacian Stack
Hybrid image Barack Putin		Stack of low frequencies Gaussian Stack
Hybrid image Barack Putin		Stack of high frequencies. Laplacian Stack
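Gaussian and Laplacian stacks can be sketched as follows. This is one common construction, assumed here rather than taken from the project: each Gaussian level blurs the previous one with the same σ, each Laplacian level is the difference between consecutive Gaussian levels, and the last Gaussian level is appended to the Laplacian stack so that the levels sum back to the original image. It uses `scipy.ndimage.gaussian_filter`, and the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def gaussian_stack(image, levels, sigma):
    """Return levels + 1 images: the original followed by successively blurrier copies."""
    stack = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        stack.append(gaussian_filter(stack[-1], sigma))
    return stack


def laplacian_stack(image, levels, sigma):
    """Return levels band-pass images plus the final low-pass residual."""
    g = gaussian_stack(image, levels, sigma)
    bands = [g[i] - g[i + 1] for i in range(levels)]
    bands.append(g[-1])  # residual, so that the stack sums to the original
    return bands


rng = np.random.default_rng(3)
image = rng.random((20, 20))          # stand-in for a grayscale image
g_stack = gaussian_stack(image, levels=4, sigma=2.0)
l_stack = laplacian_stack(image, levels=4, sigma=2.0)
reconstructed = np.sum(l_stack, axis=0)
```

Because nothing is downsampled, every level keeps the shape of the input, and summing the Laplacian stack recovers the image.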

Part 2.4: Multiresolution Blending

Finally, we use our Gaussian and Laplacian stacks to build the Multiresolution Blending Algorithm. The main use of this algorithm is to blend two images seamlessly. In this section, we implement a multiresolution spline technique to combine two or more images into a composite image. This process requires that we first decompose our input images into a set of band-pass filtered component images (Gaussian and Laplacian stacks!). The algorithm can be outlined as follows:

Build Laplacian stacks LA and LB for images A and B respectively.
Build a Gaussian stack GR for the region (mask) image R.
Form a combined stack LS from LA and LB using the values of GR as weights. That is, for each level l and pixel (i, j): LS_l(i, j) = GR_l(i, j) · LA_l(i, j) + (1 − GR_l(i, j)) · LB_l(i, j).
Obtain the splined image S by summing the levels of LS (with stacks, no expansion step is needed).
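The four steps above can be sketched as follows for grayscale images. This is an illustrative version under assumptions not taken from the project: the stacks are built with `scipy.ndimage.gaussian_filter` using the same σ at every level, the Laplacian stack includes the final low-pass residual, and the mask holds values in [0, 1] where 1 selects image A. Function names, the number of levels, and σ are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def gaussian_stack(image, levels, sigma):
    """Original image followed by `levels` successively blurrier copies."""
    stack = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        stack.append(gaussian_filter(stack[-1], sigma))
    return stack


def laplacian_stack(image, levels, sigma):
    """Differences of consecutive Gaussian levels, plus the final residual."""
    g = gaussian_stack(image, levels, sigma)
    return [g[i] - g[i + 1] for i in range(levels)] + [g[-1]]


def multires_blend(a, b, mask, levels=4, sigma=2.0):
    """Blend a and b level by level, weighting with a blurred copy of the mask."""
    la = laplacian_stack(a, levels, sigma)      # step 1: Laplacian stacks LA and LB
    lb = laplacian_stack(b, levels, sigma)
    gr = gaussian_stack(mask, levels, sigma)    # step 2: Gaussian stack GR of the mask
    # Step 3: LS_l = GR_l * LA_l + (1 - GR_l) * LB_l at every level l.
    ls = [w * x + (1.0 - w) * y for w, x, y in zip(gr, la, lb)]
    # Step 4: sum the levels of LS to get the splined image S.
    return np.sum(ls, axis=0)


rng = np.random.default_rng(4)
a = rng.random((32, 32))              # stand-ins for two aligned grayscale images
b = rng.random((32, 32))
mask = np.zeros((32, 32))
mask[:, :16] = 1.0                    # left half comes from a, right half from b
blended = multires_blend(a, b, mask)
```

Blurring the mask more at coarser levels is what makes the seam wide for low frequencies and narrow for high frequencies. For color images, one option is to apply the same procedure to each channel separately.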

Now we are ready to have some fun again by creating some blended images! The results are similar to what we could obtain from photo editing software such as Photoshop. After all, such software may be built on some of these core concepts of image manipulation.

Hybrid image Tree 1		Stack of low frequencies Gaussian Stack
Hybrid image Tree 1		Stack of high frequencies. Laplacian Stack
Hybrid image Tree 2		Stack of low frequencies Gaussian Stack
Hybrid image Tree 2		Stack of high frequencies. Laplacian Stack
Hybrid image Blended trees		Stack of low frequencies Gaussian Stack
Hybrid image Blended trees		Stack of high frequencies. Laplacian Stack

Bells & whistles:

We present some additional examples using color to enhance the effect below.

First image. Apple	Second image. Orange	White corresponds to the first image, black corresponds to the second image. Mask	Blended image Apple + Orange = Oraple!
First image. Shark	Second image. Fur	White corresponds to the first image, black corresponds to the second image. Mask	Blended image Furry Shark
First image. Shark	Second image. Scales	White corresponds to the first image, black corresponds to the second image. Mask	Blended image Scaly Shark

Final thoughts

In this project, we surveyed different image manipulation applications that use convolutions with Gaussian kernels and other frequency-extraction operations to generate interesting imagery. We demonstrated that images store plenty of information that can help us manipulate them in various useful and creative ways. We also learned how our eyes play a role in how we perceive the world around us. Working on the hybrid/morphing image generator was particularly fun, since these types of images provide a really interesting visual illusion that depends on the viewing distance. Overall, this project was enriching, challenging, and fun!