CS194-26 Project 3 Writeup

Project Overview

This project was composed of two parts, generally aimed at exploring frequency and gradient based approaches to image manipulation/blending, respectively. I detail the theoretical musings, approaches taken, and results for various experiments in these domains below.

Part 1: Frequency Domain

Warmup 1.1

About:

I sharpened a pre-chosen blurry image by emphasizig the higher frequencies present in the photo.

Approach:

To get the high frequencies, I convolved the original image with a gaussian filter of tunable standard deviation. The result represented the low frequencies present in the image, so I subtracted the result from the original image to get a representation of the high frequencies. I then added this highpass multiplied by a tunable parameter alpha to the original image, and then normalized to get a sharpened image.

Results:

The sharpening results on this image of a blurry man used:

A discrete standard deviation of 12 for the gaussian filter
A value of 5 for alpha

Discussion / Failure Cases:

Notice that the sharpening is quite good for this image! I tried the technique on other images, that were not as initially blurry, and the sharpening was not as apparent. This seems appropriate as if there are few high frequencies to begin with, emphasizing them will be more striking to the eye.

Bells and Whistles:

None.

Hybrid Images 1.2

About:

I generated several hybrid images, by combining the lowpass frequencies of one image with the highpass frequencies of another. Since edges and high frequencies dominate human perception when they are perceivable, the hybrid appears most similar to the image that was high-pass filtered when seen at close range. However, at a distance, these fine details are difficult to see so the overall schema represents the image that was low-pass filtered more accurately.

Approach:

Some creative license was necessary in finding smarty pictures to hybridize. In addition, although the staff provided alignment code, I usually did some rough preprocessing so that the images to be hybridized were not too differently sized or oriented. Generally, the pictures with more fine details were more successful if incorporated as the high-pass component. The lowpass frequencies were extracted by convolution with a Gaussian filter of tunable standard deviation. The highpass frequencies were extraxted by convolution with a filter formed by subtracting a Gaussian filter of tunable standard deviation from the unit impulse filter. Finally, the two images were combined in proportions alpha (for the lowpass component) and beta (for the highpass component), and normalized to a final image. For one particularly pleasing hybridization, I visualized the Fourier domains of the input images, the filtered images, and the hybrid image. I explain the results below.

Results:

The parameters used for the sample, (globe, orange), (afro, broccoli), and (walnut, man) hybridizations, were respectively:

Discrete standard deviations for the lowpass Gaussian filter: 15.0, 3.1, 5.0, 1.5
Discrete standard deviation for the highpass-related Gaussian filter: 10.0, 1.6, 2.8, 1.8
The lowpass factor alpha contributing to the final image: 1.0, 0.5, 0.5, 0.2
The highpass factor beta contributing to the final image: 1.5, 1.7, 1.6, 1.8

Sample hybrid:

Hybrid of globe (lowpass) and orange (highpass):

Hybrid of afro (lowpass) and broccoli (highpass):

Hybrid of walnut (lowpass) and man (highpass); this hybrid image had only partial success:

For the afro and broccoli hybrid, we illuminate the approach with snapshots of the Fourier domains of the different images encountered along the procedure:

Original afro image:

Original broccoli image:

Lowpass filtered afro image:

Highpass filtered broccoli image:

Hybrid image:

Discussion / Failure Cases:

For a failure case, note that the (walnut, man) hybrid image is a significantly poorer example of the hybrid dual nature. The main reasons for this is that the lowpass image, a walnut, is not very recognizable when blurred, and the two images have very disjoint features, so that closeup the hybrid image is more perceived as a combination of both images, rather than being dominated by the highpass image. This highlights the importance of choosing clever pictures for the images to be filtered. It makes sense that images with finer details see more success when chosen as the highpass component; most of their recognizable cues, in fact, are in the high frequency regime, so if we do not capture these it is hard to recognize the image. In analyzing the Fourier domain snapshots, we clearly see that the original images have a smattering along various different regions of the frequency domain. The lowpass Fourier domain shows that most of the high frequencies (far from the origin) have been culled, and the highpass Fourier domain shows that a region of low frequencies (near the origin) have been reduced. The overall hybrid image has a sensible Fourier domain that is a combination of these filtered Fourier transforms. Some small snippets to note; the transforms generally show intense peaks around the horizontal or vertical orientations, which makes sense as these are natural orientations that show up in most human-imaged photos. Another small thing I found was that in one run (although it is not seen here) the Fourier domain of the hybrid image had an intense vertical stripe, which likely corresponded to a sharp image boundary artifact that was left in the image after running the alignment script.

Bells and Whistles:

Inclusion of color:

I included color in the highpass component of each of these hybrid images, which I found to be the most succesful scheme for profound hybrid images, when introducing color. This is due to the fact that our color-seeing cones are located in our central vision, which is less astute at long distances. Thus, the color emphasizes the highpass even more at close range, but fades out with distance (as we want).

Gaussian and Laplacian Stacks 1.3

About:

I implemented Gaussian and Laplacian stacks, and applied them on images, one of which was the (afro, broccoli) hybrid image from the last section. The Gaussian-stack filtered images progressively removed more and more high frequencies, blurring the image. The Laplacian-stack filtered images captured the differences between these Gaussian-stack levels, essentially capturing different bandpass frequencies of the images. Thus, the Laplacian-stack images revealed different structures dominating different frequency regimes, within the images.

Approach:

The stacks acting on a certain image "emulated" the downsampling seen in image pyramids by progressively increasing the filter standard deviations at each level, by a factor of 2. The Gaussian stack used a simple Gaussian filter with tunable initial standard deviation. The Laplacian stack similarly involved a Gaussian filter of tunable initial standard deviation, but each image layer was found by taking the difference between two Gaussian filtered images with consecutively increasing standard deviation (the inital image was simply the original).

Results:

The stacks were parameterized by:

Both having 8 levels
Both having discrete standard deviations of 1.8 for the associated gaussian filters

Salvador Dali painting (Gaussian Stack):

Salvador Dali painting (Laplacian Stack):

Pablo Picasso painting (Gaussian Stack):

Pablo Picasso painting (Laplacian Stack):

Afro and broccoli hybrid (Gaussian Stack):

Afro and broccoli hybrid (Laplacian Stack):

Discussion / Failure Cases:

Note how the deeper Gaussian stack image layers are progressively blurrier, as expected. The Laplacian stack images capture the full range of the image under different frequency regimes; the Dali painting is especially illuminating for this separation of frequency emphasis. The hybrid image we found earlier is also quite pleasing, as the high pass image dominates the earlier Laplacian stack layers, and the low pass image dominates the final Laplacian stack layers, as expected.

Bells and Whistles:

None.

Multiresolution Blending 1.4

About:

I utilized the Gaussian and Laplacian stacks generated in the last section to convincingly blend images. I decomposed two images to be blended into their bandpass components (via application of the Laplacian stack layers), and then blurred them (with lower frequency components blurred more, as they have structure more spread out in the spatial domain) by convolving them with a binary (or irregular) mask blurred by the corresponding layer in the Gaussian stack. By combinining all these different blurred bandpass components, I obtained final images that looked convincingly blended.

Approach:

I applied the procedure on three image pairs: orange and apple (binary vertical mask), Obama and Trump (binary vertical mask), and Efros with his own mouth blended into his eye sockets (irregular mask). The Gaussian and Laplacian stacks had a tunable number of layers, and associated standard deviations. I first applied the Laplacian stack to the images to separate each image into its bandpass components. I then applied the Gaussian stack to my choice of mask, to achieve the corresponding amount of blurring for a certain bandpass component, as explained earlier. I then applied the blurred masks to their corresponding bandpass components, and combined all the masked components together to achieve a final blended image.

Results:

The parameters used for the (orange, apple), (Obama, Trump), and (Efros' mouth, Efros) blends, were respectively:

Discrete standard deviation values for both the Gaussian and Laplacian stacks: 7.2, 13.5, 8.0
All blends used stacks both containing 11 layers

Orange and Apple blend:

Obama and Trump blend:

Efros' mouth and Efros blend:

We illustrate this process by displaying some of the Laplacian stack image layers on the blended and masked input images of the (Obama, Trump) blend:

Discussion / Failure Cases:

Note how from the visualization, we see that different bandpass components get blended to different amounts, letting all frequency components get the proper amount of blending (as frequency is related to feature size). One thing to note is the blend involving Efros' mouth and Efros; the eyes have a much brighter color than their surroundings, which is unfortunately caused by the fact that the brightness intensity of the background Efros image globally changed after processing. His glasses also pose a challenge to the blending. None theless, it can be seen that features like our professor's wrinkles blend quite nicely at the boundary, so the procedure was in fact succesful.

Bells and Whistles:

Inclusion of color:

I included color in all of the blended images.

Part 2: Gradient Domain Fusion

Toy Problem 2.1

About:

I explored the idea of blending regions with a gradient domain scheme by first trying to reconstruct a simple grayscale image. In this case, the region was the entire image, so the problem consisted of trying to maintain the gradients of the original image in the final image, while also satisfying a "boundary constraint" of keeping one of the pixels of the original image the same as in the final image. In theory, these two constraints are enough to reconstruct the image.

Approach:

I treated the problem as solving a linear system, where the variables being solved for were the pixels of the desired final image. The setup for the system was based off the two constraints stated above. The constraint aimed at maintaining the original gradients contributed equations that set the discrete gradients in both the x and y directions in the final image to be equal to those in the original image. The "boundary" constraint aimed at achieving the proper overall intensity contributed a single equation ensuring that one arbitrary pixel (the top left one) was the same in both the original and final images. The linear system was solved via a sparse least-squares solver (the system was over-determined by construction). The result properly reconstructed the original image.

Results:

Original image:

Blended image reconstruction:

Discussion / Failure Cases:

As expected, the linear system solution is a full reconstruction of the original image.

Bells and Whistles:

None.

Poisson Blending 2.2

About:

Using the derivations presented in class, I blended a source region onto a target region. The blending derivations hinged upon two constraints; maintaining the discrete gradients (represented by a convolution with the Laplacian filter) of the source image region in the blended image and a "boundary condition" stating that pixels not in the region should be the same as they were in the original target image. By crystallizing these constraints into a linear system, I was able to solve for the pixels in a properly gradient domain blended image.

Approach:

I used the starter code for generating masks to generate the proper region masks for blending, as well as the proper source regions. For each color channel, I constructed a linear system representing the constraints. For pixels not in the region, I included matrix entries representing an equation setting the final pixel values to the target image pixel values. For pixels within the region, I used the Laplacian filter (from the least-squares solution derived in class) to represent the discrete gradient, adding entries to the matrix that represented setting the final gradients in the image equal to the source gradients in the image. I solved the linear system with a sparse least-squares solver, and after recombining the solutions from all three color channels, I obtained a properly Poisson blended image.

Results:

Broccoli on Penguin Blend (original broccoli):

Broccoli on Penguin Blend (original penguin):

Broccoli on Penguin Blend (unblended):

Broccoli on Penguin Blend (poisson blended):

Trump Hair on Obama Face Blend:

Walnut on Trump Nose Blend:

Orange and Apple Blend from part 1 (Pyramid blending):

Orange and Apple Blend from part 1 (Poisson blending):

Discussion / Failure Cases:

For the most part, the blends came out very pleasing! Note that for the Trump hair on Obama blend, there is some very strange discoloration; the cause of this was that the original source image was taken from inside Trump's flowy locks, rather than taking his hair as a whole. Thus, the border gradients aimed to keep the transition from the border into the hair relatively constant. So instead of the blended image looking like Trump's hair as a goatee on Obama, we instead have Obama's skin fraying out into what looks like a beard. Oops! Note also that we have included the blend with orange and apple again, from part 1! It appears that both Poisson and pyramid blending are good at blending the seam between the images, but Poisson blending gets nasty artifacts near the boundary if the region is not specified extremely carefully, or if the boundary region background is different colors in the original images. Mixed blending

Bells and Whistles:

Mixed Gradients Blending:

I tweaked the original Poisson blending linear system that I solved to satisfy a slightly different constraint. Rather than have the final gradients in the masked region be equal to the source gradients in that region, I enforced that the final gradients be the maximum between the masked region's source image gradients versus target image gradients. This change improves the preservation of certain features in the final masked region, because certain features in the target region may be undesirably hidden by nature of Poisson blending's constraints. Below I show the use of this technique to blend an eye onto the palm of a hand. Although there are some certain discoloration artifacts (we can also run this technique in grayscale to ameliorate these effects), we can see that certain features from the target image are desirably preserved in the masked region of the final image. Although it is a little bit difficult to see, some of the veins from the palm in the target image are preserved and overshadow the featureless regions in the source image of the eye. This overshadowing of featureless regions in the source image is the primary boon from using mixed gradients blending.

Texture flattening:

I tweaked the linear system set up in the toy problem scheme, by zeroing out gradients under a tunable threshold. The result is that like the toy problem, the image is reconstructed, but with fine details (associated with small gradients) eliminated. The resulting "matte" look is what we dub texture flattening.

None.

Final thoughts:

I found out from this project that even though you can have a functioning algorithm, you still need significant human intervention: tuning hyperparameters, cleverly picking images, and aligning/cutting images before processing. We are, in fact, trying to make images meant to be perceived by humans!