CS194-26 Final Projects

Project Overviews

I completed two projects for this final report. The first project, seam carving, selectively chooses "seams" of pixels within an image to resize it (either shrink or grow it) in a content-aware fashion. The second project, vertigo shot, emulates an unsettling cinematic effect called the "dolly zoom". This effect is achieved with a careful combination of camera zoom and distance from a target, such that the background appears to zoom in at the viewer without any other scenic changes.

Project 1: Seam Carving

About:

In this project, I built a program to resize an image through successive sets of vertical/horizontal seam carves, while remaining aware of the content situated within the image. At its core, this is an automated, robust, bottom-up approach: rather than detecting regions of visual importance directly, it computes a map of importance throughout an input image. A pixel seam is a single-pixel-wide path of pixels spanning the image either horizontally or vertically, with each pixel directly or diagonally adjacent to the previous pixel in the seam. Intuitively, this gives the seam flexibility to "bend" around objects or regions of important content, while still maintaining spatial continuity. Seams were calculated to minimize the image importance they cut through, based upon an energy function applied to the image (in our case, the sum of the vertical and horizontal discrete grayscale gradients). To add seams, I followed the approach laid out in the paper I followed (Seam Carving for Content-Aware Image Resizing by Shai Avidan and Ariel Shamir): I found the cuts that would be made if I wished to shrink the image, and instead inserted neighbor-averaged pixels along those seams. The approach and implementation details are laid out below.

Approach:

For our seam carving application, an obviously appealing use case is to observe or animate the calculated seam cuts/insertions quickly. As a result, I structured the code to compute and save the seams beforehand, leaving a quick and efficient online stage where I actually resized the image based upon those precalculated seams.
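As a rough sketch of this structure (the helper names are my own; energy_map, min_vertical_seam, and remove_vertical_seam are sketched in the sections below), the offline pass repeatedly finds and carves the current minimum seam while recording it, so the online pass only has to replay the recorded seams in order:

```python
import numpy as np

def precompute_seams(img, n_seams):
    # Offline stage: record n_seams minimum-energy vertical seams,
    # carving each one out of a working copy before finding the next.
    work = img.astype(float)
    recorded = []
    for _ in range(n_seams):
        gray = work.mean(axis=2)  # simple grayscale conversion
        seam = min_vertical_seam(energy_map(gray))
        recorded.append(seam)
        work = remove_vertical_seam(work, seam)
    return recorded  # replay in order for fast online resizing/animation
```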

To quantify content importance within the image, I calculated an energy function map on the grayscale image, which in my case was the sum of the horizontal and vertical discrete gradients. Intuitively, regions where this metric is high correspond to regions with many edges, which likely carry more importance.
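A minimal sketch of this energy function, assuming a float grayscale array (I take the absolute values of the gradients here so the energy is nonnegative):

```python
import numpy as np

def energy_map(gray):
    # Sum of the absolute vertical and horizontal discrete gradients.
    # High energy corresponds to edges, our proxy for important content.
    dy, dx = np.gradient(gray)
    return np.abs(dy) + np.abs(dx)
```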

Given these regions, I then calculated the seam that cuts through the minimum energy. I used a dynamic programming approach, where the subproblems tallied the weight of an in-progress seam starting from the top (or left) of the image. For each subproblem, I calculated its weight by summing the in-place energy with the minimum weight among its adjacent parent subproblems. After calculating all subproblems, I selected the one at the bottom (or right) of the image with the minimum weight, and then backtracked along minimum-weight parents to obtain the overall minimum-weight seam.
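A sketch of this dynamic program for a vertical seam (a horizontal seam is symmetric, e.g. by transposing the energy map first):

```python
import numpy as np

def min_vertical_seam(energy):
    # Forward pass: each cell's cost is its own energy plus the cheapest
    # of its (up to three) adjacent parents in the row above.
    h, w = energy.shape
    cost = energy.astype(float)
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    # Backtrack from the cheapest bottom cell along minimum-weight parents.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam  # seam[i] = column of the seam pixel in row i
```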

For image shrinking, I simply removed these batches of successive vertical/horizontal seams, in order. For image enlarging, I inserted new pixels along the calculated seams, averaging the neighboring side pixels. Performing this insertion only after calculating all seams in the batch was crucial, to avoid repeatedly inserting at and recalculating the same lowest-energy seam. I detail the results of shrinking and enlarging images with different characteristics of embedded content, as well as some failure cases, below.
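Minimal sketches of both operations, assuming an H x W x 3 image array (for insertion, averaging the seam pixel with its left neighbor is just one reasonable neighbor-averaging choice):

```python
import numpy as np

def remove_vertical_seam(img, seam):
    # Drop one pixel per row along the seam.
    h, w = img.shape[:2]
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1, img.shape[2])

def insert_vertical_seam(img, seam):
    # Insert a neighbor-averaged pixel just right of each seam position.
    h, w = img.shape[:2]
    out = np.empty((h, w + 1, img.shape[2]), dtype=img.dtype)
    for i in range(h):
        j = seam[i]
        left = img[i, max(j - 1, 0)].astype(float)
        avg = ((left + img[i, j].astype(float)) / 2.0).astype(img.dtype)
        out[i, :j + 1] = img[i, :j + 1]
        out[i, j + 1] = avg
        out[i, j + 2:] = img[i, j + 1:]
    return out
```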

Results:

The images below show some examples, original and after shrinking with seam carving. Notice that, for the most part, these examples preserve the image content very well while impressively compressing the image size!






The original and after images below demonstrate enlarging with seam carving. Although the result is a bit spottier and has more artifacts than the shrunken examples, it impressively preserves the image content!

Finally, here are some examples of images that did not work so well.





Discussion / Failure Cases:

The resizing approach is very effective for most images! The biggest factor determining its success is the nature of the content in each input image. As can be seen in the example above with dense human faces, shrinking an image too much leads to sudden and catastrophic artifacts. How much is too much is gauged by the image content itself: however much can be ejected without ruining the embedded content. There is not much we can do about this; it is simply a limitation of the approach, by construction.

Another important failure case involves textures. Looking at the shrunken giraffe example above, it is evident that certain textures may interact unfavorably with our choice of energy function: the fair (lightly colored) portions of the giraffe's neck are chosen to be cut in disproportionate fashion, although from a human perspective they are just as important, content-wise, as the rest of the neck. This results in some strange zigzag artifacts. Another example is the image with food laid out on a textured table. Although from a human perspective the table carries little importance and should be cut, our gradient-based energy function assigns high importance to this region, where there are rapid changes between dark and light hues. Perhaps these texture mishaps could have been addressed by a different choice of energy function, which would help flag such regions for removal.

Some final failure cases involve seam insertion rather than shrinking. Even in cases where seam insertion is fairly successful, we spot some pretty heavy artifacts. There is not much we can do about this, since our neighbor-averaging approach is blind to actual image content. In addition, other troubles can arise from entering the extrapolating realm of seam insertion. As can be seen in the skyscraper insertion example, seam insertion has no sense of perspective in the image, and as a result may not output perspective-aware extensions.

Overall, a large portion of this project involved manually discovering the weak points and little tricks that help tune and pre-process images to be best suited to this approach.

Bells and Whistles:

As described in the approach, results, and discussion above, multiple bells and whistles were covered in this project.

Project 2: Vertigo Shot

About:

In this project, I tried to simulate the "dolly zoom" effect seen in cinematography. The dolly zoom combines camera motion and zoom in such a fashion that the foreground subject appears to stay still, while the background zooms in. I found that for this project, the choice of scene (in particular, the presence of any obstructions or perspective-inducing elements) was critical for success.

Approach:

I used a Nikon D3300 fitted with a 17-50mm Sigma zoom lens. I asked a friend to shoot a rapid series of photographs while she walked away from me and zoomed in on the camera, trying to keep my apparent size in the shots constant. Due to the zoom and the distance of the background, the background appeared to zoom in, resulting in the desired "dolly zoom" effect.
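For intuition (this is just the standard pinhole-camera relation, not something we measured): a subject of height $H$ at distance $d$ projects to image height $h = fH/d$ at focal length $f$, so keeping my apparent size constant while the camera walks back means scaling the focal length in proportion to my distance:

$$ h = \frac{fH}{d} = \text{const} \quad\Longrightarrow\quad \frac{f_{\text{new}}}{f_{\text{old}}} = \frac{d_{\text{new}}}{d_{\text{old}}}. $$

A background object at distance $d_b > d$ then grows on screen, since $f$ increases proportionally faster than $d_b$ does; this differential magnification is exactly the dolly zoom.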

Results:

Sequence 1 Images:

Sequence 1 Animated GIF:

Sequence 2 Images:

Sequence 2 Animated GIF:

Sequence 3 Images:

Sequence 3 Animated GIF:

Discussion / Failure Cases:

Choice of scene was critical here. Note how in the alleyway sequence, the walls on the sides give away what is actually happening behind the scenes, making it cognitively hard to convince the viewer of the dolly zoom. Noting this issue, I made sure to cut out any elements on the sides that could provide perspective information in the next sequence (the one against the wall). However, the background being so close caused some issues in getting a quality clip. So for the final sequence (with the poles in the background), I made sure to also move the background further away. Overall, we obtain a very unsettling (nice!) dolly zoom effect!

Bells and Whistles:

As shown in the results, one bell and whistle was covered in this project.

Final project summary

In these projects, the most valuable takeaway was learning how to emulate approaches I read about in existing literature, as well as tweaking them based upon my results. In both cases, I learned how strongly the specific scene/input images affect the overall quality of the results. For example, in the first project, tailoring inputs with specific characteristics to a specialized version of the approach produced much better results. In the second project, choosing how to stand and move the camera within a given scene was critical to producing a convincing dolly zoom. Overall, I gained some insight into how open-ended research works: at the end of my projects, there were very clear directions to explore for iterative improvements.