Part I: Vertigo Shot (aka Dolly Zoom)

An Overview

The vertigo shot, also known as the dolly zoom, is a camera technique where the camera is dollied forward or backward while the zoom on the lens is pulled in the opposite direction. A dolly zoom effect seems to undermine normal visual perception - if it's done correctly, the characters/subjects in the frame appear to remain the same size while the foreground or background becomes compressed or de-compressed. Here is a famous example from the diner scene with De Niro and Liotta in Martin Scorsese's Goodfellas.

Replicating the Dolly Zoom

How do we achieve this effect at home (without fancy film equipment)? It's actually pretty simple to recreate. We first choose a subject, whose size in the frame we keep fixed by adjusting the field of view (FOV) as we change the camera's distance from it. The effect is a cool perspective distortion similar to the one we've seen in the video above. To make sure the subject remained fixed, I referenced the grid on my camera's digital viewfinder and checked that the grid lines aligned with the same points on my subject in each photo taken at a different FOV.
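As a rough guide to how FOV and distance trade off, here is a minimal sketch assuming an ideal pinhole camera, where the frame width at the subject's distance d is 2 * d * tan(FOV / 2). Keeping the subject the same size means keeping that product constant. The function name and the example numbers are hypothetical and not taken from the photos in this project.

```python
import math

def fov_for_distance(ref_fov_deg, ref_distance, new_distance):
    """FOV in degrees that keeps the subject the same size at new_distance.

    Assumes a pinhole camera: frame width at distance d is 2 * d * tan(fov / 2).
    """
    # Half of the frame width at the subject plane for the reference shot.
    half_width = ref_distance * math.tan(math.radians(ref_fov_deg) / 2)
    # Solve new_distance * tan(fov / 2) = half_width for fov.
    return math.degrees(2 * math.atan(half_width / new_distance))

# Hypothetical example: start at 2 m with a 60 degree FOV, then step backward.
for d in (2.0, 3.0, 4.0, 6.0):
    print(f"distance {d:.1f} m -> FOV {fov_for_distance(60.0, 2.0, d):.1f} deg")
```

Moving away from the subject calls for a narrower FOV (zooming in), which is what compresses the background.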

Image 1
Image 2
Image 3
Image 4
Image 5

traffic.gif

Bells & Whistles: Assembled into a GIF

More Results

Here, I chose a larger object (a Volkswagen Beetle) to be my fixed subject. As you can see, the object itself, though it's in the foreground, becomes warped as the FOV changes, creating a cool perspective distortion effect.

Image 1
Image 2
Image 3
Image 4
Image 5
Image 6

beetle.gif

Bells & Whistles: Assembled into a GIF

Failure Case: Human Subject

It was extremely hard to keep a human subject fixed. Unsurprisingly, correspondences were difficult to pick, and my subject's fidgeting made the images look off. Here is a case where my photos were not composed adequately, so the distortion looks jagged because the subject is not completely fixed in one place. Nonetheless, the effect can be seen and it still looks cool, albeit imperfect.

Image 1
Image 2
Image 3
Image 4

extra.gif

Bells & Whistles: Assembled into a GIF

Part II: Seam Carving

Overview

The implementation of this seam carving project is based on the SIGGRAPH paper Seam Carving for Content-Aware Image Resizing by Shai Avidan and Ariel Shamir. It is motivated by the increasing diversity and versatility of display devices and the consequent demands on digital media to conform to various layout constraints.

Because standard image scaling is oblivious to image content and must be applied uniformly, it is insufficient to fulfill these new demands. Hence, we must also consider the content when resizing an image to ensure that the important parts of the image are preserved. Herein we present a solution: Seam Carving is an image operator that supports content-aware image resizing for both reduction and expansion - in my assignment, I implement the reduction algorithm in Python.

The main steps of our algorithm are:

    1. Energy Calculation: Find cost/energy of each pixel in the image
    2. Seam Identification: Find the minimum-cost path/seam through the image
    3. Seam Removal: Delete the pixels in the seam
    4. Repeat: Reduce an image's width or height by repeatedly deleting seams from the image


Read on to see how I got from this to that with seam carving...

Before
After

Calculating Importance (Energy) of Each Pixel

In this step, we want to quantify the "importance" of each pixel. We define this importance as its energy - and I will use the basic e1 energy function defined in the paper. The formula is shown below:
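For reference, the e1 energy in Avidan and Shamir's paper is the L1 norm of the image gradient, written here in LaTeX:

```latex
e_1(I) = \left| \frac{\partial I}{\partial x} \right| + \left| \frac{\partial I}{\partial y} \right|
```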

To compute the derivative of the image, we use the Sobel filter to calculate the partial derivatives in the x and y directions. The result of this filtering on each pixel (for each channel in the RGB image) becomes our pixel energy value. We store the energy of each pixel in a map with the same dimensions as our image, which we will use to identify optimal seams. Later, in the Bells & Whistles section, I will explore alternative implementations of the energy calculation.
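The energy computation described above could look roughly like the sketch below. It is not the project's actual code: the helper names are hypothetical, the Sobel filter is written out with NumPy only, image borders are handled by replicating edge pixels, and the per-channel results are summed into a single map. The last two choices are assumptions.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(channel, kernel):
    """Correlate a 2-D array with a 3x3 kernel, replicating edge pixels."""
    h, w = channel.shape
    padded = np.pad(channel, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def energy_e1(image):
    """e1 energy |dI/dx| + |dI/dy|, summed over channels (an assumption)."""
    img = np.asarray(image, dtype=float)
    if img.ndim == 2:                      # allow grayscale input
        img = img[:, :, None]
    energy = np.zeros(img.shape[:2])
    for c in range(img.shape[2]):
        energy += np.abs(filter3x3(img[:, :, c], SOBEL_X))
        energy += np.abs(filter3x3(img[:, :, c], SOBEL_Y))
    return energy
```

The returned array has the same height and width as the input, so it can be indexed with the same pixel coordinates.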

Here is the pixel energy map for the image shown above. This helps us visualize the edges and "high energy" parts of an image.

Identifying Seams: Finding Minimum Energy Path

To find the seam with the least energy, the optimal seam s* we want to carve from the image, we must find a path from the top to bottom (or left to right) of the image that has the least cumulative energy. We define a seam as an 8-connected path of pixels in the image because the next pixel in our seam must touch the last pixel via edge or corner.

To find the seam in get_vertical_seam(), we use a dynamic programming algorithm, traversing the image from the second row to the last row and storing the minimum cumulative energy at each pixel in an n x m array M. M(i,j) stores the minimum cumulative energy of any seam ending at pixel (i,j). After traversing the entire image, the optimal seam, which is the path with the lowest total energy, ends at the pixel with the minimum value in the last row of M. The recurrence relation in the DP algorithm is defined below:
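For reference, the recurrence from the paper, with e(i,j) denoting the energy of pixel (i,j) and M equal to e in the first row, can be written in LaTeX as:

```latex
M(i, j) = e(i, j) + \min\bigl( M(i-1, j-1),\; M(i-1, j),\; M(i-1, j+1) \bigr)
```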

Each pixel in the seam must be either vertically or diagonally adjacent to the next pixel in the seam. So, for each potential seam pixel P at position (i,j), there are only 3 possible locations for the previous seam pixel: (i-1, j-1), (i-1, j), and (i-1, j+1). In a separate array called last_indices, I store for each pixel the index of the previous pixel on its minimum-energy path, so I can easily backtrack from the last pixel in the seam to the first pixel later. Here's a visualization of the seam identification process.
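The dynamic program and backtracking described above can be sketched as follows. The names get_vertical_seam, M and last_indices come from the text; the body is a plain, unoptimized illustration under my own assumptions (for instance, last_indices holds only the column of the previous pixel), not the project's actual implementation.

```python
import numpy as np

def get_vertical_seam(energy):
    """Return, for each row, the column of the minimum-energy vertical seam."""
    energy = np.asarray(energy, dtype=float)
    n, m = energy.shape
    M = energy.copy()                           # cumulative minimum energy
    last_indices = np.zeros((n, m), dtype=int)  # column of the previous seam pixel

    for i in range(1, n):                       # second row to last row
        for j in range(m):
            lo, hi = max(j - 1, 0), min(j + 1, m - 1)
            # Cheapest of the (up to) three candidates in the row above.
            k = lo + int(np.argmin(M[i - 1, lo:hi + 1]))
            last_indices[i, j] = k
            M[i, j] = energy[i, j] + M[i - 1, k]

    # Start at the cheapest pixel in the last row and walk back up.
    seam = np.zeros(n, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(n - 1, 0, -1):
        seam[i - 1] = last_indices[i, seam[i]]
    return seam
```

The double loop is easy to read but slow in pure Python; vectorizing the inner loop over j is a common speedup.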

Seam Removal

To carve out a seam, we delete the pixels of the optimal seam found in the previous step. We recover the seam by backtracking through our last_indices data structure from the last row to the first. We then use a mask to drop the pixels we don't want (those in the seam) and reshape the image to its new, narrower size.
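A minimal sketch of the mask-based removal, assuming the seam is given as one column index per row. The function name remove_vertical_seam is hypothetical.

```python
import numpy as np

def remove_vertical_seam(image, seam):
    """Delete pixel (i, seam[i]) from every row i of an H x W or H x W x C image."""
    image = np.asarray(image)
    h, w = image.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False       # False marks the seam pixels
    # Boolean indexing keeps the remaining pixels in row-major order,
    # so they can be reshaped into rows that are one pixel narrower.
    return image[mask].reshape(h, w - 1, *image.shape[2:])
```

Because exactly one pixel is removed per row, the reshape is always valid.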

Repeating for every seam

Given a new desired width n' for an image of width n, we repeat the previous step of carving out the optimal seam c = n - n' times until we reach the desired width. Similarly, for height m and desired height m', we carve out the optimal horizontal seam r = m - m' times until we reach the desired height.
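The repetition can be sketched as a small driver loop. To keep the example short, the three steps are passed in as functions; reduce_width and its parameter names are hypothetical. Recomputing the energy map after every removal follows the paper, and it is an assumption that this project does the same.

```python
import numpy as np

def reduce_width(image, new_width, energy_fn, seam_fn, remove_fn):
    """Carve vertical seams until the image is new_width pixels wide."""
    image = np.asarray(image)
    c = image.shape[1] - new_width         # number of seams to remove: n - n'
    if c < 0:
        raise ValueError("new_width must not exceed the current width")
    for _ in range(c):
        # Removing a seam changes pixel neighborhoods, so refresh the energy.
        seam = seam_fn(energy_fn(image))
        image = remove_fn(image, seam)
    return image
```

For example, reducing a 1000-pixel-wide image to 60% of its width means c = 1000 - 600 = 400 passes through the loop.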

Vertical Seam Carving Results

Here are some of the results I achieved via seam carving to reduce the width of an image.

The Hobbit's Shire
Width Reduced to 60%

Original El Capitan Wallpaper
Width Reduced to 50%

Lake Original
Width Reduced to 50%

Horizontal Seam Carving Results

To carve an image horizontally, we only need to rotate the image by 90 degrees, carve the minimum-cost vertical seams, and then rotate the result back. So, we do not have to implement a separate function for carving horizontal seams. Here are some results:
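The rotation trick can be sketched with np.rot90. Here reduce_width_fn stands for any function that takes an image and a target width and returns the carved image. Both names are hypothetical.

```python
import numpy as np

def reduce_height(image, new_height, reduce_width_fn):
    """Reduce height by rotating, carving vertical seams, and rotating back."""
    rotated = np.rot90(image, k=1)                 # image rows become columns
    carved = reduce_width_fn(rotated, new_height)  # carve in the rotated frame
    return np.rot90(carved, k=-1)                  # undo the rotation
```

np.rot90 rotates only the first two axes by default, so the color channels of an H x W x 3 image are left untouched.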

Hot Spring
Height Reduced to 70%

Waterfall in the Forest
Height Reduced to 50%

Utagawa Hiroshige
Height Reduced to 80%

Failures

Seam carving is far from perfect. As we push the limitations of the algorithm, artifacts begin to emerge. Lines, faces, and other highly structured shapes are not preserved in seam carving, as we merely choose the seam with minimum energy. Furthermore, seam carving does not do well with highly textured backgrounds, since it often does not discriminate the subject of the image as having higher energy. Highly textured backgrounds throw the seam carving algorithm off, as seams from the "important" subject are removed in favor of the "unimportant", yet high-energy, texture of the background. Here are some cases where seam carving does not work quite as expected.

Fail: Textured Backgrounds

All the foliage makes things hard for the seam carving algorithm because it appears high-energy even though it is the background. As you can see in the result photos below, the foliage in the background is preserved at the expense of the structures of the castle (the subject). The same applies to the other two photos, where the subject becomes obscured and distorted in favor of preserving the "unimportant" texture of the background.

Hogwarts
Width Reduced to 70%

Forest Background and Train Subject
Where is the Train? Width Reduced to 50%

My Own Photo
Width Reduced to 50%

Fail: Highly Structured Objects

As you can see below, the lines of the building in the bottom left have become distorted after cutting half the seams in the image. The lines bordering the image are now jagged and misshapen. Clearly, seam carving does not quite work here. However, notice that the words and the figures in the painting are well preserved - showing that with some minor tweaks, this algorithm could work well for this input image.

Utagawa Hiroshige
Width Reduced to 50%

Straight Lines

Guernica
Width Reduced to 70%

Faces

Rihanna
Who is She??? Height Reduced to 70%

Bells & Whistles

1. Mitigating Failures: Different Energy Functions

I will focus on mitigating the failure caused by textured backgrounds, which was the most common failure case for my input images. I will do so by introducing a different energy function. I hypothesize that the high energy of the foliage texture in the backgrounds of my failure images led to the poor seam carving. To mitigate this, I first apply a Gaussian filter to the image, trying sigma=1.5 and sigma=2.5. Then, I take the derivative of the Gaussian-filtered image as in my simple e1 energy measure.

I believe that this will lead to a better result because applying a Gaussian filter first will smooth the texture of the background and decrease the energy of these areas in an image. Below are the comparisons between using the e1 energy function and the Gaussian derivative energy function.
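A minimal sketch of the Gaussian derivative energy, written with NumPy only. Several details are assumptions rather than the project's actual choices: the kernel is truncated at three sigma, borders replicate edge pixels, np.gradient (central differences) stands in for the Sobel filter to keep the example short, and the per-channel results are summed. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(channel, sigma):
    """Separable Gaussian blur of a 2-D array, replicating edge pixels."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(channel, r, mode="edge")
    # Filter along rows, then along columns; "valid" trims the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def gaussian_derivative_energy(image, sigma=1.5):
    """Smooth each channel, then sum |d/dx| + |d/dy| over channels."""
    img = np.asarray(image, dtype=float)
    if img.ndim == 2:                      # allow grayscale input
        img = img[:, :, None]
    energy = np.zeros(img.shape[:2])
    for c in range(img.shape[2]):
        smooth = gaussian_blur(img[:, :, c], sigma)
        dy, dx = np.gradient(smooth)       # central differences (stand-in for Sobel)
        energy += np.abs(dx) + np.abs(dy)
    return energy
```

A larger sigma suppresses more fine texture, but it also blurs the subject's own edges, so the value is a trade-off.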

Original Photo (shot by me)

e1 energy visualization
Width Reduced to 80%

As you can see, the subject is distorted, as well as the building on the right.

Gaussian derivative energy visualization (sigma=1.5)
Width Reduced to 80%

Gaussian derivative energy visualization (sigma=2.5)
Width Reduced to 80%

Even with stronger smoothing, the photo is not perfect, which suggests there is a lot more work to be done to mitigate the failures in this case. However, we can clearly see that the result becomes a lot less distorted when we simply smooth the image first with a Gaussian filter. Hence, one heuristic for choosing the Gaussian derivative over the e1 energy function is the amount of texture in the background of an image. The greater the amount of texture in the background, the more you will need to smooth the photo beforehand to prevent noise from affecting the result.

Original Photo of Hogwarts

e1 energy visualization
Width Reduced to 70%

As you can see, the castle subject is distorted as described in the previous section.

Gaussian derivative energy visualization (sigma=1.5)
Width Reduced to 70%

Although there are still obvious artifacts in the photo, as can be seen by the distorted angles of the castle, the distortion is less significant.


2. My Own Composition: Seam Carved

Original Photo
Width Reduced to 80%