Project overview

This project was inspired by the paper Seam Carving for Content-Aware Image Resizing by Shai Avidan and Ariel Shamir. We determine the "least important" seam in the image (sort of like a continuous line of pixels running from one edge to the other), and then we remove that seam to shrink the image. After a number of iterations, we obtain a significantly resized image with almost no distortion to the key image objects.

The paper also describes ways to stretch images, diagonally resize an image, and remove objects. It is interesting stuff, but it is not implemented in this project.

Carving images

First, we have to define an energy function, which indicates how important a pixel is to the composition of the image. I used a simple gradient function as my energy function. Then, I implemented a dynamic programming algorithm that computes the value of the lowest-energy seam and backtracks up the entire image to recover the path that produced it. Finally, in a loop, I removed the lowest-energy seam at each iteration and shifted the remaining pixels over to form the resized image.
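
As a minimal numpy sketch of the dynamic programming step (the function names and the assumption of a precomputed 2D energy array are mine, not the project's exact code):

```python
import numpy as np

def vertical_seam(energy):
    """Return the column index of the minimum-energy vertical seam, one per row."""
    h, w = energy.shape
    cost = energy.astype(float)
    # cost[i, j] becomes the energy of the cheapest seam ending at (i, j).
    for i in range(1, h):
        up = cost[i - 1]
        up_left = np.roll(up, 1)
        up_right = np.roll(up, -1)
        up_left[0] = np.inf    # no up-left neighbor at the left edge
        up_right[-1] = np.inf  # no up-right neighbor at the right edge
        cost[i] += np.minimum(np.minimum(up_left, up), up_right)
    # Backtrack from the cheapest bottom pixel up to the top row.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam

def remove_seam(img, seam):
    """Delete one pixel per row from an (H, W, 3) image, shifting the rest left."""
    h, w = img.shape[:2]
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1, 3)
```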

I only implemented seam finding for vertical seams (top to bottom), so to carve horizontal seams I simply transposed the image before starting the seam carving loop, and transposed it back once the image was fully resized.
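
In code, that wrapper is just a transpose on either side of the vertical carve (carve_vertical here is a hypothetical driver that loops the seam-finding sketch above; the image is assumed to be (H, W, 3)):

```python
def carve_horizontal(img, n_seams):
    # Swap the spatial axes, carve vertical seams, then swap back.
    carved = carve_vertical(img.transpose(1, 0, 2), n_seams)
    return carved.transpose(1, 0, 2)
```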

Here are some examples of images that worked out well (both horizontally and vertically carved).

[Image gallery: each example shown as an original/carved pair]

Bells and Whistles: Comparing different energy functions

I implemented both an L1 and an L2 gradient energy function, and I expected them to perform similarly. However, this was not the case, especially in scenes involving clouds. The L1 gradient function led to more artifacts. Even when I thought L1 performed well, its artifacts became more noticeable when compared side by side with the L2 results (specifically in the bridge photo). The L2 gradient also removes more of the sky in the bridge image, which (to the human eye) appears to be the least important part of the image.
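
For reference, here is one way the two energies could be written; this is my reconstruction from the description above (forward differences on a grayscale float image), not necessarily the project's exact formulation:

```python
import numpy as np

def gradients(gray):
    # Forward finite differences, padded so shapes match the image.
    dy = np.abs(np.diff(gray, axis=0, append=gray[-1:, :]))
    dx = np.abs(np.diff(gray, axis=1, append=gray[:, -1:]))
    return dx, dy

def energy_l1(gray):
    dx, dy = gradients(gray)
    return dx + dy                    # L1: |dI/dx| + |dI/dy|

def energy_l2(gray):
    dx, dy = gradients(gray)
    return np.sqrt(dx**2 + dy**2)     # L2: gradient magnitude
```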

[Image gallery: the same scenes carved with the L1 and L2 gradient energies]

Failure cases

Some of these images didn't do so hot regardless of which energy function we used.

If you carve too much of the image, it will look bad even if the energy function is good. This is shown in the last row with Professor Efros's face, where removing too many seams also removes part of his face.

[Image gallery: original, L1 gradient, and L2 gradient results for each failure case; the Efros portrait with 100 seams removed (looks pretty good) vs. 400 seams removed (cropped, distorted)]

Conclusions and Takeaways

I really did not expect that a simple change in the energy function could produce such noticeable differences, in both the success cases and the failure cases. Even though the computer can run the algorithm quickly, the magic still mostly comes from the user-designed energy function. Different energy functions suit different images. A computer can only do so much, so there is definitely an art to computer vision, which I think is the most important thing I took away and appreciated from implementing seam carving.

Project overview

This project was inspired by the work done with light fields by Ren Ng. The light field is a vector function that describes the amount of light flowing in every direction through every point in space. We essentially reduce the 5D plenoptic function (the space of all possible light rays) to 4D. We use light field data from the Stanford Light Field Archive.

Depth Refocusing

All images in the dataset are offset from the center by some amount. To refocus the image in post-processing, we want to shift every image so that a chosen point lines up across all of them; points at other depths will not line up, due to parallax. Before starting the algorithm, I processed each image file and file name to determine each image's grid coordinate and true pixel coordinate.
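
A sketch of that preprocessing step, assuming file names of the form out_<row>_<col>_<y>_<x>.png (roughly the layout of the Stanford rectified datasets; the parsing would need adjusting for other naming schemes):

```python
import os

def parse_lightfield(dirname):
    """Index each sub-aperture image by grid and true pixel coordinates."""
    images = []
    for fname in sorted(os.listdir(dirname)):
        stem, ext = os.path.splitext(fname)
        if ext.lower() != ".png" or not stem.startswith("out_"):
            continue
        _, row, col, y, x = stem.split("_")
        images.append({
            "file": os.path.join(dirname, fname),
            "grid": (int(row), int(col)),   # position in the camera grid
            "pos": (float(y), float(x)),    # true pixel coordinate
        })
    return images
```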

First, I determined the true pixel center coordinate of the image, (c0, c1), by averaging all true pixel coordinates. Then, for each image in the data set, I computed its offset im_offset from (c0, c1). When it came time to generate refocused images, I shifted each image by its im_offset multiplied by a constant value c between -0.8 and 0.8. A negative constant focuses toward the front, while a positive constant focuses toward the back.
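
A minimal sketch of the refocusing loop, built on the parse_lightfield entries above with each image's pixels loaded as a float array under a hypothetical "data" key:

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus(images, c):
    """Shift each image by c times its offset from the center, then average."""
    center = np.mean([im["pos"] for im in images], axis=0)  # (c0, c1)
    acc = np.zeros_like(images[0]["data"], dtype=float)
    for im in images:
        dy, dx = np.array(im["pos"]) - center               # im_offset
        # Shift the spatial axes only; the color channel stays put.
        acc += nd_shift(im["data"], (c * dy, c * dx, 0))
    return acc / len(images)
```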

[Image gallery: refocused results for c = -0.4, c = 0, and c = 0.4]

Aperture Adjustment

With a small aperture size, we do not get the depth/blurring effect that we see with a large aperture size. In order to generate images of different aperture sizes, we average only a subset of the data for each aperture size.

I chose the point of focus to be the true center of coordinates, as computed in the Depth Refocusing part. The full aperture averages all of the data images, while the smallest aperture contains just one data image (the sharpest "generated" image). To produce the effect of different aperture sizes, I found all data images whose positions were within s pixels of the true center, and averaged those.
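
A sketch of that selection-and-average step, reusing the same hypothetical image records as above:

```python
import numpy as np

def aperture_image(images, s, center):
    """Average only the images within s pixels of the true center.

    Larger s means a larger synthetic aperture and a shallower
    depth of field; a very small s leaves a single sharp image.
    """
    subset = [im["data"] for im in images
              if np.linalg.norm(np.array(im["pos"]) - center) <= s]
    return np.mean(subset, axis=0)
```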

Here are some results: [image gallery of generated images for s = 1, 11, 21, 31, 41, 51, 61, 71]

Conclusions and Takeaways

I think I have a better intuitive understanding of how light fields work, especially when thinking about them in terms of basic operations like scaled shifting, and in terms of parallax. In some ways, since we are taking an average, are we getting something like a low-frequency image of anything that isn't lined up? It's actually really cool that simple operations can produce these results after the fact. I am still not totally comfortable with the idea of light fields (how exactly the camera is able to capture them), but it's nice to know that a lot of the parameters that you need to fix ahead of time (aperture, focus) on a traditional camera can be tuned after the fact to produce the exact effect you're looking for.