Project 6: Seam Carving

In this project we will attempt to resize images while minimizing the changes to their overall content. To this end, we want to identify the parts that are important so that we can avoid changing them too drastically.

The unit of change for our resizing efforts will be a "seam". Essentially this is a thread of pixels that runs from one end of the image to the other, either horizontally or vertically. Each pixel in the seam can veer up to 1 pixel away from the previous one, which allows much more freedom than a straight column or row. By deleting or inserting seams, the size of the image can be modified incrementally in a similar fashion to using columns or rows but with more flexibility in avoiding important areas.

For the purposes of identifying what the important areas are, we will define an energy function for each pixel in the image. A simple energy function that works well enough for our purposes is the L1 norm of the gradient of the image's intensity. $$E(x, y) = \left\lvert\frac{\partial I}{\partial x}\right\rvert + \left\lvert\frac{\partial I}{\partial y}\right\rvert$$ The intensity gradient can be approximated by convolving the image's intensity with finite difference operators for the x and y axes.
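A minimal sketch of this energy computation (the function name and the choice of forward-difference kernels with edge padding are my own assumptions, not details from this write-up):

```python
import numpy as np

def energy(intensity):
    """L1 norm of the intensity gradient, approximated with
    forward-difference kernels [-1, 1] along each axis."""
    # Pad with edge values so the differences keep the original shape.
    padded = np.pad(np.asarray(intensity, dtype=float), 1, mode="edge")
    dx = padded[1:-1, 2:] - padded[1:-1, 1:-1]  # horizontal difference
    dy = padded[2:, 1:-1] - padded[1:-1, 1:-1]  # vertical difference
    return np.abs(dx) + np.abs(dy)
```

A flat image has zero energy everywhere, while a vertical step edge produces a column of high energy, which is exactly what the seam search should avoid.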

Generally, parts of the image where there is more change are more important since they are associated with things like edges which can help give form to the objects in the image. To identify seams that will likely avoid important parts, we will calculate the energy function for all pixels in the seam and sum the values. Seams with minimum total energy are the ones we will want to use first.

Finding the best seam can be done with a dynamic programming algorithm. Assuming vertical seams, the subproblem being solved for is the lowest cost path starting at the top and ending at a given pixel. The best seam overall is the lowest cost path amongst all best paths that end at the bottom pixels. Because of the seam constraint, each pixel at the bottom only needs to consider the best paths ending at its three adjacent pixels in the row above. So we can run the dynamic programming algorithm row-by-row to arrive at the answer.
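The row-by-row dynamic program above can be sketched like so (a simple reference version under my own naming; the write-up does not specify an implementation):

```python
import numpy as np

def min_vertical_seam(energy):
    """Return the seam column index for each row of the
    minimum-total-energy vertical seam."""
    h, w = energy.shape
    # cost[i, j] = lowest total energy of a seam from the top to (i, j).
    cost = np.asarray(energy, dtype=float).copy()
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]   # path from upper-left
        up = cost[i - 1]                          # path from directly above
        right = np.r_[cost[i - 1, 1:], np.inf]    # path from upper-right
        cost[i] += np.minimum(np.minimum(left, up), right)
    # Backtrack from the cheapest bottom pixel, staying within +/- 1 column.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam
```

On an energy map with a zero-cost column surrounded by ones, the returned seam runs straight down that column.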

For horizontal seams, it is sufficient to transpose the image, carve vertical seams, and transpose the result back.

Here is an example of finding the best seam (colored in red):

Results

We can now apply this by repeatedly identifying low energy seams and deleting them from an image. This should allow us to size down the image while preserving key features for the most part.

Let's apply it to more images:

Although seam carving can mess up highly structured images, it did a decent job in this case. A tighter window pattern was achieved, and the apparent distortions in the windows near the left edge could pass for windows installed at an angle.

Failures

Alas, not all images were meant for seam carving.

I'm guessing that the reflections of the branches had higher energy than the unfortunate neck and beak of the goose.

The trouble here is likely that the midsection of the fence post does not have high energy due to not having vertical edges. But the midsection is as important as the edges for the shape of the fence post. Otherwise it's just a twig.

Bells and Whistles

Instead of deleting seams, we can also attempt to insert them. As described in the paper "Seam Carving for Content-Aware Image Resizing" (Avidan and Shamir), inserting the optimal seam every time simply results in the same seam being inserted repeatedly. What we instead do is determine all the seams that would be removed by running seam carving, and then insert them in that order. Insertion is done by looking at the neighboring pixels on either side of the seam and averaging their values.
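One way the insertion step could look (my own sketch; here the new pixel averages the seam pixel with its left neighbor, one of a few reasonable conventions):

```python
import numpy as np

def insert_seam(img, seam):
    """Insert one vertical seam. seam[i] gives the seam column in row i;
    the new pixel is the average of the seam pixel and its left neighbor."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape[:2]
    out = np.empty((h, w + 1) + img.shape[2:], dtype=float)
    for i, j in enumerate(seam):
        left = img[i, max(j - 1, 0)]     # clamp at the image border
        out[i, :j] = img[i, :j]
        out[i, j] = (left + img[i, j]) / 2
        out[i, j + 1:] = img[i, j:]
    return out
```

Running seam removal k times on copies of the image yields the k seams to insert; each insertion widens the image by one column.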

Visualization of seams carved in an image:

The figure in the distance is a bit wider as a result of the seam insertions, though it doesn't look that out of place alongside the rest of the resized image.

And on the other hand, here is a failure case. The out of focus leg gets duplicated along with the rest of the background, resulting in a truly terrifying sight to behold.

Conclusion

The most important thing I learned from this project is that in some cases simple intuitive extensions of naive solutions can bring us to something that generally works well in a lot of situations (though definitely not all). For seam carving, this was the move from straight columns/rows to seams.

Project 6: Light Field Camera

The goal of this project is to use light field data, in the form of multiple images taken on a grid in a plane orthogonal to a camera's optical axis, to emulate refocusing the camera and adjusting its aperture.

Depth Refocusing

We can note that taking the average of all grid images results in far away objects being in focus and close objects being blurry. This is because far away objects do not move as much as nearby ones when the camera shifts along the grid. To shift the focus to objects that are closer, we will want to shift images on the grid to counteract the movement that closer objects undergo as a result of camera movement.

To do this, we select an image to serve as an anchor and shift the other images according to their camera/uv coordinates relative to this image. When the camera moves right, objects move left, so an image to the right of the anchor in the grid should be shifted to the right. A similar rule applies to the other directions. We multiply the shifts by a constant, which controls the depth that we are refocusing to. If the constant is 1, the images are completely shifted to the anchor and only objects of distance 0 from the camera will be in focus. Smaller constants cause farther objects to be in focus, but also make close objects blurrier.

The center of the grid is a reasonable choice for the anchor, and so the shifts are calculated like so: $$u_{shift} = c(u - u_{center})$$ $$v_{shift} = c(v - v_{center})$$ Note that for the chess dataset, a more negative u corresponds to the camera shifting up instead of down, so the u shift had to be negated for it.
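The shift-and-average scheme can be sketched as follows (my own simplification: integer shifts via np.roll and the grid centroid as the anchor; a real implementation would interpolate sub-pixel shifts, and as noted above some datasets need a sign flip on one axis):

```python
import numpy as np

def refocus(images, coords, c):
    """Shift each grid image by c * (its uv offset from the center),
    then average. images: list of (H, W) arrays; coords: matching
    list of (u, v) grid positions; c: depth constant."""
    coords = np.asarray(coords, dtype=float)
    u_c, v_c = coords.mean(axis=0)         # anchor at the grid centroid
    acc = np.zeros_like(images[0], dtype=float)
    for img, (u, v) in zip(images, coords):
        du = int(round(c * (u - u_c)))     # u_shift = c (u - u_center)
        dv = int(round(c * (v - v_c)))     # v_shift = c (v - v_center)
        acc += np.roll(img, shift=(du, dv), axis=(0, 1))
    return acc / len(images)
```

With c = 0 this reduces to the plain average, which is exactly the far-focus case described above.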

Here are the results of averaging using different c values:

A .gif showcasing the focus at different depths: d.gif

Aperture Adjustment

Averaging many images in a large grid emulates a camera with a larger aperture, as out of focus portions will have more contributions from unaligned images to their average, resulting in greater blur for them. This is similar to how a wider aperture allows out of focus objects to blur more. Conversely, reducing the number of images used for the average will blur out of focus parts less, similar to making the aperture narrower.

To exclude grid images, we will go by their Chebyshev distance from the center image's indices. We include images whose indices lie within a square centered at the central image's indices. The square has a "radius" (more accurately an apothem) of r.

Therefore, r=0 includes only the center image, and r=8 includes the entire 17x17 grid.
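The Chebyshev selection can be written as a short filter over the grid (a sketch under my own naming; it assumes an odd-sided square grid so a single center index exists):

```python
import numpy as np

def aperture_average(grid, r):
    """Average the grid images within Chebyshev distance r of the center.
    grid[i][j] is the image at grid index (i, j)."""
    n = len(grid)
    c = n // 2  # center index of an odd-sided grid (e.g. 8 for 17x17)
    selected = [np.asarray(grid[i][j], dtype=float)
                for i in range(n) for j in range(len(grid[i]))
                if max(abs(i - c), abs(j - c)) <= r]  # Chebyshev distance
    return sum(selected) / len(selected)
```

For a 17x17 grid, r = 0 selects 1 image and r = 8 selects all 289, matching the endpoints described above.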

We fix the focus with c=0.2 and obtain these results for aperture adjustment:

Aperture adjustment as a .gif: a.gif

Conclusion

I learned that light fields can encode useful information for reconstructing an image as if it were taken with different physical camera parameters. This project also helped me to better understand the 4D parameterization of light fields, where we have different slices of st coordinate grids indexed by uv coordinates, which represent views of the image as captured by individual microlenses.