Herein lie both of my final projects for CS194! Final project 1 is on seam carving, and final project 2 is on lightfield cameras. They are both located on this webpage.


Final Project 1: Seam Carving

Vincent Tantra

Seam carving is an algorithm that shrinks an image along one dimension without losing too much of the image's information. Let's try it!

The Algorithm

At a high level, seam carving aims to find "seams" of low-energy pixels in an image. A seam is a path of pixels where either a) no 2 pixels are on the same row (if it's a vertical seam) or b) no 2 pixels are on the same column (if it's a horizontal seam). The pixels also have to be adjacent to each other, leaving only the 2 diagonals and the direct neighbor pixel as viable options for the next step in the seam. We then delete the seam's pixels from the image, leaving a result that is one small pixel closer to the target dimension, one giant leap for pretty photos that are smaller without looking like they've been smooshed in a cartoon.

The energy of each pixel was calculated using gradients in the x and y directions. I first convolved the image with the DX and DY kernels, just like the convolutions done in project 2's edge detection, and then used the resulting gradient arrays to compute the energy of each pixel. A finer analysis of the algorithm shows that it requires a bit of dynamic programming. Namely, the minimum total energy of a seam ending at any pixel is just that pixel's energy plus the minimum of the total energies of the 3 pixels it could have come from. We can therefore dynamically build up an "energy map" that stores the minimum total energy at each pixel. To find the lowest-energy seam, we only need to find the smallest total energy at the end of the map (the bottom or rightmost slice of the 2D array), then backtrack from that pixel, choosing the minimum-energy predecessor at each step, to recover the pixels that belong to the seam. Then, we take them out. This entire process is repeated as many times as needed to reach the target dimension.
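Here's a rough sketch of the idea in NumPy/SciPy for vertical seams (the function names, kernels, and boundary handling here are illustrative, not my exact implementation):

    import numpy as np
    from scipy.ndimage import convolve

    def energy_map(gray):
        # Gradient-magnitude energy: |I * DX| + |I * DY| with finite-difference kernels.
        dx = np.array([[-1.0, 0.0, 1.0]])   # DX
        dy = dx.T                            # DY
        return np.abs(convolve(gray, dx)) + np.abs(convolve(gray, dy))

    def cumulative_energy(energy):
        # DP "energy map": M[i, j] = energy[i, j] + min of the three pixels above it.
        M = energy.astype(float)
        for i in range(1, M.shape[0]):
            up_left = np.roll(M[i - 1], 1)
            up_left[0] = np.inf
            up_right = np.roll(M[i - 1], -1)
            up_right[-1] = np.inf
            M[i] += np.minimum(np.minimum(up_left, M[i - 1]), up_right)
        return M

    def find_vertical_seam(M):
        # Start at the smallest total energy in the bottom row, then backtrack upward,
        # staying within one column of the pixel below.
        h, w = M.shape
        seam = np.empty(h, dtype=int)
        seam[-1] = int(np.argmin(M[-1]))
        for i in range(h - 2, -1, -1):
            j = seam[i + 1]
            lo, hi = max(j - 1, 0), min(j + 2, w)
            seam[i] = lo + int(np.argmin(M[i, lo:hi]))
        return seam

    def remove_vertical_seam(img, seam):
        # Delete one pixel per row, shrinking the width by 1 (img is H x W x C).
        h, w = img.shape[:2]
        keep = np.ones((h, w), dtype=bool)
        keep[np.arange(h), seam] = False
        return img[keep].reshape(h, w - 1, img.shape[2])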

The Images

But I bore you with the words; it's carving time. Below are my results on applying seam carving to several images, in an effort to discern how well it works in horizontal carving (and how important sky cowboys really are). These images are all originally 1000 pixels wide and reduced by 250 pixels. If you would like to educate yourself on the source material, please watch this.

As can mainly be seen in examples 3 and 5, prominent features in the landscape such as rocks and trees have high gradient energy, so they cause artifacts in their surroundings (such as the weak-sauce, faded cowboy spirit) if the reduction is too large. Aside from that, though, these images mainly show successful horizontal applications of the algorithm (landscape photos prove to be good for this).

With some finagling of the matrices, the algorithm can also be applied vertically. This helps when you really need some help paying attention to only the most important parts of, say, a TikTok or something. These images start at 1000 pixels and are reduced by 300 pixels in the vertical dimension.
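The matrix finagling is essentially a transpose. Reusing the helpers sketched above (again as an illustration rather than my exact code), carving rows might look something like this:

    def carve_rows(img, num_rows):
        # To remove horizontal seams (reduce height), transpose the image,
        # remove vertical seams as before, then transpose back.
        out = np.transpose(img, (1, 0, 2))
        for _ in range(num_rows):
            gray = out.mean(axis=2)                      # rough luminance channel
            M = cumulative_energy(energy_map(gray))
            out = remove_vertical_seam(out, find_vertical_seam(M))
        return np.transpose(out, (1, 0, 2))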

Note how the gradient background upon which the artist dances is mostly retained, a feat unachievable by mere peasant cropping.

Of course, the algorithm is not perfect. Images contain a finite amount of data, and any reduction that is too extreme (say, 50% of the image's size or more) when there is a lot of information to retain will invariably create seam-carving artifacts. If gradients are roughly similar throughout the entire image, the result can look a lot like the plain old resizing done in bad PowerPoints. Alas, some failure cases are drastic, and make me sad (700 pixels reduced by 200 vertically, in the following pictures). (These don't exhibit many artifacts, but the large amount of detail in all parts of the image causes that compression effect, making things noticeably a bit squashed.)

Elmo is sad because he's gaining weight.
Elmo is sad because a capitalist economy forces him into a 9-to-5 where he feels no personal fulfillment, purpose, or self-growth.

The following images look poor because a large amount is being carved out of a detailed image: we are taking out 250 pixels from a 400-pixel dimension.

YOU'LL NEVER TAKE ME ALIVE
just kidding they took me alive and now I'm trapped inside

Bells and Whistles: Stretching Images using Seam Insertion

Stretching an image is also possible using the seam algorithm. Instead of deleting the lowest-energy seam, we insert a new seam where it is located. This allows us to sneakily add data into the picture without much changing visually. The added seam's colors are computed as an average of the seams on either side of it.

A small problem with the algorithm is that energy is defined as the gradient change in the perpendicular direction. When we add a seam that is the average of its two neighbors, the gradient at that seam naturally becomes smaller. This leads the algorithm to pick the same seam as its minimum-energy seam on the next iteration, resulting in just a long block of added color (very uncool). To work around this, we can do several things, such as change the metric used as the energy function, or manipulate the algorithm to choose other seams. I added the index of each previously inserted seam to a list of indices to ignore, allowing my algorithm to choose low-energy seams that weren't necessarily close to each other. This results in spread-out insertions that create a better image. Below are some examples, with the altered (this time stretched) images on the right. The images are stretched by 100, 100, and 200 pixels respectively.
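As a sketch of the insertion loop (reusing the helpers above; the index-ignoring bookkeeping here is a simplified stand-in for my actual logic):

    def insert_seams(img, num_seams):
        out = img.astype(float)
        used = set()   # bottom-row columns of seams already duplicated; rough bookkeeping,
                       # since inserted columns shift later indices slightly
        for _ in range(num_seams):
            gray = out.mean(axis=2)
            M = cumulative_energy(energy_map(gray))
            M[-1, list(used)] = np.inf                   # steer the search away from previous seams
            seam = find_vertical_seam(M)
            used.add(int(seam[-1]))
            h, w, c = out.shape
            new = np.empty((h, w + 1, c))
            for i in range(h):
                j = seam[i]
                left, right = out[i, max(j - 1, 0)], out[i, min(j + 1, w - 1)]
                new[i, :j + 1] = out[i, :j + 1]
                new[i, j + 1] = (left + right) / 2.0     # new pixel = average of its neighbors
                new[i, j + 2:] = out[i, j + 1:]
            out = new
        return out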

Of course, adding generated visual data doesn't necessarily look good when a lot is added. Low-energy seams with high-energy segments suffer from stretching artifacts if pulled too drastically. Below is an example of one of the previous images, now stretched by 220 pixels.

AAAAHHHHHHHHHHH

I had a lot of fun in this project! It was great applying DP principles learned in 61B and 70 to something in image processing. I found the intuition easy to understand and actually very clever.


Final Project 2: Lightfield Camera: Depth Refocusing and Aperture Adjustment with Light Field Data

Vincent Tantra

What's better than 1 camera? A whole bunch of them! A lightfield camera is basically that: a grid of cameras that capture the same scene with slightly different data, in order to have a higher dimensionality to play with later on (allowing aperture and focus to be changed after the image is captured). Here we try out that post-processing to get artsy. Photos are taken from The (New) Stanford Light Field Archive's rectified set of chess photos. The grid consists of 289 cameras arranged in a 17x17 grid.

Depth Refocusing

The first thing we can play with is the depth of our focus. Across the many images taken by our grid of cameras, discrepancies are more noticeable for objects closer to the camera than for those further away. When we average all the color values together, the result is a very blurry foreground and a sharper background. To manipulate this, we shift the images by their offset from the center image (located at position (8, 8), or index 144 in my list of images). To play around with this shift further (moving the focus forward and back), I multiply it by a constant f, which controls how much the images are shifted. The results are as follows:

f=-0.7
f=-0.6
f=-0.5
f=-0.4
f=-0.3
f=-0.2
f=-0.1
f=0.0
f=0.1
f=0.2
f=0.3
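For reference, a minimal shift-and-average sketch of this refocusing (illustrative only; the grid coordinates and (8, 8) center follow the description above, but details like the interpolation are my own assumptions):

    import numpy as np
    from scipy.ndimage import shift

    def refocus(images, positions, f):
        # images: list of H x W x 3 arrays from the 17x17 grid.
        # positions: matching (row, col) grid coordinates in 0..16; (8, 8) is the center camera.
        # f: scales each image's shift, moving the plane of focus forward or back.
        center = np.array([8.0, 8.0])
        acc = np.zeros_like(images[0], dtype=float)
        for img, pos in zip(images, positions):
            du, dv = f * (np.array(pos, dtype=float) - center)
            acc += shift(img.astype(float), (du, dv, 0), order=1)   # bilinear sub-pixel shift
        return acc / len(images)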

Aperture Adjustment

We can also play with the "aperture" of the image, or at least play with its effects. A smaller aperture results in a deeper depth of focus, and a larger aperture a shallower one. We can simulate this by choosing how many pictures to include in the final average, starting from the center image again. The more we include, the blurrier the parts not "in focus" become, thus simulating a photo with a larger aperture. Radii were calculated simply as the Euclidean distance between an image's grid indices and the middle image's indices (resulting in a max radius of around 11.3). Below are the results as I increment the radius of included images by 1 each step:

r=0
r=1
r=2
r=3
r=4
r=5
r=6
r=7
r=8
r=9
r=10
r=11
r=12
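A sketch of the aperture simulation, reusing the same shift-and-average idea but only keeping cameras within a given radius of the center (again illustrative rather than my exact code):

    def aperture_average(images, positions, radius, f=0.0):
        # Keep only cameras whose grid indices lie within `radius` (Euclidean distance)
        # of the center camera (8, 8); a larger radius acts like a larger aperture.
        center = np.array([8.0, 8.0])
        acc, count = np.zeros_like(images[0], dtype=float), 0
        for img, pos in zip(images, positions):
            offset = np.array(pos, dtype=float) - center
            if np.linalg.norm(offset) <= radius:
                du, dv = f * offset
                acc += shift(img.astype(float), (du, dv, 0), order=1)
                count += 1
        return acc / max(count, 1)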

Bells and Whistles: My Own Image

I was curious to make my own image using these techniques. But my hands are too shaky from the excess amount of caffeine I drink every day. The solution? I tried doing it in a video game! I play an online RPG called Final Fantasy XIV that has a flight mechanic as well as a camera mode. All I had to do was shift my character a little bit up or sideways, and I could simulate a grid of cameras somewhat decently, using a sticky note I put on my computer screen as a center reference. And in this game of dragons and magic, there is nothing more interesting than this lamp!

...it's just a lamp...

I had to go with this boring lamp for a few reasons. Time and weather, and therefore lighting, are dynamic in FFXIV, so I would not have consistent lighting if the pictures were taken "outside". Characters also frequently roam around, and there's a lot of movement, so I restricted my field of view to a small, static object. At least it glows!

Using a game definitely minimized the discrepancies between pictures. The only problem is that the blur itself is not that fine-grained, because of time constraints: I only had time to take 100 screenshots (a 10x10 grid of images), so the data is not nearly as complete as the Stanford dataset. With more granularity and more data, the results could actually be quite nice. It's cool seeing something from "real-life" photography, namely focus, be simulated in a virtual environment. 'f' values ranged from -2 to 3 for these photos.

Starting from -2...
Incrementing f by 1 each step...
IT'S IN FOCUS
WOAH THE BACK WALL IS KINDA IN FOCUS

I also tried the aperture effect, with radius from 0 to 5. There is a barely noticeable drift towards the left as the aperture increases; this is because there are an even number of photos, and thus no true center image (so the more images we add, the more the data shifts slightly in one direction). But I am too lazy to take another set of pictures (it took forever :( ).

r = 0
r = 1
r = 2
r = 3
r = 4
r = 5

I like seeing the difference from the single screenshot to the average! In a game, there's no need to blur the background, so everything is perfectly rendered and crisp all at once. Simulating the blur effect there is a new perspective that I find interesting.

This project was fun! I had been interested in Professor Ng's lightfield camera work in the past, but never realized how nice the results could be, considering the process to get them is fairly clever and simple.