CS 194-26: Computational Photography, Fall 2021

Project 1: Image Alignment

Daniel Jae Im, CS194-26

Overview

In this project, we worked with images from the Prokudin-Gorskii collection. Each glass-plate scan contains three small grayscale images stacked top to bottom, where the brightness of each image corresponds to the R, G, or B intensity at that pixel location. Unfortunately, these channel images are misaligned. Our task was to write an algorithm that properly aligns the three channels so they can be layered on top of each other to create a full-color image.

Example of a Prokudin-Gorskii photo, where the image is separated into three channels (R, G, B) in order.

Part 1: Simple Alignment with Exhaustive Search

To align the images, we first set one channel (in our case the green channel) as an unmoving 'anchor', then took each of the other two channels (red and blue) and exhaustively searched a range of [-15, 15] x and y displacements to find which one best overlays that channel onto the green channel. We did this for the red channel over the green channel, then for the blue channel over the green channel. At each displacement, we compared the two channels using a similarity metric called SSD, or Sum of Squared Differences, which sums the squared differences between corresponding pixel values in the two channels. For the smaller images this was sufficient to align them completely, and it did not take much time at all (only a few seconds). Here are the results for the smaller images we were given. The values R(X, Y) and B(X, Y) are the displacements along the X and Y axes needed to align each channel to our anchor channel.


Under each image is the R(X, Y) and B(X, Y) displacement found by my alignment algorithm: R(X, Y) is how many pixels the red channel is moved to align with the green channel, and B(X, Y) is how many pixels the blue channel is moved to align with the green channel.
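As a rough sketch of this exhaustive search (in NumPy; the function name and the use of `np.roll` for shifting are my own choices, not necessarily the exact implementation):

```python
import numpy as np

def align_exhaustive(channel, anchor, window=15):
    """Try every (dx, dy) in [-window, window]^2 and return the shift
    that minimizes the SSD against the anchor channel."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll shifts with wraparound, which is tolerable for
            # small displacements relative to the image size
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - anchor) ** 2)  # SSD
            if score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

The returned (dx, dy) pair corresponds directly to the R(X, Y) and B(X, Y) values reported below.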

Cathedral.jpg
R(7, 0), B(-1, 1)

Monastery.jpg
R(6, 1), B(-3, -2)

Tobolsk.jpg
R(4, 1), B(-3, -2)

Part 2: Alignment with Multi-Scaling and Image Pyramids

Our exhaustive search essentially slides one channel's pixel values over another's, computing an SSD at each of the 31 × 31 candidate displacements in [-15, 15]², and the cost of each SSD grows with the number of pixels in the image. While this works for the small .jpg images, it fails to produce a viable solution in reasonable time for larger images stored in less compressed formats (.tif): exhaustive search on our .tifs took anywhere from about an hour to a day. For the larger images, we instead built up the best displacement gradually through smaller estimates and refinements. The structure we used was an image pyramid, which is a set of versions of an image at different sample rates. The original image contains every pixel, while a downsampled version keeps only every other pixel along each axis, so each level of downsampling shrinks the image by a factor of 4. These small versions contain much less information, but they can be searched exhaustively very cheaply, unlike the original image. This lets us estimate a close-to-correct displacement on a heavily downsampled version of the image to start. Then we iteratively traverse the pyramid back toward the original resolution, shrinking the search window (from 15 down to about 1) as the images grow more upsampled, reusing our exhaustive displacement algorithm but with a much smaller window (think 5 × 5 instead of 31 × 31) around the previous estimate.
At each level of the pyramid, as we move closer to the original image, we add the level's refinement to a global displacement value and scale it by 2 (as we go from downsampled to upsampled images, a displacement must double to have the same effect it had at the smaller size). By the time we reach the final image, we have a displacement almost as good as exhaustive search would give, and we can roll the image into place, with the whole process finishing in under a minute. Here are the results for the larger images we were given (.tif). The values R(X, Y) and B(X, Y) are the displacements along the X and Y axes needed to align each channel to our anchor channel.
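The pyramid traversal described above can be sketched as a recursion (NumPy again; the function name, the base-case size cutoff of 64 pixels, and the refinement window of ±2 are my own choices for illustration):

```python
import numpy as np

def align_pyramid(channel, anchor, window=15, refine=2):
    """Estimate the displacement on a downsampled pair, double it,
    then refine with a small search at the current resolution."""
    if min(channel.shape) < 64:
        # Coarsest level: fall back to a plain exhaustive search
        dx, dy = 0, 0
        search = window
    else:
        # Recurse on versions downsampled by 2 along each axis
        dx, dy = align_pyramid(channel[::2, ::2], anchor[::2, ::2], window)
        dx, dy = 2 * dx, 2 * dy   # a shift doubles when resolution doubles
        search = refine           # only search near the coarse estimate
    best_shift, best_score = (dx, dy), np.inf
    for ddy in range(-search, search + 1):
        for ddx in range(-search, search + 1):
            shifted = np.roll(channel, (dy + ddy, dx + ddx), axis=(0, 1))
            score = np.sum((shifted - anchor) ** 2)  # SSD
            if score < best_score:
                best_score, best_shift = score, (dx + ddx, dy + ddy)
    return best_shift
```

The total work is dominated by the small refinement searches at each level, which is why the .tif images finish in under a minute rather than hours.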

Church.tif
R(-8, 33) B(-4, -25)

Emir.tif
R(17, 57) B(-24, -49)

Harvesters.tif
R(-3, 65) B(-16, -59)

Icon.tif
R(5, 48) B(-17, -41)

Lady.tif
R(3, 61), B(-9, -51)

Melons.tif
R(3, 96), B(-10, -81)

Onion-Church.tif
R(10, 57), B(-26, -51)

Self-Portrait.tif
R(8, 98), B(-29, -78)

Three-Generations.tif
R(-3, 58), B(-14, -53)

Train.tif
R(27, 43) B(-5, -42)

Workshop.tif
R(-11, 52), B(0, -53)

Part 3: Bells and Whistles

I automatically cropped the images' borders by 20% prior to alignment. This keeps the borders from adding noise to our SSD values: the dark borders against the white scan background would heavily skew the chosen displacement if we kept them. We did not completely eliminate these pixels from the image; we simply did not consider them when calculating the SSD.
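A minimal sketch of this interior crop, assuming 20% is removed from each border before the channels are compared (the function name is mine):

```python
import numpy as np

def crop_interior(channel, frac=0.20):
    """Drop frac of the image from each border, keeping only the
    central region so dark scan borders don't skew the SSD."""
    h, w = channel.shape
    dh, dw = int(h * frac), int(w * frac)
    return channel[dh:h - dh, dw:w - dw]
```

Only the SSD comparison uses the cropped channels; the final color image is still assembled from the full, uncropped channels.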