CS194-26: Image Manipulation and Computational Photography - Project 1

Omar Buenrostro

Intro

The goal of the assignment is to create a program that takes in Prokudin-Gorskii glass plate image negatives and aligns them such that a color image with as few visual artifacts as possible is produced. Each glass plate negative is split into its Blue, Green, and Red (top to bottom) color channels, as shown below:

Glass Plate Negatives of monastery.jpg
However, the channels cannot simply be stacked to re-create the original image. As seen from the image below, the naively stacked result turns out quite blurry:
monastery.jpg with no alignment

The process

After loading each image, the first step I took was splitting it into 3 equal parts to get the plate's respective B, G, and R channels. This process is simple and is done in the starter code: the image is divided into three parts, each having one third of its original height.
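A minimal sketch of this split, assuming the plate has already been loaded into a 2-D NumPy array (the function name and the use of skimage for loading are my own choices for illustration):

    import numpy as np
    import skimage.io as skio

    def split_channels(im):
        # The plate stacks the blue, green, and red exposures from top to
        # bottom, so each channel occupies one third of the total height.
        height = im.shape[0] // 3
        b = im[:height]
        g = im[height:2 * height]
        r = im[2 * height:3 * height]
        return b, g, r

    # b, g, r = split_channels(skio.imread('monastery.jpg'))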

The next step was to decide on a similarity metric between two images. To keep it simple, I considered only the two recommended metrics: the Sum of Squared Differences (SSD) and the Normalized Cross-Correlation (NCC). In the end, I decided on the NCC because it worked slightly better on the images, albeit a bit slower. I also excluded the outer 10% of the image (the edges) from my calculation of the NCC, simply because the borders serve no purpose in computing a decent alignment of the channels.
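A rough sketch of this metric is below. The helper names are my own, and whether the mean is subtracted before normalizing is a detail the write-up leaves open; subtracting it is one common formulation of NCC.

    import numpy as np

    def crop_interior(im, frac=0.10):
        # Drop the outer 10% on every side; the plate borders contribute
        # nothing useful to the alignment score.
        h, w = im.shape
        dh, dw = int(h * frac), int(w * frac)
        return im[dh:h - dh, dw:w - dw]

    def ncc(a, b):
        # Normalized cross-correlation: flatten both interiors, subtract
        # the mean, scale to unit length, and take the dot product.
        a = crop_interior(a).ravel().astype(float)
        b = crop_interior(b).ravel().astype(float)
        a -= a.mean()
        b -= b.mean()
        return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))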

Exhaustive Alignment Search

Using the B channel of the image as the base, I first exhaustively searched for the best displacement vectors for the G and R channels of each JPG file. It turns out that searching the (-15, 15) grid of x-y displacements was sufficient for these files. For each x-y pair, I used np.roll to displace the G/R pixel matrices, calculated the NCC between these translated matrices and the B matrix, and returned the shifted matrix whose (x, y) displacement produced the largest NCC.
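A sketch of this search, reusing the hypothetical ncc helper above (window=15 matches the (-15, 15) grid; the displacement is returned here as a (row, column) pair):

    import numpy as np

    def exhaustive_align(channel, base, window=15):
        # Score every displacement in [-window, window]^2 against the base
        # (blue) channel and keep the one with the highest NCC.
        best_score, best_shift = -np.inf, (0, 0)
        for dy in range(-window, window + 1):
            for dx in range(-window, window + 1):
                shifted = np.roll(channel, (dy, dx), axis=(0, 1))
                score = ncc(shifted, base)
                if score > best_score:
                    best_score, best_shift = score, (dy, dx)
        return best_shift

    # g_shift = exhaustive_align(g, b)
    # r_shift = exhaustive_align(r, b)
    # color = np.dstack([np.roll(r, r_shift, axis=(0, 1)),
    #                    np.roll(g, g_shift, axis=(0, 1)), b])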

Combining these newly aligned G and R matrices with the original B matrix produced the following results:

aligned cathedral.jpg

Displacement vectors:
g - [5, 2]; r - [12, 3]

aligned monastery.jpg

Displacement vectors:
g - [-3, 2]; r - [3, 2]

aligned nativity.jpg

Displacement vectors:
g - [3, 1]; r - [7, 1]

aligned settlers.jpg

Displacement vectors:
g - [7, 0]; r - [14, -1]


Exhaustive search worked well for all the JPG images; however, running this simple algorithm is much more computationally expensive for the much larger TIF images. This is where the image pyramid search algorithm comes in handy.

Image Pyramid Displacement Search

Instead of exhaustively searching through all possible displacements, we can first shrink a TIF matrix by a factor f (32 worked well for all of these images) so that it becomes feasible to exhaustively search a small window of displacements (2 times the current factor worked well) around the origin and find the best displacement vector (x1, y1) in this smaller space. After this, the factor is cut in half, the TIF image is shrunk by the new factor f/2, and another exhaustive search is performed in a window half as large, centered on (2*x1, 2*y1). This process is repeated recursively until we finally search a window of size 2, around the displacement found by the previous recursive call, on the original TIF matrix. This method provides a significant speed-up because it greatly reduces the number of calculations performed on the huge TIF color channels; each displacement vector usually took under 30 seconds to compute.
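A sketch of the recursion under the same assumptions (the function name is my own, skimage's rescale is my choice of downsampling routine, and the ncc helper is the hypothetical one sketched earlier):

    import numpy as np
    import skimage.transform as sktr

    def pyramid_align(channel, base, factor=1):
        # Downscale both channels by the current factor; factor 1 is the
        # original resolution, factor 32 is the coarsest pyramid level.
        if factor == 1:
            small_c, small_b = channel, base
        else:
            small_c = sktr.rescale(channel, 1.0 / factor, anti_aliasing=True)
            small_b = sktr.rescale(base, 1.0 / factor, anti_aliasing=True)

        if factor == 32:
            # Coarsest level: search around the origin.
            cy, cx = 0, 0
        else:
            # Double the estimate from the next coarser level, since this
            # level has twice its resolution.
            cy, cx = pyramid_align(channel, base, factor * 2)
            cy, cx = 2 * cy, 2 * cx

        # The search window halves at each finer level: 2 * 32 at the
        # coarsest level, down to 2 at the original resolution.
        window = 2 * factor
        best_score, best = -np.inf, (cy, cx)
        for dy in range(cy - window, cy + window + 1):
            for dx in range(cx - window, cx + window + 1):
                shifted = np.roll(small_c, (dy, dx), axis=(0, 1))
                score = ncc(shifted, small_b)
                if score > best_score:
                    best_score, best = score, (dy, dx)
        return best

    # g_shift = pyramid_align(g, b)
    # r_shift = pyramid_align(r, b)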

The results of the algorithm on the TIF files (including three additional images) are shown below:

aligned emir.tif

Displacement vectors:
g - [49, 24]; r - [103, 43]

aligned harvesters.tif

Displacement vectors:
g - [60, 16]; r - [124, 14]

aligned icon.tif

Displacement vectors:
g - [40, 17]; r - [89, 23]

aligned lady.tif

Displacement vectors:
g - [55, 8]; r - [110, 12]

aligned self_portrait.tif

Displacement vectors:
g - [79, 29]; r - [175, 34]

aligned three_generations.tif

Displacement vectors:
g - [55, 13]; r - [112, 27]

aligned train.tif

Displacement vectors:
g - [42, 6]; r - [87, 32]

aligned turkmen.tif

Displacement vectors:
g - [57, 20]; r - [114, 27]

aligned village.tif

Displacement vectors:
g - [65, 12]; r - [138, 23]

aligned arc.tif

Displacement vectors:
g - [60, 3]; r - [137, 4]

aligned flower.tif

Displacement vectors:
g - [19, -2]; r - [116, -9]

aligned tower.tif

Displacement vectors:
g - [36, 10]; r - [83, 7]


Overall, the image pyramid algorithm did a decent job aligning all of the images.

Bells and Whistles - Autocropping

To deal with the black and white borders of the images, I applied the horizontal and vertical Sobel filters to the image for edge detection. Below is the result of applying the horizontal and vertical filters to the blue channel of cathedral.jpg:

Determining where the white border and black border of each image lie from the above is relatively straightforward. Considering only pixels in the outer 1/5 of the image, taking the arg-min of the pixel intensities for the left and top edges and the arg-max of the pixel intensities for the right and bottom edges gives a generally good approximation. To keep things simple, I only looked at the values along the four quarter lines of the image for each side, took the max of the candidate positions for the left and top borders and the min for the right and bottom borders, and chose these as the crop boundaries (a rough sketch of this procedure follows the examples). Below are before-and-after pictures of the auto-cropping in action:

uncropped emir

cropped emir

uncropped flower

cropped flower

uncropped settlers

cropped settlers
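The sketch below only illustrates the idea of the border detection: the function name, the sampling rows and columns, and the use of scipy.ndimage.sobel are my own choices, and taking the extrema over the signed Sobel response is a guess at the procedure described above. The bottom border is deliberately left alone, for the reason discussed next.

    import numpy as np
    from scipy.ndimage import sobel

    def find_crop(b):
        # Sobel responses of the blue channel: gx picks up the vertical
        # (left/right) borders, gy the horizontal (top) border.
        b = b.astype(float)
        h, w = b.shape
        gx = sobel(b, axis=1)
        gy = sobel(b, axis=0)

        margin_y, margin_x = h // 5, w // 5
        rows = [h // 4, h // 2, 3 * h // 4]   # sample rows (illustrative)
        cols = [w // 4, w // 2, 3 * w // 4]   # sample columns (illustrative)

        # Only the outer fifth of the image is considered. The left and
        # top boundaries keep the innermost (max) candidate across the
        # samples; the right boundary keeps the innermost (min) candidate.
        left = max(int(np.argmin(gx[r, :margin_x])) for r in rows)
        right = min(int(w - margin_x + np.argmax(gx[r, w - margin_x:])) for r in rows)
        top = max(int(np.argmin(gy[:margin_y, c])) for c in cols)

        # The bottom border is left uncropped (see the note below).
        return top, left, right

    # top, left, right = find_crop(b)
    # cropped = aligned[top:, left:right]   # applied to the stacked color image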

One of the main issues with my approach is that it worked quite poorly for cropping the bottom border. This is because I used the blue channel to find the crop, since I aligned the red and green channels to it anyway. The problem with using the blue channel is that its bottom border was only black and was very small. So I decided not to crop the bottom of the image, which produced sufficient results; the downside is that I couldn't do anything about the oddly colored borders in the lower part of the image.