Assignment 1: Image alignment and Sergei Mikhailovich Prokudin-Gorskii

Harry Ho

A brief overview

This assignment reconstructs color images from photographs taken when the technology for rendering color photographs did not yet exist. Prokudin-Gorskii's approach was to record three exposures of the same scene through red, green, and blue filters; combined, the exposures form a color image that is easily discernible by the three types of cone cells in the human eye. With the image libraries readily available today, we can align the three black-and-white exposures to create a single color image.


Approach

My naive approach was to try every offset within a fixed search window and pick the one that produced the smallest difference between two channels. I used the sum of squared differences (SSD) as the metric: the difference in pixel intensity at every (i, j) is squared and then summed over the image. This approach was mildly successful, but there were complications because the edges of each color channel were misaligned or missing. Calculating the difference over only the center half of the image, where the pixel values were more likely to be retained, worked better than using the entire image.
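The search described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used for the results: the function names, the default window of 15 pixels, the (dx, dy) ordering of the returned offset, and the use of np.roll (which wraps pixels around the border) are all assumptions.

```python
import numpy as np

def center_crop(img, frac=0.5):
    """Return the central `frac` portion of a 2-D array in each dimension."""
    h, w = img.shape
    dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
    return img[dh:h - dh, dw:w - dw]

def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

def best_offset(channel, reference, window=15):
    """Try every shift in [-window, window] on both axes.

    Returns the (dx, dy) shift of `channel` with the lowest SSD against
    `reference`, scored on the center half of the image only.
    """
    ref = center_crop(reference)
    best, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll wraps around the border; the center crop keeps the
            # wrapped pixels out of the score for small shifts.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(center_crop(shifted), ref)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best
```

The cost grows with the square of the window size, which is why this exhaustive search is only practical on small images.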

The next step was to speed up the process by using an image pyramid. We want to align larger images without losing the resolution of the picture, yet searching hundreds of offsets in each dimension, and therefore the square of that number in total, would slow our calculations drastically. Instead, we refine the translation vector from coarse to fine. We repeatedly resize the image by half until it reaches a manageable size, and calculate the offset at that scale. What used to be hundreds of pixels in a square area is now a single pixel, so moving a single pixel at a coarse level of the pyramid corresponds to moving multiple pixels in the original image. As we recurse back up, we rescale the translation vector and readjust it to recover the precision lost at the coarser level. The number of levels grows with the logarithm of the image size, which makes this a logarithmic approach to finding the final shift. A couple of images are displayed below.
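A coarse-to-fine search along these lines might look like the following sketch. It is not the original implementation: the 2x2 block averaging used to halve the image, the min_size, radius, and coarse_radius parameters, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def ssd_center(a, b):
    """SSD over the center half of two equally sized float arrays."""
    h, w = a.shape
    a = a[h // 4:h - h // 4, w // 4:w - w // 4]
    b = b[h // 4:h - h // 4, w // 4:w - w // 4]
    return float(np.sum((a - b) ** 2))

def search(channel, reference, center=(0, 0), radius=2):
    """Best (dx, dy) within `radius` pixels of `center`."""
    cx, cy = center
    best, best_score = center, np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd_center(shifted, reference)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best

def halve(img):
    """Downsample by 2 in each dimension by averaging 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid_align(channel, reference, min_size=64, radius=2, coarse_radius=8):
    """Recursively estimate the (dx, dy) shift of `channel` onto `reference`."""
    channel = channel.astype(np.float64)
    reference = reference.astype(np.float64)
    if min(channel.shape) <= min_size:
        # Coarsest level: the image is small, so a wider search is cheap.
        return search(channel, reference, (0, 0), coarse_radius)
    dx, dy = pyramid_align(halve(channel), halve(reference),
                           min_size, radius, coarse_radius)
    # One pixel at the coarser level is two pixels here, so double the
    # estimate and refine it within a small neighborhood.
    return search(channel, reference, (2 * dx, 2 * dy), radius)
```

Each level only searches a small neighborhood around the doubled estimate, so the total work stays close to that of a single small search on the full image.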

Images

cathedral.jpg
G : [2, 5], R : [3, 12]
monastery.jpg
G : [2, -3], R : [2, 3]
settlers.jpg
G : [0, 7], R : [-1, 15]
train.tif
G : [0, 48], R : [32, 96]
turkmen.tif
G : [16, 48], R : [32, 112]
icon.tif
G : [16, 48], R : [16, 96]
lady.tif
G : [16, 48], R : [16, 112]
village.tif
G : [16, 64], R : [16, 128]
three_generations.tif
G : [16, 48], R : [16, 112]

Future Considerations

A couple of images were not aligned correctly by the image pyramid and sum of squared differences approach. One conjecture is that the lack of red-green shift leaves the difference score too flat across candidate offsets to single out the correct one. Edge detection may be necessary as a further step. Below is the Emir image that I wasn't able to align properly.

emir.tif
G : [16, 48], R : [-240, 32]
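One way the edge detection idea could be explored is to compare gradient magnitudes instead of raw intensities, since edges are less sensitive to brightness differences between channels. The sketch below is untested on the Emir image and is only an assumption about what such a step might look like; edge_magnitude is a hypothetical helper name.

```python
import numpy as np

def edge_magnitude(img):
    """Gradient magnitude of a 2-D image using finite differences."""
    # np.gradient returns the derivative along rows first, then columns.
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)
```

The same offset search could then be run on edge_magnitude(channel) and edge_magnitude(reference) in place of the original pixel values. A constant brightness offset between two channels has no effect on these edge maps.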