CS 194 Project 1: Image Manipulation and Computational Photography

Background:

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) as early as 1907 was convinced that color photography was the wave of the future. Thus, he traveled across the vast Russian Empire taking color photographs of everything: people, buildings, landscapes, railroads, bridges and more, capturing thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. He envisioned special projectors to be installed in "multimedia" classrooms all across Russia where the children would be able to learn about their vast country. Yet, his plans never materialized as he left Russia in 1918. Luckily, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available on-line.

Goals for the Project:

Take the provided Prokudin-Gorskii glass plate images, plus a few of my own choosing, as input and produce a single color image as output. The program divides each image into three equal parts and aligns the second and third parts to the first. For each image, it also prints the (x, y) displacement vector used to align the parts.
Align the parts by exhaustively searching over a window of possible displacements, scoring each one with an image matching metric, and taking the displacement with the best score.
This exhaustive search works well for smaller images but becomes quite expensive for larger images, where the displacements are larger. Thus, I will implement a faster search procedure with an image pyramid. An image pyramid represents the image at multiple scales; processing is done sequentially, starting from the coarsest scale and going down the pyramid, updating the estimate along the way.

Examples of Uncolored Images:

Approach for Single Scale: Exhaustive Search with L2 Norm (SSD)

As mentioned above, I began by reading in the images and dividing each one into three separate color channels. (It is important to note that the filter order on the plate is BGR, not RGB, which is why I aligned G to B and R to B before combining all three channels.)
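A minimal sketch of the splitting step, assuming the scan is loaded as a 2D grayscale NumPy array with the three exposures stacked top to bottom in B, G, R order. The function name split_plate is hypothetical, and dropping leftover rows when the height is not divisible by three is an assumption rather than something the write-up specifies.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate scan into (b, g, r) channels."""
    height = plate.shape[0] // 3       # leftover rows at the bottom are dropped
    b = plate[:height]                 # top third: blue filter
    g = plate[height:2 * height]       # middle third: green filter
    r = plate[2 * height:3 * height]   # bottom third: red filter
    return b, g, r

# Tiny synthetic "plate": 7 rows, so one row is left over and ignored.
plate = np.arange(14, dtype=float).reshape(7, 2)
b, g, r = split_plate(plate)
print(b.shape, g.shape, r.shape)
```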
Once the channels were separated, I exhaustively searched a displacement window of [-15, 15] pixels to find the best alignment and its associated x and y displacement. I used the L2 norm as the matching metric, choosing the displacement with the lowest L2 norm, which amounts to minimizing the sum of squared differences (SSD) between the two channels.
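The search described here can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the names ssd and align_exhaustive are hypothetical, and shifting with np.roll (which wraps pixels around the border) is an assumed choice.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    return float(np.sum((a - b) ** 2))

def align_exhaustive(channel, base, window=15):
    """Return the (x, y) shift of `channel` that best matches `base`."""
    best_score = np.inf
    best_shift = (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll wraps around the border (an assumption of this sketch).
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, base)
            if score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

# Synthetic check: displace a random image, then recover the displacement.
rng = np.random.default_rng(0)
base = rng.random((32, 32))
channel = np.roll(base, (-3, 2), axis=(0, 1))
print(align_exhaustive(channel, base, window=5))  # (-2, 3)
```

With a window of 15, the loop scores 31 x 31 = 961 candidate shifts, which is why the cost grows quickly once larger images need larger windows.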
This worked well for the smaller images, such as the cathedral, monastery, and tobolsk, as shown below. Additionally, with a bit of pre-cropping and Sobel edge detection, I was able to fine-tune the alignment further and get a much better single color image.
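One way the pre-cropping and edge detection could look is sketched below. The 10% crop fraction, the function names, and the hand-written Sobel filter (used here only to keep the example free of extra dependencies) are all assumptions. The idea is to score alignment on edge maps of the cropped channels instead of raw intensities.

```python
import numpy as np

def crop_border(img, frac=0.1):
    """Remove `frac` of the height and width from each side of the image."""
    h, w = img.shape
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]

def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels, with edge-replicated padding."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

# A vertical step edge: the response is concentrated at the step.
step = np.zeros((6, 6))
step[:, 3:] = 1.0
edges = sobel_magnitude(step)
print(edges[0])
print(crop_border(np.zeros((10, 20))).shape)
```

Edge maps depend less on the absolute brightness of each channel, which differs from filter to filter, so they tend to give the matching metric a more comparable signal.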

Cathedral

G:[5,12] R:[12,3]

Monastery

G:[-3,1] R:[3,2]

Tobolsk

G:[3,2] R:[6,3]

Approach for Multi-Scale: Image Pyramid Search

When I ran my exhaustive search on large images (such as the .tif files), the algorithm was very slow, as expected for the reasons explained above. Thus, I needed to make the algorithm much more efficient.
To do so, I wrote a multi_scale_alignment function, which begins by scaling the image down to 1/8 of its size. From there, I found the best x and y displacements at 1/8 scale, then repeatedly scaled up by 2, updating the estimate at each scale, until the scale reached 1, and returned the best x and y displacement at the end.
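A rough sketch of such a coarse-to-fine search follows. Only the name multi_scale_alignment and the 1/8 starting scale come from the description above. The helper names, the 2x2 block averaging used for downscaling, the window sizes, and the wrap-around shifting with np.roll are assumptions made for illustration.

```python
import numpy as np

def downscale_by_2(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed resampler)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(channel, base, center, window):
    """Best (x, y) shift by SSD within `window` pixels of `center`."""
    cx, cy = center
    best_score, best_shift = np.inf, center
    for dy in range(cy - window, cy + window + 1):
        for dx in range(cx - window, cx + window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - base) ** 2)
            if score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

def multi_scale_alignment(channel, base, levels=3, coarse_window=15,
                          refine_window=2):
    """Coarse-to-fine alignment; levels=3 starts at 1/8 scale."""
    pyramid = [(channel.astype(float), base.astype(float))]
    for _ in range(levels):
        c, b = pyramid[-1]
        pyramid.append((downscale_by_2(c), downscale_by_2(b)))
    # Full search at the coarsest scale only.
    c, b = pyramid[-1]
    shift = search(c, b, (0, 0), coarse_window)
    # Walk back up: double the estimate, then refine in a small window.
    for c, b in reversed(pyramid[:-1]):
        shift = search(c, b, (2 * shift[0], 2 * shift[1]), refine_window)
    return shift

rng = np.random.default_rng(1)
base = rng.random((128, 128))
channel = np.roll(base, (-16, 8), axis=(0, 1))
print(multi_scale_alignment(channel, base, levels=3, coarse_window=4))  # (-8, 16)
```

Because np.roll wraps around, the coarse window in this sketch should stay smaller than half the size of the coarsest image; otherwise two different shifts can produce the same score. The savings come from running the wide search only on the smallest image, while each finer level checks just a handful of shifts around the doubled estimate.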
Once I had obtained the best displacements, I again used a bit of pre-cropping and edge detection, just as before, to better align these larger images and generate a single color image from the three separate channels.
One thing to note: when I was working with emir.tif, using blue as the base produced a much blurrier result than for most other images. So, I tried red and green as the base instead, and found that green worked much better, both for emir.tif and for some other images such as harvesters. Thus, for these larger images I used green as the base and aligned blue and red to it before merging. The offsets reported for these images are therefore for blue and red, unlike the green and red offsets above, where blue was the base.
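To illustrate the change of base channel, here is a small sketch that merges three channels with green held fixed. The function name compose_with_green_base is hypothetical, the offsets are assumed to be (x, y) pairs already found by an alignment search, and np.roll is again an assumed way of applying them.

```python
import numpy as np

def compose_with_green_base(b, g, r, b_shift, r_shift):
    """Shift blue and red by their (x, y) offsets and stack as an RGB image."""
    # np.roll expects (rows, cols), so each (x, y) offset is passed as (y, x).
    b_aligned = np.roll(b, (b_shift[1], b_shift[0]), axis=(0, 1))
    r_aligned = np.roll(r, (r_shift[1], r_shift[0]), axis=(0, 1))
    return np.dstack([r_aligned, g, b_aligned])

# Synthetic channels: blue and red are displaced copies of green.
rng = np.random.default_rng(2)
g = rng.random((4, 4))
b = np.roll(g, (1, 0), axis=(0, 1))
r = np.roll(g, (0, -2), axis=(0, 1))
rgb = compose_with_green_base(b, g, r, b_shift=(0, -1), r_shift=(2, 0))
print(rgb.shape)  # (4, 4, 3)
```

Green sits in the middle of the plate and of the spectrum, so each of the other two channels only has to be matched against its nearest neighbor, which may be part of why it works better as a base for images like emir.tif.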