Images of the Russian Empire

Overview

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) [Сергей Михайлович Прокудин-Горский, to his Russian friends] was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won the Tsar's special permission to travel across the vast Russian Empire and take color photographs of everything he saw, including the only color portrait of Leo Tolstoy. His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter.

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. In particular, I'll be splitting the raw images into three color channels, aligning the images and stacking them on top of each other to get an RGB image.

Naive Approach: Exhaustive Search Alignment

To create the RGB image, I first split the image into its 3 color channels by cutting the overall image into equal thirds (the original image holds all three channels stacked vertically).

Blue channel, Cathedral
Green channel, Cathedral
Red channel, Cathedral
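As a rough illustration of the split, here is a minimal Python/NumPy sketch. The function name `split_plate` is hypothetical, and the top-to-bottom order (blue, green, red) is an assumption based on the order of the captions above; the tiny synthetic array stands in for a real scan.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate into (blue, green, red) channels.

    Assumes the three exposures are stacked top to bottom in that order.
    """
    height = plate.shape[0] // 3  # any leftover rows at the bottom are dropped
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red

# Tiny synthetic "plate": 6 rows x 2 columns, so each channel is 2 x 2.
plate = np.arange(12, dtype=np.float64).reshape(6, 2)
blue, green, red = split_plate(plate)
```

Using integer division for the channel height keeps all three channels the same shape even when the scan height is not a multiple of 3.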

Naive Alignment

From there, we want to align these images and then stack them on top of each other. The first, naive alignment approach I implemented exhaustively searches over a window of possible displacements (I used [-15, 15] pixels), scores each one using the Sum of Squared Differences (SSD) distance, which is simply `sum(sum((image1-image2).^2))`, and chooses the displacement (x, y) that results in the minimum SSD score. Another option is to choose the displacement with the maximum Normalized Cross Correlation (NCC) score, but in my initial tests there wasn't a major difference in performance between the two. I applied this to the red and green channels, aligning each with the blue channel, and then combined the three channels, resulting in an image like this:
 Cathedral Red shift: (-1, 7) Green shift: (-1, 1)
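The exhaustive search can be sketched roughly as follows, in Python with NumPy rather than the MATLAB-style expression above. The names `ssd` and `align_exhaustive` are hypothetical, and using `np.roll` (which wraps pixels around the edges) to apply a shift is an assumption of this sketch, not necessarily how the original shifts were applied.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(np.sum((a - b) ** 2))

def align_exhaustive(channel, reference, radius=15):
    """Return the (dx, dy) shift in [-radius, radius] that minimizes SSD."""
    best_shift, best_score = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around the edges (an assumption of this sketch).
            shifted = np.roll(channel, shift=(dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_shift, best_score = (dx, dy), score
    return best_shift

# Synthetic check: displace a random image, then recover the shift that undoes it.
rng = np.random.default_rng(0)
reference = rng.random((40, 40))
channel = np.roll(reference, shift=(-3, 2), axis=(0, 1))  # moved up 3, right 2
best = align_exhaustive(channel, reference, radius=5)
```

In this synthetic case the search should return the displacement that moves `channel` back onto `reference`, i.e. 2 pixels left and 3 pixels down.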

Cropping Borders

The naive alignment worked relatively quickly for the `cathedral.jpg` image, but I noticed there was still some noise in the image where the channels weren't aligned correctly.

Thus, I temporarily cropped each image so that only the middle 1/2 of the image would be used in the alignment calculations. This kept the excess noise around the borders (e.g. frayed edges, damage to the film itself, etc.) out of the calculations and greatly improved the alignment quality. I also cropped the resulting images by 10% on each edge to remove the dirty borders.

Cathedral Red shift: (3, 12) Green shift: (2, 5)
Monastery Red shift: (2, 3) Green shift: (2, -3)
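A minimal Python/NumPy sketch of the two crops described above. The helper name `crop_margin` is hypothetical, and reading "the middle 1/2" as the middle half of each dimension (a 25% margin on every edge) is an assumption.

```python
import numpy as np

def crop_margin(image, fraction):
    """Remove `fraction` of the height and width from every edge."""
    h, w = image.shape[:2]
    dy, dx = int(round(h * fraction)), int(round(w * fraction))
    return image[dy:h - dy, dx:w - dx]

image = np.zeros((40, 60))

# Interior used only for scoring: 25% off each edge leaves the middle half
# of each dimension.
interior = crop_margin(image, 0.25)

# Final cosmetic crop: 10% off each edge to remove the dirty borders.
trimmed = crop_margin(image, 0.10)
```

The scoring crop would be applied to both the shifted channel and the reference just before computing the score, so the full-size channels are still the ones that get shifted and stacked.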

However, this algorithm took a very long time to run on larger `.tif` files, and would not be able to shift images beyond the set displacement search range [-15,15] (which may be necessary to properly align larger images).

Examples where the naive alignment doesn't work on larger `.tif` files that need to shift more than 15 pixels:
Harvesters Red shift: (-2, 15) Green shift: (-3, 15)
Lady Red shift: (-15, 15) Green shift: (-8, 15)

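We could just expand the search range, but that quickly becomes prohibitively expensive: a [-r, r] window contains (2r+1)^2 candidate displacements, so doubling the range roughly quadruples the work, and every candidate requires a full-image comparison.

Thus, we need a more efficient search procedure: an image pyramid.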

Optimized Approach: Image Pyramid

For a more efficient approach to image alignment, I implemented an image pyramid, which repeatedly scales the image down by a factor of 2 until reaching the base case: an image smaller than 100x100 pixels. At the base case, the algorithm performs an exhaustive search over [-15, 15] pixels on this heavily downsized image to find the best alignment. Then, as the recursion unwinds, each level takes the displacement returned by the smaller level below it, multiplies it by 2 (the scaling factor), and shifts its own image by that amount. With this preliminary alignment in place, I run another exhaustive search over just [-1, 1] pixels to refine the displacement at the current level, and return the total displacement from this process (the scaled estimate plus the refinement). Overall, this algorithm essentially "wiggles" the image closer to proper alignment at each recursive step, using information from lower-resolution versions of the image.
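The recursion described above might look roughly like this in Python with NumPy. All function names are hypothetical. Averaging 2x2 blocks to downscale and using `np.roll` (which wraps around the edges) to shift are assumptions of this sketch; the original may have used a library resize and a different shifting method.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(np.sum((a - b) ** 2))

def search(channel, reference, radius):
    """Exhaustive search; returns the best (dx, dy) within [-radius, radius]."""
    best, best_score = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            score = ssd(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best

def downscale(image):
    """Halve each dimension by averaging 2x2 blocks (one simple choice)."""
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    img = image[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def align_pyramid(channel, reference, min_size=100, base_radius=15):
    """Return the total (dx, dy) that aligns `channel` to `reference`."""
    # Base case: both dimensions are below min_size, so search the full window.
    if max(channel.shape) < min_size:
        return search(channel, reference, base_radius)
    # Estimate the shift on the half-size images, then scale it back up.
    dx, dy = align_pyramid(downscale(channel), downscale(reference),
                           min_size, base_radius)
    dx, dy = 2 * dx, 2 * dy
    coarse = np.roll(channel, (dy, dx), axis=(0, 1))
    # Refine by at most one pixel in each direction at this level.
    rx, ry = search(coarse, reference, 1)
    return dx + rx, dy + ry

# Synthetic check on a 256x256 image (levels: 256 -> 128 -> 64).
# The shift is a multiple of 4 so it is exactly representable at every level.
rng = np.random.default_rng(1)
reference = rng.random((256, 256))
channel = np.roll(reference, (12, -8), axis=(0, 1))  # moved down 12, left 8
shift = align_pyramid(channel, reference)
```

In this run the full [-15, 15] search happens only on the 64x64 image, and each larger level tests just nine candidates, which is where the speedup over a full-resolution exhaustive search comes from.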

Challenges/Problems

One issue I ran into was that the Emir image wasn't aligning properly to the blue channel. When I applied the same algorithm to the Emir as to the other images, I got a skewed image like this:

 Emir Red shift: (-240, 3) Green shift: (47, 63)

Through trial and error, I realized that due to his coat being mostly blue, the blue channel wasn't the best to use for alignment, and using the green channel as the reference performed much better.
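To illustrate what switching the reference channel means when stacking, here is a small Python/NumPy sketch. The names `shift_image` and `compose_rgb` are hypothetical, the arrays are synthetic stand-ins rather than the real scan, and `np.roll` wrapping is an assumption. The displacements are the Emir values reported in the final results, used purely as example inputs.

```python
import numpy as np

def shift_image(image, dx, dy):
    """Shift an image by (dx, dy) pixels, wrapping around the edges."""
    return np.roll(image, (dy, dx), axis=(0, 1))

def compose_rgb(blue, green, red, shifts, reference="green"):
    """Stack channels into an H x W x 3 RGB array.

    `shifts` maps each non-reference channel name to its (dx, dy)
    displacement; the reference channel is left where it is.
    """
    channels = {"red": red, "green": green, "blue": blue}
    aligned = {}
    for name, channel in channels.items():
        dx, dy = (0, 0) if name == reference else shifts[name]
        aligned[name] = shift_image(channel, dx, dy)
    return np.dstack([aligned["red"], aligned["green"], aligned["blue"]])

# Synthetic 100x100 channels standing in for the real plate.
rng = np.random.default_rng(2)
blue, green, red = (rng.random((100, 100)) for _ in range(3))

# Green is the reference, so only red and blue are displaced.
emir_shifts = {"red": (17, 57), "blue": (-24, -48)}
rgb = compose_rgb(blue, green, red, emir_shifts, reference="green")
```

Keeping the reference as a parameter makes it easy to try each channel as the anchor and compare the results by eye.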

Final Results

Castle Red shift: (4, 98) Green shift: (2, 34)
Cathedral Red shift: (3, 12) Green shift: (2, 5)
Emir Red shift: (17, 57) Blue shift: (-24, -48)
Harvesters Red shift: (14, 123) Green shift: (17, 59)
Icon Red shift: (23, 90) Green shift: (17, 41)
Lady Red shift: (12, 111) Blue shift: (8, 52)
Melons Red shift: (13, 178) Green shift: (10, 82)
Monastery Red shift: (2, 3) Green shift: (2, -3)
Onion Church Red shift: (37, 108) Blue shift: (27, 51)
Self Portrait Red shift: (37, 175) Green shift: (29, 78)
Three Generations Red shift: (12, 110) Green shift: (14, 51)
Tobolsk Red shift: (3, 7) Blue shift: (3, 3)
Train Red shift: (32, 86) Green shift: (6, 42)
Workshop Red shift: (-12, 105) Green shift: (-1, 53)