Colorizing the Prokudin-Gorskii photo collection

CS194-26 Fall 2018
Alan Nguyen, cs194-26-ags

Background

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) [Сергей Михайлович Прокудин-Горский, to his Russian friends] was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won the Tzar's special permission to travel across the vast Russian Empire and take color photographs of everything he saw, including the only color portrait of Leo Tolstoy. And he really photographed everything: people, buildings, landscapes, railroads, bridges... thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. Never mind that there was no way to print color photographs until much later -- he envisioned special projectors to be installed in "multimedia" classrooms all across Russia where the children would be able to learn about their vast country. Alas, his plans never materialized: he left Russia in 1918, right after the revolution, never to return. Luckily, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available on-line.

Approach

Before developing the project, I noticed that all of the images contained borders that could induce wrong offsets. Thus, after extracting the RGB channels, I cropped a fixed 5% of each image. I also applied the Sobel edge detection operator, available through the skimage library, to emphasize the edges in each image.

In short, after extracting the RGB channels, I preprocessed them by cropping each one and then computing its Sobel edge image.
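The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (crop_border, sobel_magnitude, preprocess) are hypothetical, the 5% crop is assumed to be taken from each side, and the Sobel step is written in plain NumPy as a stand-in for the skimage call so that the example has no dependency beyond NumPy.

```python
import numpy as np

def crop_border(channel, fraction=0.05):
    """Trim `fraction` of the height and width from each side.

    Assumption: the 5% crop is applied per side; the write-up does not say.
    """
    h, w = channel.shape
    dy, dx = int(h * fraction), int(w * fraction)
    return channel[dy:h - dy, dx:w - dx]

def sobel_magnitude(channel):
    """Gradient magnitude from the 3x3 Sobel kernels (NumPy stand-in for skimage)."""
    img = np.pad(channel.astype(np.float64), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1.
    gx = (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[1:-1, :-2] - img[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1.
    gy = (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[:-2, 1:-1] - img[:-2, 2:])
    return np.hypot(gx, gy)

def preprocess(channel):
    """Crop the border, then return the edge image used for alignment."""
    return sobel_magnitude(crop_border(channel))
```

Aligning on edge images rather than raw intensities helps because the three channels can differ a lot in brightness while still sharing the same outlines.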

Naive Approach

I first implemented the naive search algorithm as a nested for loop that exhausted the [-15, 15] window. That is, for each channel being aligned (R and G), I tried every translation (x, y) where x and y are integers in the search window [-15, 15].

For each translation, I computed the SSD (sum of squared differences) between the shifted channel and the reference channel. At the end of the loop, the translation with the lowest SSD was the optimal (x, y) for the Sobel channels. I then shifted the original channels (no preprocessing) by this offset and stacked them onto B.
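A minimal sketch of this exhaustive search is shown below. The function names are hypothetical, and using np.roll (which wraps pixels around the border) to translate a channel is an assumption; the write-up does not say how the shift was performed. Offsets follow the (x, y) convention used in the results, with x horizontal and y vertical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    return float(np.sum((a - b) ** 2))

def naive_align(channel, reference, window=15):
    """Try every integer shift in [-window, window]^2 and keep the lowest SSD."""
    best_offset, best_score = (0, 0), float("inf")
    for dx in range(-window, window + 1):
        for dy in range(-window, window + 1):
            # Assumption: translation by circular shift (rows by dy, columns by dx).
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_offset, best_score = (dx, dy), score
    return best_offset

def stack_rgb(r, g, b, r_offset, g_offset):
    """Shift the original R and G channels by their offsets and stack them onto B."""
    r_aligned = np.roll(r, (r_offset[1], r_offset[0]), axis=(0, 1))
    g_aligned = np.roll(g, (g_offset[1], g_offset[0]), axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```

In this sketch, naive_align would be called on the Sobel images of R and G against the Sobel image of B, and stack_rgb on the unprocessed channels with the offsets it returns.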

Upon running the naive algorithm, all of the .jpg files (having a resolution less than 400x400) turned out to be perfectly aligned.

Results

Cathedral
Red: (3, 12)
Green: (2, 5)

Monastery
Red: (2, 3)
Green: (2, -3)

Nativity
Red: (1, 7)
Green: (1, 3)

Settlers
Red: (-1, 14)
Green: (0, 7)

Image Pyramid Approach

Though the naive algorithm was successful for all of the .jpg files, running it on the .tif files would have been computationally expensive, since those files are of higher resolution. Fortunately, the image pyramid algorithm is essentially a divide-and-conquer algorithm: it works on a pyramid of copies of the image at different scales, starting with the smallest.

My image pyramid algorithm recursively rescales an image by 0.5x until it reaches a resolution under 400x400. The naive algorithm is then called on this thumbnail and returns an offset.

When going back up the recursive stack (i.e. moving to an image 2x larger than the previous one), the offset is multiplied by 2 to account for the change in scale. The bigger image is first shifted by this scaled offset, and the naive algorithm is then run with a search window of [-1, 1]. Eventually, the algorithm reaches the original image (no scaling) with an optimal offset carried up from the thumbnail.
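The recursion described above might look roughly like the sketch below. All names are hypothetical, and two details are assumptions made to keep the example NumPy-only: the 0.5x rescale is done by averaging 2x2 blocks rather than with a library rescale, and shifts are circular (np.roll). Searching within 1 pixel of the doubled offset is equivalent to shifting the larger image first and then searching [-1, 1].

```python
import numpy as np

def best_shift(channel, reference, center, radius):
    """Exhaustive SSD search over (x, y) shifts within `radius` of `center`."""
    cx, cy = center
    best, best_score = center, float("inf")
    for dx in range(cx - radius, cx + radius + 1):
        for dy in range(cy - radius, cy + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best

def halve(img):
    """Downscale by 0.5x by averaging 2x2 blocks (stand-in for a library rescale)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid_align(channel, reference, min_size=400, window=15):
    """Coarse-to-fine alignment; returns the (x, y) offset at full resolution."""
    # Base case: the image is a thumbnail, so run the full naive search.
    if max(channel.shape) < min_size:
        return best_shift(channel, reference, (0, 0), window)
    # Recurse on the half-size images, then double the offset they return.
    cx, cy = pyramid_align(halve(channel), halve(reference), min_size, window)
    # Refine with a [-1, 1] search around the scaled offset.
    return best_shift(channel, reference, (2 * cx, 2 * cy), 1)
```

Each level above the thumbnail evaluates only 9 candidate shifts, so most of the cost is the single 31x31 search on the smallest image.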

Issues

Almost all of the .tif files turned out to be perfectly aligned, possibly because edge detection played a key role. However, self_portrait.tif failed to align perfectly. This may be because the image is heavily green-dominant (the scene is set in nature). I suspect that a possible solution would be to use the green channel as the reference instead of blue.