Images of the Russian Empire:
Colorizing the Prokudin-Gorskii Photo Collection

CS 194-26 Project 1
Alice Tarng

Overview

From 1907 to 1915, Sergei Mikhailovich Prokudin-Gorskii traveled around the Russian Empire, taking thousands of photographs of the scenes he saw. Though this was before the era of color photography, Prokudin-Gorskii believed strongly in its potential. He recorded three exposures of every scene, one each through a blue, green, and red filter. Eventually, these negatives were purchased by the Library of Congress and digitized. Now, we want to take these black-and-white negatives of the past and turn them into color photos.

Approach

Because each scene comes as a set of three exposures, one per R, G, B channel, we can combine the three into a single image in which each pixel holds an array of its R, G, and B values. Modern code libraries can interpret such pixel matrices and output the corresponding color image. Thus, the problem becomes: how do we align the R, G, B images to determine each pixel's RGB value?
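To make the stacking concrete, here is a minimal sketch in Python with numpy and scikit-image. The digitized plates stack the three exposures vertically (blue on top, then green, then red); the function name split_plate and the file names are just illustrative, not the project's actual code.

import numpy as np
import skimage.io as skio

def split_plate(path):
    # The digitized plate stacks the three exposures vertically,
    # blue on top, then green, then red.
    plate = skio.imread(path)
    height = plate.shape[0] // 3
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r

b, g, r = split_plate("cathedral.jpg")
color = np.dstack([r, g, b])  # H x W x 3 array, read as an RGB image
skio.imsave("cathedral_color.jpg", color)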

The metric used was the Sum of Squared Differences (SSD): the squared per-pixel difference (image1 - image2)^2, summed over all pixels. The blue image was chosen as the alignment standard, and the red and green images are each displaced by some (x, y) vector to align with the blue.
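As a sketch, the SSD between two equal-sized channel images can be computed like this (casting to float first so unsigned 8-bit pixels don't wrap around when subtracted):

import numpy as np

def ssd(im1, im2):
    # Sum of squared per-pixel differences; lower means more similar.
    diff = im1.astype(np.float64) - im2.astype(np.float64)
    return np.sum(diff ** 2)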

The naive method searches over a window of possible displacements [a, b], trying every displacement vector and keeping the [x, y] displacement with the minimum SSD. This is done for both the red and green images. The R and G images are then shifted by their computed displacements and stacked with the B to produce the color image.
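A sketch of the exhaustive search, reusing the ssd helper above. The 15-pixel search radius is an illustrative choice; note that np.roll wraps pixels around the border, which is tolerable for small shifts (cropping the borders before scoring would be cleaner):

import numpy as np

def align_naive(channel, reference, window=15):
    # Try every (dy, dx) in [-window, window]^2 and keep the shift
    # with the smallest SSD against the reference channel.
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# Usage: shift R and G to match B.
# r_shift = align_naive(r, b)
# g_shift = align_naive(g, b)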

The naive method is fine for smaller images, but for the high-resolution scans, this exhaustive search becomes extremely slow and expensive. In those cases, we use an image pyramid. The image is scaled down by some power of 2, and the ideal displacement vector is found at that coarse scale, as in the naive method. Then we recursively scale back up by a factor of 2, shift the image by 2 * previous_displacement_vector, and search again over some (possibly smaller) displacement window, adding the result onto our current displacement. (The previous displacement vector is multiplied by 2 because the image is now twice the size, so each coarse pixel corresponds to two pixels at the finer scale.) Once we're back at the original size and have our final displacement vector, we once again shift R and G and align them with B.
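A sketch of the recursive coarse-to-fine search, reusing align_naive from above. The 400-pixel base case is an illustrative parameter, and the search radius doubles at each coarser level, which is the same as halving it each time we drop down a layer (see the note below):

import numpy as np
import skimage.transform as sktr

def align_pyramid(channel, reference, window=4, min_size=400):
    # Base case: the image is small enough for an exhaustive search.
    if max(channel.shape) <= min_size:
        return align_naive(channel, reference, window)
    # Recurse on half-resolution copies, with a doubled search radius.
    small_c = sktr.rescale(channel, 0.5, anti_aliasing=True)
    small_r = sktr.rescale(reference, 0.5, anti_aliasing=True)
    dy, dx = align_pyramid(small_c, small_r, 2 * window, min_size)
    # One coarse pixel covers two fine pixels, so double the estimate,
    # then refine at this resolution within the smaller window.
    rough = np.roll(channel, (2 * dy, 2 * dx), axis=(0, 1))
    ry, rx = align_naive(rough, reference, window)
    return 2 * dy + ry, 2 * dx + rx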

One thing to pay attention to is how the displacement window should change from layer to layer in the image pyramid. The pyramid speeds up computation on large images precisely by not searching a large displacement window at full resolution, where each shift is expensive. In early versions of the code, the search window didn't change from layer to layer, so even at the bottom layer, with the full-sized image, we were still searching over a sizable window. This meant no efficiency gain at all (in fact, with all the extra layers, we were doing more work than the naive algorithm), so coloring a high-resolution scan was extremely slow. Once this problem was identified, the code was changed to halve the window each time we dropped down a layer, and the process sped up immensely.

Example Images

The colorized example images, along with their computed displacement vectors:


Naive Algorithm on Smaller .jpg Images

cathedral.jpg


R: [7 -1]
G: [1 -1]

monastery.jpg


R: [9 1]
G: [-6 0]

nativity.jpg


R: [7 1]
G: [3 1]

settlers.jpg


R: [14 -1]
G: [7 0]



Image Pyramid Algorithm on All Example Images

* Original high-resolution .tif scans have been output here as smaller, lower-resolution .jpg for efficiency's sake.

cathedral.jpg


R: [6 -1]
G: [0 -1]

emir.tif


R: [106 17]
G: [-4 7]

harvesters.tif


R: [118 6]
G: [114 -4]

icon.tif


R: [89 22]
G: [42 16]

lady.tif


R: [123 -17]
G: [56 -6]

monastery.jpg


R: [6 0]
G: [-6 0]

nativity.jpg


R: [6 1]
G: [3 0]

self_portrait.tif


R: [126 -5]
G: [50 -2]

settlers.jpg


R: [9 -1]
G: [6 0]

three_generations.tif


R: [108 7]
G: [52 4]

train.tif


R: [102 1]
G: [41 -2]

turkmen.tif


R: [78 0]
G: [56 4]

village.tif


R: [114 -15]
G: [142 -8]

Failed Images

A few images did not align very well and thus don't produce a very sharp color image: emir.tif, harvesters.tif, monastery.jpg, self_portrait.tif, turkmen.tif, village.tif. The main reason is the differing intensity values between the B, G, R exposures. Because the code aligns using SSD, it looks for the shift where corresponding pixel values are (roughly) most similar. This rests on the assumption that the best alignment is the one where the pixels are most similar. If a scene naturally has very different intensities across the color channels, that assumption breaks down: at the true alignment, the corresponding blue and red values may differ greatly, so the SSD isn't minimized there and the search settles on a worse shift.
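As a toy 1-D illustration of this failure mode (with made-up pixel values), the red channel below contains the same feature as the blue one but with an inverted brightness pattern, and a wrong one-pixel shift scores a lower SSD than the true alignment:

import numpy as np

blue = np.array([0.1, 0.1, 0.9, 0.1, 0.1])
red  = np.array([0.4, 0.4, 0.2, 0.4, 0.4])  # same feature, different intensities

aligned    = np.sum((red - blue) ** 2)              # true shift:  0.85
misaligned = np.sum((np.roll(red, 1) - blue) ** 2)  # off by one:  0.53
# SSD prefers the wrong shift, even though zero shift is correct here.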

Selected Images

Some more colorized photos, using the image pyramid algorithm, from the Prokudin-Gorskii collection:
* Original black and white scans were all high-resolution .tif files, but results shown here are smaller, lower-resolution .jpg for efficiency.


canal.jpg


R: [82 -1]
G: [40 2]

church_utensils.jpg


R: [123 12]
G: [66 8]

cotton_gin.jpg


R: [130 31]
G: [62 22]

cross_icons.jpg


R: [102 27]
G: [48 22]

entrance.jpg


R: [130 30]
G: [54 17]

plaque.jpg


R: [32 2]
G: [-16 6]

production_shop.jpg


R: [106 30]
G: [48 17]

Tiflis.jpg


R: [90 42]
G: [40 24]