Images of the Russian Empire: Colorizing the Prokudin-Gorskii Photo Collection

Yi-Chen Chen

September 6th, 2021


Overview

This project aims to develop an algorithm to colorize images from the digitized Prokudin-Gorskii glass plate photo collection. Prokudin-Gorskii traveled across the Russian Empire to take color photographs: he recorded three exposures of every scene onto a glass plate, filtered by red, green, and blue, respectively. The goal of this project is to take these three exposures (also known as the R, G, and B channels), detect image similarity, and match and align the three channels to create a color image.

* Photography technique details can be found here.

* Project description details can be found here.


Alignment Method for Low-Resolution Images: Single-scale & Exhaustive Search

  1. Split the original image into three sub-images of equal height (one third of the original height each). From top to bottom, they are the B channel, the G channel, and the R channel.

  2. Crop a pre-specified fraction of pixels off each side of the color channels so that the alignment is computed only on interior pixels and is not affected by the dirty pixels on the borders. Cropping also significantly reduces the amount of computation when processing high-resolution images. I ended up cropping 20% of the pixels off each side.

  3. Align the G channel and the R channel to the B channel, respectively. The alignment is done by exhaustively searching over a displacement window of [-15, 15] pixels along both the x-axis and the y-axis. For every combination of (x, y) displacements, calculate the cost, i.e., a measure of how dissimilar the two images are. I used a simple cost function, the Sum of Squared Differences (SSD): compute the squared difference pixel by pixel and sum the results.

  4. The (x, y) displacement with the lowest SSD is the best displacement for aligning the two channels. At this point, you should have the best displacement of the G channel and the best displacement of the R channel.

  5. Stack the aligned color channels to produce a color image.
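
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's actual code: the function names (`split_channels`, `crop_border`, `align_single_scale`, `colorize`) are hypothetical, shifting is done with `np.roll` on the full channel before cropping (so wrapped-around pixels land in the discarded border), and the sign convention of the reported (x, y) displacement is assumed.

```python
import numpy as np

def split_channels(plate):
    """Split a stacked plate scan into B, G, R thirds (top to bottom)."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def crop_border(channel, frac=0.2):
    """Drop `frac` of the rows and columns on every side, keeping the interior."""
    h, w = channel.shape
    dy, dx = int(h * frac), int(w * frac)
    return channel[dy:h - dy, dx:w - dx]

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

def align_single_scale(channel, reference, window=15, frac=0.2):
    """Return the (x, y) shift of `channel` with the lowest SSD against `reference`."""
    ref = crop_border(reference, frac)
    best_cost, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # Assumption: shift the full channel, then compare interiors only.
            shifted = np.roll(channel, shift=(dy, dx), axis=(0, 1))
            cost = ssd(crop_border(shifted, frac), ref)
            if cost < best_cost:
                best_cost, best_shift = cost, (dx, dy)
    return best_shift

def colorize(plate, window=15):
    """Align G and R to B and stack the result as an RGB image."""
    b, g, r = split_channels(plate)
    gx, gy = align_single_scale(g, b, window)
    rx, ry = align_single_scale(r, b, window)
    g_aligned = np.roll(g, shift=(gy, gx), axis=(0, 1))
    r_aligned = np.roll(r, shift=(ry, rx), axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b]), (gx, gy), (rx, ry)
```

With a 2-D grayscale array `plate` loaded from one of the scans, `colorize(plate)` would return the stacked image together with the G and R displacements. The window should stay smaller than the cropped border so that the wrap-around from `np.roll` never enters the compared region.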

| Image | g_displacement (x, y) | r_displacement (x, y) |
| --- | --- | --- |
| cathedral.jpg | (2, 5) | (3, 12) |
| monastery.jpg | (2, -3) | (2, 3) |
| tobolsk.jpg | (3, 3) | (3, 7) |

Alignment Method for High-Resolution Images: Multi-scale (Image Pyramid)

For higher-resolution images, the naive exhaustive search becomes too expensive, even for a single image, because both the number of pixels and the required displacement window grow. So I implemented an image pyramid to speed up the algorithm.

  1. Same as step 1 of low-res images.

  2. Same as step 2 of low-res images.

  3. Align the G channel and the R channel to the B channel, respectively. However, instead of running the exhaustive search directly on the full-resolution images, build an image pyramid by scaling the images down multiple times. An image pyramid is a multi-level image stack that contains the same image at different resolutions. Starting from the coarsest scale (the low-res image) and going down the pyramid to the high-res image, update the estimate of the (x, y) displacement as you go by recursively calling the function you use for single-scale alignment. The recursion stops when it hits the base case where the scaling factor is 1.

    * Thanks to Violet Yao for explaining this in detail. For example, say we do an exhaustive search at the top level over a window of [-15, 15] and find that the optimal x offset is 4. We then move down a level, and say the scale factor between the two levels is 10, so the x offset becomes 4 * 10 = 40. We do another exhaustive search over [-15, 15] around that estimate and find that the optimal x correction is 2, so we update the x offset to 42. The equivalent top-level offset, 42/10 = 4.2, could not have been found at the top level because only integer offsets (4, 5, ...) are explored there. We repeat this process until we hit the bottom level.

    I constructed an image pyramid with 5 levels, starting with the full-resolution image and down-scaling by a factor of 2 at each level: 1, 1/2, 1/4, 1/8, and 1/16. At each level, I exhaustively searched an 8×8 area.

  4. In each recursive call, take the (x, y) displacement with the lowest SSD as the best displacement at that level and use it to update the running displacement estimate until the recursion ends. At this point, you should have the final best displacement of the G channel and the final best displacement of the R channel.

  5. Stack the aligned color channels to produce a color image.
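
The coarse-to-fine search can be sketched as follows. This is an illustrative sketch, not the author's implementation: the names `downscale_by_2`, `search`, and `align_pyramid` are hypothetical, 2×2 block averaging stands in for a proper image resize, a search radius of 4 is assumed as an approximation of the 8×8 area, and the recursion here bottoms out at the coarsest level, which may be arranged differently from the author's recursion.

```python
import numpy as np

def downscale_by_2(img):
    """Halve the resolution by averaging 2x2 blocks (stand-in for a real resize)."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def search(channel, reference, center, radius, frac=0.2):
    """Exhaustive SSD search over shifts within `radius` of `center` = (x, y)."""
    h, w = reference.shape
    my, mx = int(h * frac), int(w * frac)
    ref = reference[my:h - my, mx:w - mx].astype(np.float64)
    cx, cy = center
    best_cost, best = np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(channel, shift=(dy, dx), axis=(0, 1))
            diff = shifted[my:h - my, mx:w - mx].astype(np.float64) - ref
            cost = float(np.sum(diff ** 2))
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best

def align_pyramid(channel, reference, levels=5, radius=4):
    """Coarse-to-fine alignment; returns the (x, y) shift at this resolution."""
    if levels == 1 or min(reference.shape) < 32:
        # Coarsest level: search around zero displacement.
        return search(channel, reference, (0, 0), radius)
    # Estimate on half-resolution copies first.
    cx, cy = align_pyramid(downscale_by_2(channel), downscale_by_2(reference),
                           levels - 1, radius)
    # One coarse pixel is two pixels here, so double the estimate and refine it.
    return search(channel, reference, (2 * cx, 2 * cy), radius)
```

With 5 levels and a radius of 4, the coarsest level already covers roughly ±64 full-resolution pixels, and each finer level only corrects the estimate by a few pixels, which is why the total number of SSD evaluations stays small.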

| Image | g_displacement (x, y) | r_displacement (x, y) | Time Spent (sec) |
| --- | --- | --- | --- |
| church.tif | (4, 25) | (-4, 58) | 48.11599898338318 |
| emir.tif | (24, 48) | (-197, 235) | 59.1813850402832 |
| harvesters.tif | (17, 59) | (15, 123) | 35.33838629722595 |
| icon.tif | (17, 41) | (23, 90) | 35.365917682647705 |
| lady.tif | (8, 51) | (11, 111) | 40.0497772693634 |
| melons.tif | (9, 81) | (13, 179) | 39.810834646224976 |
| onion_church.tif | (27, 50) | (37, 108) | 40.9604709148407 |
| self_portrait.tif | (29, 78) | (37, 175) | 45.00173234939575 |
| three_generations.tif | (15, 50) | (13, 110) | 37.09463381767273 |
| train.tif | (6, 41) | (33, 85) | 36.39931297302246 |
| workshop.tif | (-1, 53) | (-12, 105) | 35.91730284690857 |

There is a problem with emir.tif: the R channel is misaligned. Please see the next section for the solution.

Edge Detection

In the Emir of Bukhara case (emir.tif), the three channels do not have the same brightness values, which causes errors in the image similarity calculation. Running edge detection on the channels before comparing image similarity solves this problem. The detector I used is the Canny edge detector from scikit-image.
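
To illustrate why edges help, here is a small sketch that aligns binary edge maps instead of raw intensities. Note the substitution: to keep the example dependency-free it uses a thresholded gradient magnitude rather than the Canny detector the project actually used, and the names `edge_map`, `align_on_edges`, and the `rel_threshold` parameter are hypothetical. The point is only that an edge map depends on where intensity changes, not on how bright each channel is.

```python
import numpy as np

def edge_map(channel, rel_threshold=0.2):
    """Binary edge map: gradient magnitude above a fraction of its maximum.

    A simple stand-in for a Canny detector, not an equivalent of it.
    """
    gy, gx = np.gradient(channel.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    if magnitude.max() == 0:
        return np.zeros(channel.shape, dtype=np.float64)
    return (magnitude > rel_threshold * magnitude.max()).astype(np.float64)

def align_on_edges(channel, reference, window=15, frac=0.2):
    """Exhaustive SSD search on edge maps; returns the best (x, y) shift."""
    e_ch, e_ref = edge_map(channel), edge_map(reference)
    h, w = e_ref.shape
    my, mx = int(h * frac), int(w * frac)
    ref = e_ref[my:h - my, mx:w - mx]
    best_cost, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(e_ch, shift=(dy, dx), axis=(0, 1))
            cost = float(np.sum((shifted[my:h - my, mx:w - mx] - ref) ** 2))
            if cost < best_cost:
                best_cost, best_shift = cost, (dx, dy)
    return best_shift
```

Because the threshold is relative to each channel's own maximum gradient, a channel that is uniformly darker or lower in contrast yields the same edge map, so the SSD between edge maps still reaches its minimum at the correct displacement.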

[Figure: Original Image and Edge Detection result]

[Figure: Result of the Previous Method and Improved Result]

| Image | g_displacement (x, y) | r_displacement (x, y) | Time Spent (sec) |
| --- | --- | --- | --- |
| emir.tif | (23, 49) | (40, 107) | 36.40872311592102 |

Images of My Own Choosing

| Image | g_displacement (x, y) | r_displacement (x, y) | Time Spent (sec) |
| --- | --- | --- | --- |
| in_little_russia.jpg | (0, 2) | (0, 12) | |
| reserve_girders.jpg | (2, 1) | (3, 4) | |
| wall_paintings.jpg | (1, -3) | (3, -1) | |
| cloth.tif | (52, 57) | (81, 112) | 35.477495431900024 |