Images of the Russian Empire

Colorizing the Prokudin-Gorskii photo collection

CS 194-26 Project 1  ·  Madeline Wu

before: the 3 RGB channel photos | after: colorized photo

Overview

Sergei Mikhailovich Prokudin-Gorskii was a pioneer of color photography. He used the three-color principle, a method of color photography that creates a color image by combining three black and white photos taken through red, green and blue filters. From as early as 1907, Prokudin-Gorskii was taking color photographs on glass plates to document the culture and industrialization of the Russian Empire in the early 20th century. You could say that Prokudin-Gorskii was a true ~hipster~, for using filters to create aesthetic photos, way before Instagram was a thing. What a guy.

Unfortunately, much of Prokudin-Gorskii's work was lost and he was never able to see the colorized results of his photographs. Luckily, some of the photographs were recovered and a collection is available online, thanks to the Library of Congress.

The goal of this project is to create colorized photographs from Prokudin-Gorskii's digitized glass plates. This involved combining and aligning the three RGB channels. The first step, combining, was rather trivial, since we are just stacking the channels on top of each other. The second step was alignment: since each of the RGB photographs was taken separately, they were often shifted slightly relative to one another, which led to a poor combined result. Aligning the photographs was not only an essential component of this project, it was also the most challenging.
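To make the "stacking" step concrete, here is a minimal sketch in Python with NumPy. The writeup does not say which language or library was used, so the function name `stack_channels` and the tiny synthetic channels are assumptions for illustration only.

```python
import numpy as np

def stack_channels(r, g, b):
    """Stack three same-sized grayscale channels into an H x W x 3 RGB array."""
    return np.dstack([r, g, b])

# Tiny synthetic example: three 2x2 single-channel "plates" (made-up values).
r = np.full((2, 2), 0.9)
g = np.full((2, 2), 0.5)
b = np.full((2, 2), 0.1)

rgb = stack_channels(r, g, b)
print(rgb.shape)  # (2, 2, 3)
```

Stacking only works pixel-for-pixel, so any offset between the channels shows up as colored fringes. That is why alignment has to happen first.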

a terrible version of the lady (combined channels, but no alignment)

Naive Approach: Exhaustive Search

I started off with a naive approach to align two color channels: exhaustive search. This method involved a few steps.

          Choose one channel to be the displaced channel and the other to be the fixed channel.
          Pick a maximum displacement value and shift the displaced channel over every offset within that range, in both x and y.
          Find the x and y displacement values that result in the best heuristic value.
      

That leads us to the next important step, defining a heuristic. To put it in general words, the heuristic tells us how "good" our alignment between two channels is or how "close" two photos are. The heuristic I chose to measure this value was the Sum of Squared Differences (SSD).

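For two channels $A$ and $B$ of the same size, it is defined as

$$\mathrm{SSD}(A, B) = \sum_{i,j} \left( A_{i,j} - B_{i,j} \right)^2$$

Let's go deeper into what this equation actually means in the context of a photo.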

A photo can be represented as a matrix in which each pixel is associated with a value. For a single channel, that value tells us how bright the pixel is. In this project, I loaded the images as matrices of doubles. When we compare two pixels, we are comparing their values; thus, the more similar two pixels are, the smaller the difference in their values. A difference can be positive or negative, so we square it to make differences in either direction count equally. If we calculate that squared difference for every pixel in the photo and sum them all, we've calculated the difference for the whole photo.
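As a small illustration of the paragraph above, SSD can be sketched in a few lines of Python with NumPy. The function name `ssd` and the example matrices are made up for this sketch and are not taken from the project code.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two same-shaped arrays."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.sum(diff ** 2))

# Two 2x2 "photos" that differ by 0.5 in exactly two pixels.
a = np.array([[0.0, 0.5], [1.0, 0.25]])
b = np.array([[0.0, 0.0], [0.5, 0.25]])

print(ssd(a, b))  # 0.5  (0.5**2 + 0.5**2)
print(ssd(a, a))  # 0.0  (identical photos)
```

A lower score means a closer match, so the "best" alignment is the one with the smallest SSD.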

I used this exhaustive search with the SSD heuristic to align the Red and Green channels with the Blue channel. I then combined the displaced Red, displaced Green, and Blue channels to create the colorized photographs.
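The search described above could look roughly like the following sketch. Several details are assumptions rather than the project's actual choices: the default window of `max_shift=15`, the use of `np.roll` (which wraps pixels around the border instead of cropping them), and the names `align_exhaustive`, `moving` and `fixed`.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two same-shaped arrays."""
    return float(np.sum((a - b) ** 2))

def align_exhaustive(moving, fixed, max_shift=15):
    """Return the (dx, dy) shift of `moving` that minimizes SSD against `fixed`.

    max_shift is an assumed window size; every offset in
    [-max_shift, max_shift] is tried along both axes.
    """
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps around the border (a simplification).
            shifted = np.roll(moving, shift=(dy, dx), axis=(0, 1))
            score = ssd(shifted, fixed)
            if score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

# Synthetic check: displace a random "channel" by a known amount.
rng = np.random.default_rng(0)
fixed = rng.random((32, 32))
moving = np.roll(fixed, shift=(-3, 2), axis=(0, 1))  # up 3 rows, right 2 columns

print(align_exhaustive(moving, fixed, max_shift=5))  # (-2, 3)
```

The returned shift undoes the displacement, so applying it to the Red and Green channels before stacking them with Blue gives the aligned color image. The cost is one full-image comparison per candidate offset, which is what makes this approach slow on large scans.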

Image Pyramiding

The exhaustive search method worked great for smaller files (jpg files), but didn't perform so well on larger images (tif files). Exhaustive search is a brute force search over many displacement options, and evaluating each option means shifting every element in the matrix. This can get quite expensive as the matrix gets larger.

To make the alignment process more efficient, I used an image pyramid. The image pyramid allows us to represent the image at different sizes. It's hard to explain this in words, so I've attached a figure to visually present what's happening at each level of the image pyramid.

We start off at the base of the pyramid, with the largest image size. We recursively scale down the image size, by a factor of 2, until the image is smaller than 300px x 300px. From there, the best alignment is calculated using an exhaustive search on a displacement window. However, unlike in the naive algorithm, the best alignment value is simply used as a starting point for the lower level's alignment calculation. This reduces the amount of computation significantly, because if we wanted to achieve the same result with a single level in the pyramid, the displacement window would have to be many times larger. To avoid shifting large matrices, the displacement window is inversely proportional to the recursion depth.
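The coarse-to-fine idea above can be sketched as follows. This is only an approximation of the approach described: the 2x2 block averaging in `downscale`, the doubling of the coarse estimate at each level, and the fixed `coarse_radius` and `fine_radius` windows are all assumptions made for this example, not the window schedule actually used in the project.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two same-shaped arrays."""
    return float(np.sum((a - b) ** 2))

def search(moving, fixed, center, radius):
    """Exhaustive SSD search over (dx, dy) shifts within `radius` of `center`."""
    cx, cy = center
    best, best_score = center, np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = ssd(np.roll(moving, shift=(dy, dx), axis=(0, 1)), fixed)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best

def downscale(img):
    """Halve each dimension by averaging 2x2 blocks (odd edges are cropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(moving, fixed, min_size=300, coarse_radius=15, fine_radius=2):
    """Recursive coarse-to-fine alignment; returns a (dx, dy) shift."""
    if max(moving.shape) < min_size:
        # Top of the pyramid: small image, so a wide window is cheap.
        return search(moving, fixed, (0, 0), coarse_radius)
    dx, dy = align_pyramid(downscale(moving), downscale(fixed),
                           min_size, coarse_radius, fine_radius)
    # A shift at half resolution is worth twice as many pixels here;
    # refine it with a small window only.
    return search(moving, fixed, (2 * dx, 2 * dy), fine_radius)

# Synthetic check on a smooth 64x64 image made of two Gaussian blobs.
y, x = np.mgrid[0:64, 0:64]
fixed = (np.exp(-((x - 20) ** 2 + (y - 30) ** 2) / 50.0)
         + 0.5 * np.exp(-((x - 45) ** 2 + (y - 12) ** 2) / 30.0))
moving = np.roll(fixed, shift=(-6, 4), axis=(0, 1))

shift = align_pyramid(moving, fixed, min_size=40)
print(shift)
```

In this sketch the wide window is only ever applied to the smallest image, and each larger level tries just a handful of offsets around the doubled estimate, which is where the savings over a single-level search come from.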

A Different Heuristic: Edge Detection

The combination of exhaustive search (SSD heuristic) with image pyramiding did well for all images, except for one: the portrait of the Emir of Bukhara. This alignment issue stems from the fact that the three channel images do not have the same brightness values, so comparing raw pixel values can be misleading. To get around this, I used edge detection with the Sobel operator as a different heuristic.

harvesters: sobel edges
emir: sobel edges
train: sobel edges

I ran each of the channels through the Sobel edge detector and aligned them based on the edges, rather than on the raw pixel values.
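To illustrate why edges are less sensitive to brightness than raw pixel values, here is a hedged sketch of a Sobel gradient-magnitude filter written directly in NumPy. The function name `sobel_edges`, the edge-replicated border handling, and the synthetic step image are assumptions for this example. The project may well have used a library routine instead.

```python
import numpy as np

def sobel_edges(img):
    """Gradient magnitude using 3x3 Sobel kernels (borders are edge-replicated)."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    # Horizontal gradient, kernel [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]].
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient, kernel [[-1, -2, -1], [0, 0, 0], [1, 2, 1]].
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

# A vertical step edge, and a uniformly brighter copy of the same scene.
dark = np.zeros((5, 6))
dark[:, 3:] = 1.0
bright = dark + 0.25

raw_ssd = float(np.sum((dark - bright) ** 2))
edge_ssd = float(np.sum((sobel_edges(dark) - sobel_edges(bright)) ** 2))
print(raw_ssd)   # large: every pixel differs by 0.25
print(edge_ssd)  # the constant brightness offset cancels in the gradient
```

The edge images can then be passed to the same SSD-based search in place of the raw channels. A constant brightness offset disappears entirely in the gradient, although a difference in contrast would still scale the edge strengths.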

before: misaligned emir without edge detection
after: a super rad photo of emir

As demonstrated above, the results using edge detection were much better. Using the edge images also sped up the computation.

Results

Here are the colorized results of each photograph, with the R and G channel displacements (x, y). I used exhaustive search with the SSD heuristic on Sobel edge images, with image pyramiding to speed up the process.

cathedral
R: (3, -11), G: (3, -5)
monastery
R: (3, -3), G: (1, 3)
settlers
R: (-1, -15), G: (0, -7)
tree
R: (29, -107), G: (21, -47)
self portrait
R: (37, -179), G: (29, -79)
three generations
R: (9, -111), G: (13, -55)
train
R: (33, -89), G: (5, -43)
turkmen
R: (25, -115), G: (21, -57)
emir
R: (41, -67), G: (21, -51)
harvesters
R: (41, -67), G: (21, -51)
icon
R: (21, -91), G: (17, -43)
lady
R: (9, -115), G: (7, -57)
village
R: (21, -139), G: (13, -67)
yurezan
R: (-3, -183), G: (1, -87)
conservatory
R: (33, -127), G: (29, -59)
onion church
R: (17, -123), G: (13, -59)

Auto White Balancing

White balancing involved two steps: estimating the illuminant and scaling the other pixels based on that illuminant. I took the brightest pixel in the image and treated that as the "true" white value. The rest of the pixels were scaled accordingly. I applied this process to the three channels before stacking them to create the color photo. Here are some select photos before and after white balancing.
cathedral
white balanced cathedral
conservatory
white balanced conservatory
emir
white balanced emir
yurezan
white balanced yurezan
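
The brightest-pixel white balancing described in this section can be sketched per channel as below. This assumes pixel values are doubles in the range [0, 1], so that "white" corresponds to 1.0. The function name `white_balance_channel` and the sample values are hypothetical.

```python
import numpy as np

def white_balance_channel(channel):
    """Scale a channel so that its brightest pixel becomes 1.0 (assumed white)."""
    channel = np.asarray(channel, dtype=np.float64)
    peak = channel.max()
    if peak == 0:
        # An all-black channel has nothing to scale.
        return channel.copy()
    return channel / peak

# Made-up 2x2 channels whose brightest values differ (0.8, 0.5, 1.0).
r = np.array([[0.2, 0.4], [0.8, 0.6]])
g = np.array([[0.1, 0.5], [0.25, 0.3]])
b = np.array([[1.0, 0.5], [0.25, 0.0]])

# Balance each channel on its own, then stack into an RGB image.
balanced = np.dstack([white_balance_channel(c) for c in (r, g, b)])
print(balanced.max(axis=(0, 1)))  # each channel now peaks at 1.0
```

Because each channel is scaled independently, a color cast caused by one channel being consistently dimmer gets evened out. The trade-off is that a single very bright outlier pixel can dominate the estimate.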