Images of the Russian Empire

Steven Cao / cs194-26-adx / Project 1
Cathedral in color; Cathedral RGB images

From 1909 to 1915, Sergei Mikhailovich Prokudin-Gorskii created some of the earliest color photographs by recording each scene three times, through a red, a green, and a blue filter. Although he had no way of combining the three exposures into a color image, the Library of Congress purchased his photographs in 1948 and later digitized them. The goal of this project was to automatically combine these RGB images into a single color image using alignment algorithms.

Image Alignment

Approach

Because the red, green, and blue images were recorded separately, they are not perfectly aligned. Therefore, to stack them into a color image, we must first align them. The alignment algorithm involved trying all horizontal and vertical shifts within some maximum displacement and choosing the shift that maximized the similarity between the two images.
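This exhaustive search can be sketched as follows; `similarity` stands in for whichever similarity function is chosen, and the function name and signature are illustrative rather than the project's actual code:

```python
import numpy as np

def align(src, trg, similarity, max_shift=15):
    """Try every (dy, dx) shift of `src` within `max_shift` and return the
    shift that maximizes similarity(shifted_src, trg).
    `similarity` is any function taking two equal-shape 2D arrays."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps around the edges; acceptable if borders are cropped.
            shifted = np.roll(np.roll(src, dy, axis=0), dx, axis=1)
            score = similarity(shifted, trg)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With negative sum-of-squared-differences as the similarity, this recovers a known synthetic shift exactly.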

The success of this algorithm depends greatly on the choice of similarity function. After trying multiple options, I settled on mutual information as the similarity function because it aligned every image almost perfectly even when computed directly on the RGB values. For two discrete random variables, the mutual information is given by $$ I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}.$$ We can think of a \(256 \times 256 \times 2\) image (whose two channels \(X\) and \(Y\) are two color channels) as containing \(256^2\) draws of the random variable \((X, Y)\), where each pixel is a single draw. Then, given two \(m \times n\) images \(\text{src}\) and \(\text{trg}\) containing \(mn\) joint draws of \((X,Y)\), we can calculate \(I(X;Y)\) from the probability distribution \(p(x,y)\) defined as the histogram $$ p(x,y) = \frac{1}{mn} \sum_{i = 0}^{m-1} \sum_{j = 0}^{n-1} \mathbb{I}\{ \text{src}[i,j] = x\ \text{and}\ \text{trg}[i,j] = y\}, $$ except that because the pixel values \(x\) and \(y\) are real-valued, we first divide them into discrete bins. Mutual information measures how much information \(X\) reveals about \(Y\); in our case, it measures how much information the green or red channel reveals about the blue channel, which intuitively should be higher when the images are better aligned. For those interested in learning more, this blog post provides a nice analysis of mutual information and its use in image alignment.
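A plug-in estimate of this quantity can be computed directly from a binned joint histogram. The sketch below (the function name and bin count are illustrative choices, not the project's code) uses NumPy's `histogram2d`:

```python
import numpy as np

def mutual_information(src, trg, bins=32):
    """Estimate I(X;Y) between two equal-shape images from a binned
    joint histogram of their pixel values."""
    joint, _, _ = np.histogram2d(src.ravel(), trg.ravel(), bins=bins)
    pxy = joint / joint.sum()            # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)  # marginal p(x), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, bins)
    nz = pxy > 0                         # bins with p(x,y) = 0 contribute 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
```

An image is maximally informative about itself, so its mutual information with itself exceeds its mutual information with an independent image; the estimate is also nonnegative, being a KL divergence.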

Implementation Details

First, each image contained black and white borders (as shown on the left), which should not be included in the similarity calculation. Therefore, I only calculated similarity over the middle 60% of each image.

In addition, many of the images were very large (roughly \(3200 \times 3700\)), with pixel shifts of around \(100\) in each direction, making it slow to try every shift at full resolution. Therefore, to speed up alignment, I used a pyramid approach: first align the image at \(1/8\) resolution (such that the image was below \(512 \times 512\)), then repeatedly double the resolution and try small shifts to refine the initial alignment. This approach increased speed without any reduction in accuracy, allowing us to align \(3200 \times 3700\) images in roughly 45 seconds on a laptop.
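A minimal sketch of this coarse-to-fine scheme, assuming an exhaustive-search helper and plain stride-2 subsampling (a real implementation would typically blur before subsampling, and the names, radii, and threshold here are illustrative):

```python
import numpy as np

def _search(src, trg, similarity, radius):
    # Exhaustive search over all shifts within `radius`, as described earlier.
    return max(((dy, dx) for dy in range(-radius, radius + 1)
                         for dx in range(-radius, radius + 1)),
               key=lambda s: similarity(np.roll(src, s, axis=(0, 1)), trg))

def pyramid_align(src, trg, similarity, coarse_radius=15, refine_radius=2):
    """Recursively halve the resolution until the image fits within
    512 x 512, do a full search there, then refine at each finer level."""
    if max(src.shape) <= 512:
        return _search(src, trg, similarity, coarse_radius)
    # Align at half resolution, then double the resulting shift.
    cy, cx = pyramid_align(src[::2, ::2], trg[::2, ::2],
                           similarity, coarse_radius, refine_radius)
    dy, dx = 2 * cy, 2 * cx
    # Refine the upsampled estimate with a small local search.
    rolled = np.roll(src, (dy, dx), axis=(0, 1))
    rdy, rdx = _search(rolled, trg, similarity, refine_radius)
    return (dy + rdy, dx + rdx)
```

Each halving shrinks the search area fourfold, which is what makes the full-resolution exhaustive search unnecessary.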


Bells and Whistles

Image Recoloring

While the resulting color images are quite high quality, many of them look unnatural if we naively combine the RGB channels by stacking them. In particular, many look cold, with a blue tint. Therefore, I implemented an automatic color balancing algorithm. Below is an example of the image before (left) and after (right) recoloring.

Three generations recolored

To perform color balancing, I used the following approach, based on the von Kries method:

  1. Convert the color space from RGB to LMS (long, medium, and short cone responses). This conversion can be implemented by first converting from RGB to XYZ, then multiplying by the von Kries matrix.
  2. Subtract the minimum value from each channel such that the darkest pixel in the image is \((0,0,0)\).
  3. Estimate the illuminant by finding the 99th percentile brightest pixel, where brightness is measured by the pixel value summed over the three channels.
  4. Multiply each channel by the ratio \(1/v\), where \(v\) is the pixel value for that channel of the 99th percentile pixel.
  5. Convert back to RGB and save the image.
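The steps above can be sketched as follows. The RGB-to-XYZ and XYZ-to-LMS (Hunt-Pointer-Estevez, i.e. von Kries) matrices are standard published values, but the function itself is an illustrative reconstruction of the steps rather than the project's code; it assumes a linear-RGB float image with values in \([0,1]\):

```python
import numpy as np

# Standard linear sRGB -> XYZ matrix (D65 white point).
RGB2XYZ = np.array([[0.4124, 0.3576, 0.1805],
                    [0.2126, 0.7152, 0.0722],
                    [0.0193, 0.1192, 0.9505]])
# Hunt-Pointer-Estevez ("von Kries") XYZ -> LMS matrix.
XYZ2LMS = np.array([[ 0.40024, 0.70760, -0.08081],
                    [-0.22630, 1.16532,  0.04570],
                    [ 0.0,     0.0,      0.91822]])
RGB2LMS = XYZ2LMS @ RGB2XYZ

def white_balance(img):
    """img: float array of shape (H, W, 3), linear RGB in [0, 1]."""
    lms = img @ RGB2LMS.T                    # step 1: RGB -> LMS
    lms -= lms.reshape(-1, 3).min(axis=0)    # step 2: darkest pixel -> (0,0,0)
    # Step 3: the 99th-percentile brightest pixel estimates the illuminant.
    brightness = lms.sum(axis=2)
    idx = np.argsort(brightness.ravel())[int(0.99 * brightness.size)]
    illuminant = lms.reshape(-1, 3)[idx]
    lms /= illuminant                        # step 4: von Kries scaling
    rgb = lms @ np.linalg.inv(RGB2LMS).T     # step 5: LMS -> RGB
    return np.clip(rgb, 0.0, 1.0)
```

As a sanity check, a gray-ramp image given a uniform color cast should come back approximately neutral, with all three channels equal at every pixel.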

Color balancing in the LMS space worked much better than doing so in the RGB space. The LMS space is derived from the three cone types in the human eye, making it better suited for color balancing.

Also, while estimating the illuminant is typically very difficult, in these images it sufficed to simply use the 99th percentile brightest pixel. Using the 99th percentile instead of the maximum produced slightly better recolorings.

As in alignment, I also ignored the borders by restricting to the middle 60% of the image when searching for the illuminant.


Results

The following are all of the resulting images, where left is before recoloring and right is after. The offsets of the red and green images with respect to blue are also listed.

Peasant Woman: (138,18) (67,16)

Peasant Woman recolored

Natural Spring: (189,-10) (76,13)

Natural Spring recolored

Tobolsk: (7,3) (3,3)

Tobolsk recolored

Monastery: (3,2) (-3,2)

Monastery recolored

Cathedral: (12,3) (5,2)

Cathedral recolored

Melons: (177,14) (81,10)

Melons recolored

Onion Church: (108,37) (51,27)

Onion Church recolored

Workshop: (107,-11) (54,0)

Workshop recolored

Emir: (106,40) (49,23)

Emir recolored

Icon: (89,23) (40,17)

Icon recolored

Lady: (115,13) (54,8)

Lady recolored

Three Generations: (111,12) (52,15)

Three generations recolored

Self Portrait: (175,37) (78,29)

Self Portrait recolored

Train: (86,32) (42,7)

Train recolored

Harvesters: (123,14) (59,18)

Harvesters recolored

Village: (138,22) (65,12)

Village recolored