Ajay Ramesh, September 3, 2018

Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection

Overview

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) photographed various scenes in the Russian Empire as three grayscale images, each captured behind a different color filter: red, green, and blue. The Library of Congress has digitally scanned the negatives and uploaded them to its archive. Since Sergei was not perfectly still when taking these photos, naively laying the RGB color channels on top of each other doesn't produce the desired full color image. This project aims to spatially align the three color channels in order to produce a high quality color image with limited artifacts.

Algorithm

The main idea is to circularly shift the red and green color channels in the x and y directions such that their normalized cross-correlation (NCC) score with respect to the blue color channel is maximized. The NCC is defined as dot(image1./||image1||, image2./||image2||), where image1 and image2 are the vectorized versions of the 2D grayscale images. This quantity is often called the "cosine similarity" between two vectors. Since the scanned images have colored borders which are not part of the image, only the central 70% of the image width and height is considered when computing the NCC score.
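As a rough illustration of the scoring and search described above, here is a minimal NumPy sketch. The function names (center_crop, ncc, best_shift) and the default search radius are my own placeholders, not the project's actual code.

```python
import numpy as np

def center_crop(img, keep=0.7):
    """Return the central `keep` fraction of the rows and columns."""
    h, w = img.shape
    my = int(round(h * (1 - keep) / 2))
    mx = int(round(w * (1 - keep) / 2))
    return img[my:h - my, mx:w - mx]

def ncc(a, b, keep=0.7):
    """Cosine similarity of the flattened central regions of a and b."""
    va = center_crop(a, keep).ravel().astype(np.float64)
    vb = center_crop(b, keep).ravel().astype(np.float64)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom > 0 else 0.0

def best_shift(channel, reference, radius=15):
    """Try every circular shift in [-radius, radius]^2 and keep the best NCC."""
    best, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best  # (dy, dx) to apply to `channel` with np.roll
```

In this sketch the red and green channels would each be passed as `channel`, with the blue channel as `reference`.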

The algorithm should output the circular shift in both the x and y directions which maximizes this score for each of the red and green channels. Doing this exhaustively for large images is computationally expensive and takes unreasonably long to complete. More specifically, if we search a window of [-dx, dx] and [-dy, dy] shifts, the number of candidate shifts is (2dx+1)(2dy+1), which grows quadratically with the window size, and each candidate requires scoring every pixel in the image. To solve this, I used an image pyramid to estimate the majority of the shift on a coarse (read: small, subsampled) image. Then, I estimated the shift on the next level of the pyramid with a larger image, pre-shifted by the scaled-up shift estimate from the previous level, using a smaller search window, and so on. The shift at each level is scaled by the inverse of the downscale applied to the image at that level, and all the scaled shifts are accumulated into one x and y shift that is applied to the original image. At a high level, as the algorithm travels from the top of the pyramid to the bottom, the search window shrinks by the same factor by which the image is scaled. Although this doesn't reduce the algorithmic complexity, it greatly reduces the wall-clock run time of the algorithm. The small images each take <1s to align while the large images each take <8s to align (including the edge detection, more on that below).
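The coarse-to-fine search can be sketched as follows. This is only one plausible reading of the description: the factor-of-2 downsampling by 2x2 block averaging, the number of levels, and the starting radius are all assumptions, and the helper names are hypothetical.

```python
import numpy as np

def score(a, b, keep=0.7):
    """Cosine similarity over the central `keep` fraction of both images."""
    h, w = a.shape
    my, mx = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    va = a[my:h - my, mx:w - mx].ravel()
    vb = b[my:h - my, mx:w - mx].ravel()
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom > 0 else 0.0

def search(channel, reference, radius):
    """Brute-force the best (dy, dx) circular shift within +/- radius."""
    candidates = [(dy, dx) for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)]
    return max(candidates,
               key=lambda s: score(np.roll(channel, s, axis=(0, 1)), reference))

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumed filter)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(channel, reference, levels=3, radius=8):
    """Estimate the full-resolution (dy, dx) shift coarse-to-fine."""
    # Build both pyramids: index 0 is full resolution, the last is coarsest.
    chans = [channel.astype(np.float64)]
    refs = [reference.astype(np.float64)]
    for _ in range(levels):
        chans.append(downsample(chans[-1]))
        refs.append(downsample(refs[-1]))

    dy = dx = 0
    for level in range(levels, -1, -1):
        if level < levels:
            # Moving one level down doubles the resolution, so double the shift.
            dy, dx = 2 * dy, 2 * dx
        # Pre-shift by the running estimate, then refine in a small window.
        pre_shifted = np.roll(chans[level], (dy, dx), axis=(0, 1))
        ry, rx = search(pre_shifted, refs[level], radius)
        dy, dx = dy + ry, dx + rx
        # Shrink the window as the image grows (never below 1 pixel).
        radius = max(1, radius // 2)
    return dy, dx
```

With these defaults the coarsest level tests 17 x 17 = 289 candidates on an image 1/64 the area of the original, while the full-resolution level tests only 3 x 3 = 9. The coarsest window covers shifts of up to 8 x 2^3 = 64 full-resolution pixels in each direction.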

Results

You may view all of them here.

Note: The results shown use the Canny edge detection features, described below, for computing the NCC. All images except for emir.tif look fine without edge features for scoring, but I omitted those results since the difference in image quality is imperceptible.

Bells and Whistles

Using edge features for scoring

Since the red, green, and blue color filters let in different intensities of light when exposed for the same amount of time, some color channels appear darker or lighter than others. Depending on the lighting of the scene, this discrepancy can have adverse effects on the alignment. For example, the NCC score might be low for a candidate shift because the grayscale intensities greatly differ, even though the shift has correctly aligned the images spatially. So, instead of using pixel intensities as the features for computing the score, I used a Canny edge detection implementation to retrieve the edge data of an image, shown below.

Since edges are invariant to the absolute intensity of the pixels, the binary image produced above makes for a much more accurate scoring metric.
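To make the intuition concrete, here is a small synthetic sketch. It does not use Canny. Instead, it swaps in a much simpler thresholded gradient magnitude as the edge detector, and the scene (a square that is bright through one filter and dark through another) is made up for illustration.

```python
import numpy as np

def edge_map(img, threshold=0.1):
    """Binary edge map from thresholded gradient magnitude (stand-in, not Canny)."""
    gy, gx = np.gradient(img.astype(np.float64))
    return (np.hypot(gx, gy) > threshold).astype(np.float64)

def cosine(a, b):
    """Cosine similarity between two images treated as flat vectors."""
    va, vb = a.ravel(), b.ravel()
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom > 0 else 0.0

# Hypothetical, perfectly aligned channels with very different intensities.
blue = np.full((40, 40), 0.2)
blue[10:30, 10:30] = 0.9          # square is bright in this channel
red = np.full((40, 40), 0.8)
red[10:30, 10:30] = 0.1           # same square is dark in this channel

raw_score = cosine(blue, red)                        # penalized by intensity mismatch
edge_score = cosine(edge_map(blue), edge_map(red))   # depends only on edge locations
```

Even though the two channels are already aligned, the raw intensity score is well below 1, while the edge-based score is essentially 1 because both edge maps mark the same square outline.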

Automatic cropping

As you can see in the results, there is a discoloration in the image borders. I aimed to develop a naive automatic cropping algorithm that tries to get rid of these borders. The algorithm scans each row of the color image, and finds the index of the maximum error among the RGB planes, on the left and the right. It does the same thing for each column of the color image, to find the cropping boundaries for the top and bottom. Here's an animation of the algorithm working on finding the left and right crop boundaries for one of the images.

Of course, this algorithm makes the assumption that 1) the maximum error happens on the borders, and 2) the errors after the maximum error are negligible.
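Below is a rough sketch of one way this scan could be written. The write-up does not pin down the error measure or how per-row results are combined. The per-pixel spread between the brightest and darkest channel, the split into left and right halves, and the median across rows are therefore all assumptions of mine, as are the function names.

```python
import numpy as np

def crop_bounds_lr(rgb):
    """Estimate (left, right) crop columns for an H x W x 3 float image."""
    # Assumed error measure: disagreement among the three channels per pixel.
    err = rgb.max(axis=2) - rgb.min(axis=2)            # shape (H, W)
    half = err.shape[1] // 2
    # For every row, the column of maximum error in each half of the image.
    left_idx = err[:, :half].argmax(axis=1)
    right_idx = half + err[:, half:].argmax(axis=1)
    # Assumed aggregation: median over rows; crop just inside the max-error column.
    left = int(np.median(left_idx)) + 1
    right = int(np.median(right_idx))
    return left, right

def auto_crop(rgb):
    """Crop all four sides; top/bottom reuse the same scan on the transposed image."""
    left, right = crop_bounds_lr(rgb)
    top, bottom = crop_bounds_lr(rgb.transpose(1, 0, 2))
    return rgb[top:bottom, left:right]
```

This sketch inherits both assumptions listed above. If the largest channel disagreement occurs inside the scene rather than on the border, the crop will cut too far in.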

Good examples

Bad examples

Most of the bad results crop the image too far in, so it's clear that I need a better metric for finding the borders. If I were to approach this problem in the future, I would try identifying the borders by searching for straight lines in the color image.