Colorizing the Prokudin-Gorskii Photo Collection

Jazz Singh, September 2020


Project Overview

The Prokudin-Gorskii Photo Collection consists of sets of color channels produced from glass plate imaging; unfortunately, reconstructing a color image from these raw channels leads to noisy visual artifacts, stemming from a combination of positional and lighting-related misalignment. The objective of this project is to align these channels and recover clear photographs.

For more long-form background on the project, as well as information on the many techniques that can be applied to reduce visual artifacts in these images, see https://www.loc.gov/exhibits/empire/making.html.


Approach

In the majority of cases, tackling the positional misalignment in these photographs is sufficient to produce clear results. We can frame this as an optimization problem: shift two of the three color channels to align with the third. In other words, we search for the displacement vector (x, y) that optimizes a scoring function f.

Below, I've described my approach to solving this problem for low-resolution and high-resolution images.

Low-resolution images

A naive approach to the above optimization problem is to exhaustively search a (reasonably sized) window of possible displacement vectors for the one that yields the best alignment score. Using the Euclidean distance between each channel and a reference channel as the scoring function, this method indeed yields good results for low-resolution images. Pictures in the results section demonstrate this.
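As an illustration, the exhaustive search might be sketched as follows with NumPy. The function name, default window size, and the use of the sum of squared differences (the squared Euclidean distance, which has the same minimizer as the Euclidean distance) are my own illustrative choices, not the project's exact code:

```python
import numpy as np

def align_exhaustive(channel, reference, window=15):
    # Search every displacement in [-window, window]^2 for the one that
    # minimizes the sum of squared differences (the squared Euclidean
    # distance, which shares its minimizer with the Euclidean distance).
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ((shifted - reference) ** 2).sum()
            if score < best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift
```

Note that np.roll wraps pixels around the border; this is harmless in practice once the noisy borders are cropped away.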

Note that I also experimented with the normalized cross-correlation score, which ended up yielding near-identical results.
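For reference, a minimal NCC scoring function might look like the following. This is a sketch; the function name and the mean-subtraction convention are my own choices:

```python
import numpy as np

def ncc(a, b):
    # Normalized cross-correlation: the cosine similarity of the
    # mean-subtracted channels. Unlike the Euclidean distance,
    # this score is maximized rather than minimized.
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b))
```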

High-resolution images

Unfortunately, this naive method is far too slow for high-resolution images, as the window of possible displacement vectors is larger.

We can speed up our naive approach by using a bare-bones image pyramid: recursively find the best displacement vector for a smaller image scale; rescale this vector; and then refine the estimate to match the current resolution. This drastically decreases the time it takes to execute our algorithm on high-resolution images, from on the order of hours to ~10-15 seconds. Moreover, it ends up producing good results on all except one image. (This image contains lighting misalignment in addition to positional misalignment. See the Results section for more information.)

In more detail, the image pyramid approach to finding the best displacement vector consists of the following steps at each layer n.

  1. Recursively find the best displacement vector v for a rescaled version of the image (the rescale factor is some fraction s).
  2. Notice that v was found at a lower resolution, while the current layer's resolution is higher. We assume that v was indeed the best possible vector at that smaller scale, so we simply rescale v to approximate the best possible vector at the current layer, i.e. multiply v by the inverse scale factor, 1/s. Call this vector v_hat.
  3. Because the smaller scale cannot resolve displacements "between" its pixels, we must refine v_hat at the current scale. So, we find the best displacement vector v' within a window of +/- 1/s around v_hat.
  4. Return v'.
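The steps above can be sketched as follows, assuming a scale factor s = 1/2, NumPy arrays, and a bare-bones block-averaging rescale. The helper names, window sizes, and base-case threshold are illustrative choices, not the project's exact code:

```python
import numpy as np

def exhaustive(channel, reference, window):
    # Exhaustive SSD search over displacements in [-window, window]^2.
    best, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            score = ((np.roll(channel, (dy, dx), axis=(0, 1)) - reference) ** 2).sum()
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def downscale(img):
    # 2x2 block averaging: a simple stand-in for a proper
    # anti-aliased rescale with s = 1/2.
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4

def align_pyramid(channel, reference, min_size=32):
    # Base case: the image is small enough for a plain exhaustive search.
    if min(channel.shape) <= min_size:
        return exhaustive(channel, reference, 15)
    # Step 1: recursively align at half scale.
    v = align_pyramid(downscale(channel), downscale(reference), min_size)
    # Step 2: rescale the coarse estimate by 1/s = 2.
    v_hat = (2 * v[0], 2 * v[1])
    # Step 3: refine within +/- 1/s = +/- 2 of v_hat at the current scale.
    d = exhaustive(np.roll(channel, v_hat, axis=(0, 1)), reference, 2)
    # Step 4: return the refined vector v'.
    return (v_hat[0] + d[0], v_hat[1] + d[1])
```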

I used the Euclidean distance as the scoring function for the image pyramid approach as well.


Initial Results

Below is a side-by-side comparison of the original noisy images and the new aligned images. I've included the displacement vectors found by the above process for the R and G channels, with B as the reference channel. Note that this approach fails on Emir; the fix is described in the Refinement section.

Note that the borders of the images were cropped in both the pre-processing and post-processing stages. In pre-processing, the heavy noise at the borders of the input channels skews the optimization, so removing it leads to better results. In post-processing, aligning color channels by shifting leaves slivers of visual noise at the borders, which are cropped away.
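The pre-processing crop can be as simple as trimming a fixed fraction from each border. A minimal sketch; the 10% default is an assumed placeholder, not the fraction actually used:

```python
import numpy as np

def crop_border(img, frac=0.1):
    # Trim an assumed fraction (10% by default) from every border,
    # discarding the noisy edge regions before alignment.
    h, w = img.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]
```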

The image names listed are taken from the original data collection, linked above.

Low-resolution images

[image pairs: original vs. aligned]
Cathedral (aligned), R: (12, 3), G: (5, 2)
Monastery (aligned), R: (3, 2), G: (-3, 2)
Tobolsk (aligned), R: (6, 3), G: (3, 2)

High-resolution images

[image pairs: original vs. aligned]
Castle (aligned), R: (97, 4), G: (35, 2)
Harvesters (aligned), R: (123, 13), G: (59, 16)
Lady (aligned), R: (113, 11), G: (49, 8)
Onion Church (aligned), R: (108, 36), G: (51, 26)
Three Generations (aligned), R: (110, 10), G: (52, 13)
Workshop (aligned), R: (103, -12), G: (52, -1)
Emir (aligned), R: (132, -282), G: (49, 24)
Icon (aligned), R: (89, 22), G: (40, 17)
Melons (aligned), R: (178, 12), G: (81, 9)
Self Portrait (aligned), R: (174, 36), G: (77, 28)
Train (aligned), R: (86, 31), G: (42, 5)

A few unseen images

[image pairs: original vs. aligned]
Bonus 1 (aligned), R: (62, 9), G: (26, 9)
Bonus 2 (aligned), R: (63, 16), G: (25, 11)
Bonus 3 (aligned), R: (35, 38), G: (9, 19)

Refinement

Canny edge detection

The above approach produces very clear images with few if any visual artifacts on all of the images except Emir. This is because Emir's channels exhibit brightness inconsistency in addition to positional inconsistency.

A better choice of image features resolves this issue. Switching the representation of the input channels from raw RGB pixels to edges (as found by the Canny edge detection algorithm) leads to this much better alignment of Emir:

[images: initial alignment (see above) vs. refined with edge detection]

Furthermore, this new feature representation yields comparable results on the rest of the images as well.
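To sketch the feature swap: the project uses Canny edge maps, but the idea can be illustrated with a dependency-free gradient-magnitude edge map. The function below is an illustrative stand-in, not the actual Canny implementation; the point is that alignment scores are computed on edge_map(channel) instead of the raw pixels:

```python
import numpy as np

def edge_map(img):
    # Gradient-magnitude "edges" via finite differences: a simple,
    # brightness-insensitive stand-in for a Canny edge map. Feeding
    # these maps into the alignment search, instead of raw pixels,
    # sidesteps per-channel brightness differences.
    gy, gx = np.gradient(img)
    return np.hypot(gy, gx)
```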

Automatic contrasting

To improve image quality even further, I implemented automatic contrasting. My approach converts the RGB image to the HSV color space, linearly rescales the value channel so that the darkest pixel maps to 0 and the lightest to 1, and reconstructs the RGB image using this new value channel. This process yields images that look visually "sharper".
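This idea can be sketched without an explicit color-space conversion: a pixel's HSV value is the max of its R, G, B components, and scaling all three components by one per-pixel factor changes value while preserving hue and saturation. The function name and this shortcut are my own illustrative choices, not the project's exact code:

```python
import numpy as np

def auto_contrast(img):
    # In HSV, the "value" of a pixel is the max over its R, G, B components.
    v = img.max(axis=2)
    # Linearly stretch the value channel so the darkest pixel maps to 0
    # and the lightest to 1.
    v_new = (v - v.min()) / (v.max() - v.min())
    # Scaling R, G, B by the same per-pixel factor rescales value while
    # leaving hue and saturation untouched, which is equivalent to
    # editing the V channel after an RGB -> HSV conversion.
    scale = v_new / np.maximum(v, 1e-12)
    return img * scale[..., None]
```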

Below is a comparison, on a few of the images, of refinement using only edge detection (see above) vs. refinement with both edge detection and automatic contrasting. Full refinement leads to comparable- or better-quality images for all of the images.

[images: three side-by-side comparisons, refined with edge detection only vs. full refinement]