Colorizing the Prokudin-Gorskii Photo Collection

By Jay Shenoy

Overview

The purpose of this project was to take three monochrome exposures of the same scene, each photographed through a different color filter (red, green, or blue), and align them to produce a single color image. These exposures were taken by the Russian photographer Sergei Prokudin-Gorskii, who foresaw that color photography would become commonplace and captured many photos that bring early 1900s Russia to life.

Approach

Alignment works by registering the red and green channels against the blue channel, which means finding the offset that best lines up each of these two channels with blue. For low-resolution images, this is done with an exhaustive search over a 30 pixel by 30 pixel range of candidate offsets (15 pixels in either direction along each axis). Specifically, we take a central patch of the red channel (the middle square of a 3x3 grid laid over the image) and compute its normalized cross-correlation (NCC) against every same-sized patch within a window of the blue channel: the same middle square, extended by 15 pixels in either direction in both the x and y dimensions.

NCC takes two patches of the same size and first normalizes each one: it subtracts the patch's mean pixel intensity from every pixel and then divides by the standard deviation of the patch's pixel intensities. Next, it computes the dot product of the two normalized patches, treating each one as a long vector. Patches with similar content produce higher dot products, while lower dot products indicate poorer matches. After correlating the red patch against every candidate offset in the blue window, we take the offset with the highest correlation as the offset for the entire red channel. The same procedure is repeated for the green channel.
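The single-scale procedure described above can be sketched in Python with NumPy. This is a minimal sketch rather than the code used for the results below: the function names `ncc` and `align_exhaustive`, the small epsilon that guards against flat patches, and the sign convention (the returned `(dx, dy)` is the shift that, applied as `np.roll(channel, (dy, dx), axis=(0, 1))`, lines the channel up with the reference) are all assumptions.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    # Subtract each patch's mean and divide by its standard deviation.
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    # Dot product of the two patches treated as long vectors.
    return float(np.dot(a.ravel(), b.ravel()))


def align_exhaustive(channel, ref, radius=15):
    """Return the (dx, dy) shift of `channel` that best matches `ref`.

    The middle cell of a 3x3 grid over `channel` is compared with every
    same-sized patch of `ref` within `radius` pixels in x and y.
    Assumes `channel` and `ref` have the same shape.
    """
    h, w = channel.shape
    top, bottom = h // 3, 2 * h // 3
    left, right = w // 3, 2 * w // 3
    patch = channel[top:bottom, left:right]
    best_score, best = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip offsets whose window would leave the reference image.
            if top + dy < 0 or left + dx < 0 or bottom + dy > h or right + dx > w:
                continue
            window = ref[top + dy:bottom + dy, left + dx:right + dx]
            score = ncc(patch, window)
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best
```

The dot product is not divided by the number of pixels, so scores range from roughly -N to N for N-pixel patches; dividing by N would not change which offset wins.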

For high-resolution images, the offsets between channels are far larger than 15 pixels, so a 30px by 30px search window is no longer large enough. Simply enlarging the window makes the exhaustive search prohibitively time-consuming, since the number of candidate offsets grows quadratically with the window width and each candidate requires correlating a large patch. To remedy this, I implemented a multiscale pyramid alignment algorithm that aligns the channels iteratively at different resolutions. First, the algorithm downscales the channels until they are smaller than 400px by 400px, and then aligns them using NCC with the same 30px by 30px offset window as before. Next, it moves to the channels at twice the resolution in each dimension, doubles the offsets found at the previous (coarser) level to account for the change in scale, and uses them as the starting point for another 30px by 30px search at the new (finer) level. This repeats until the algorithm reaches the original resolution. Because each level refines the estimate from the level below it, the range of offsets covered grows with every level, so the algorithm finds the large offsets that high-resolution images require while only ever searching a small window.
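A coarse-to-fine version of the search might look like the sketch below. It assumes the channels are 2D NumPy arrays of equal shape, builds each coarser level by averaging 2x2 blocks (a stand-in for whatever rescaling routine the original used), and recurses until the image is smaller than `max_size` in both dimensions. The names `search`, `downsample`, and `align_pyramid` are hypothetical, and `ncc` is repeated so the block runs on its own.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.dot(a.ravel(), b.ravel()))


def search(channel, ref, center, radius):
    """Best (dx, dy) within `radius` of `center`, scored by NCC on the middle cell."""
    h, w = channel.shape
    top, bottom, left, right = h // 3, 2 * h // 3, w // 3, 2 * w // 3
    patch = channel[top:bottom, left:right]
    cx, cy = center
    best_score, best = -np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            if top + dy < 0 or left + dx < 0 or bottom + dy > h or right + dx > w:
                continue  # window would fall outside the reference
            score = ncc(patch, ref[top + dy:bottom + dy, left + dx:right + dx])
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (an odd last row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def align_pyramid(channel, ref, radius=15, max_size=400):
    """Coarse-to-fine alignment; returns (dx, dy) at the input resolution."""
    if max(channel.shape) < max_size:
        # Coarsest level: plain search around a zero offset.
        return search(channel, ref, (0, 0), radius)
    cx, cy = align_pyramid(downsample(channel), downsample(ref), radius, max_size)
    # The coarse offset doubles at twice the resolution; refine it locally.
    return search(channel, ref, (2 * cx, 2 * cy), radius)
```

Instead of upscaling a coarse image, this sketch downsamples the originals at every level, so only the offset estimate is carried from one level to the next.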

Results

Offsets [x, y], in pixels, of the red and green channels relative to the blue channel are provided below each image.
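Once the offsets are known, producing the color image amounts to shifting the red and green channels and stacking them with blue. The sketch below assumes that x is a column shift and y a row shift, applied to each channel to line it up with blue; the report does not state its sign convention, so the signs may need flipping. `apply_offset` and `colorize` are hypothetical helper names.

```python
import numpy as np


def apply_offset(channel, offset):
    """Shift a channel by an [x, y] offset (x moves columns, y moves rows)."""
    dx, dy = offset
    return np.roll(channel, (dy, dx), axis=(0, 1))


def colorize(r, g, b, r_offset, g_offset):
    """Shift red and green into alignment with blue and stack as an RGB image."""
    return np.dstack([apply_offset(r, r_offset), apply_offset(g, g_offset), b])
```

`np.roll` wraps pixels around the border, so the edges of the result contain wrapped content and would typically be cropped.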

Provided High-Resolution Images

Offsets - R: [5, 98], G: [2, 34]
Offsets - R: [33, 86], G: [7, 43]
Offsets - R: [29, -287], G: [24, 49]
Offsets - R: [14, 123], G: [18, 60]
Offsets - R: [23, 90], G: [18, 41]
Offsets - R: [14, 114], G: [8, 53]
Offsets - R: [13, 178], G: [10, 82]
Offsets - R: [-12, 105], G: [-1, 53]
Offsets - R: [37, 108], G: [27, 51]
Offsets - R: [37, 175], G: [29, 78]
Offsets - R: [10, 110], G: [13, 52]

Provided Low-Resolution Images

Offsets - R: [3, 12], G: [2, 5]
Offsets - R: [2, 3], G: [2, -3]
Offsets - R: [3, 7], G: [3, 3]

Extra Images

Offsets - R: [-5, 105], G: [4, 51]
Offsets - R: [-7, 151], G: [-1, 39]
Offsets - R: [12, -34], G: [10, -23]
Offsets - R: [36, 76], G: [22, 38]

Evaluation

The multiscale pyramid alignment algorithm performs well on nearly all of the images above, producing sharply aligned color photos that look remarkably close to those taken with modern cameras. It also runs reasonably fast, aligning each image in under a minute.

One notable failure case is emir.tif (second row, first column above). Here, the channels remain grossly misaligned because the same regions of the original glass plate have very different brightness values in each channel. Since normalized cross-correlation compares pixel intensities directly (after normalizing each patch), it fails when those intensities differ drastically even at the "correct" offset. This can be fixed by aligning the channels with a smarter metric, such as gradient or edge similarity, instead of raw intensities.
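One way to try the gradient-based metric suggested above is to replace each channel with its gradient magnitude before running the same NCC search. This is an untested sketch, not something used for the results above, and the helper name `gradient_magnitude` is hypothetical. Because the magnitude ignores the sign of an intensity change, an edge that goes bright-to-dark in one channel and dark-to-bright in another still produces the same response.

```python
import numpy as np


def gradient_magnitude(channel):
    """Edge-strength map of a channel, to align on instead of raw intensities."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    gy, gx = np.gradient(channel.astype(float))
    return np.hypot(gx, gy)
```

The resulting maps for the red (or green) and blue channels would then be passed to the existing NCC alignment in place of the channels themselves.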