Colorizing the Prokudin-Gorskii photo collection

Bryan Li

CS180 Project 1

Project Overview

Sergei Mikhailovich Prokudin-Gorskii was a Russian photographer who pioneered color photography in the early 20th century. He took thousands of color pictures, each a series of three exposures of a scene onto a glass plate using red, green, and blue filters.

Each set of plates has been digitized as a single image containing the three plates stacked vertically: the blue plate on top, the green plate in the middle, and the red plate on the bottom.

The goal of this project is to take the digitized versions of these glass plates and employ image processing techniques to align the plates (via translation/displacement), producing a full-color photograph with as few artifacts as possible.

An image depicting a cathedral, with three plates stacked vertically.
An image depicting a monastery, with three plates stacked vertically.
An image depicting the city of Tobolsk, with three plates stacked vertically.
Examples of glass plate digitizations.

Approach

The images are first divided into thirds to isolate the blue, green, and red plates. Two plates can then be compared using a normalized cross-correlation (NCC) score, which is equivalent to normalizing each plate and then computing their dot product. This gives a decent measure of how closely two plates match each other.

To make the matching score more reliable, I use only the center 90% of each plate (removing 5% of the image size from each border). This prevents the borders present in the original scans from skewing the score. I then test displacements of up to +/-5% of the image size. For example, for a 350x400 image, I would test displacements from -17 to +16 (the maximum is excluded) in the X dimension and -20 to +19 in the Y dimension. Once the optimal displacements are found, the three plates are combined into a full-color image by setting the red, green, and blue channels from the corresponding plates. (In the galleries below, 'G: (X, Y)' is the X, Y offset used to align the green plate, and 'R: (X, Y)' the offset for the red plate.)
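As a rough sketch of this single-scale search (a minimal version assuming NumPy; the helper names and the use of np.roll for wrapping shifts are simplifications rather than the exact implementation):

    import numpy as np

    def crop_center(im, frac=0.90):
        """Keep only the central `frac` of a plate, dropping 5% from each border."""
        h, w = im.shape
        dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
        return im[dh:h - dh, dw:w - dw]

    def ncc(a, b):
        """Normalized cross-correlation: normalize each plate, then take the dot product."""
        a = a.ravel() / np.linalg.norm(a)
        b = b.ravel() / np.linalg.norm(b)
        return float(np.dot(a, b))

    def best_shift(plate, ref, center=(0, 0), radius=None):
        """Return the (dy, dx) shift of `plate` that maximizes NCC against `ref`,
        searching a window around `center`. With radius=None, the window is
        +/-5% of the image size (maximum excluded), as described above."""
        if radius is None:
            ry, rx = int(0.05 * plate.shape[0]), int(0.05 * plate.shape[1])
        else:
            ry = rx = radius
        ref_c = crop_center(ref)
        best_score, best = -np.inf, center
        for dy in range(center[0] - ry, center[0] + ry):
            for dx in range(center[1] - rx, center[1] + rx):
                shifted = np.roll(plate, (dy, dx), axis=(0, 1))
                score = ncc(crop_center(shifted), ref_c)
                if score > best_score:
                    best_score, best = score, (dy, dx)
        return best

Aligning a scan then amounts to calling best_shift(green, blue) and best_shift(red, blue), rolling each plate by its offset, and stacking the three channels into an RGB image.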

The primary issue with this approach is that it becomes much slower for larger images. To combat this, I use an image pyramid to drastically reduce the number of displacements that need to be checked at the higher resolutions. If the image is too large, each plate is repeatedly halved until its shortest side is roughly 100 pixels (the number of halvings is the log base 2 of the shortest side divided by 100). The optimal displacement is then computed for this smallest image as before. The image is next doubled in size, along with the previously calculated displacement, and only displacements close to that estimate need to be checked at the larger resolution, for example +/-3 pixels in the X and Y directions. This continues until the best displacement has been calculated for the original full-size image.
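A sketch of this coarse-to-fine refinement, reusing the best_shift helper from the sketch above (the choice of skimage.transform.rescale for downscaling and the +/-3 pixel refinement radius are assumptions, not necessarily what the final code uses):

    import numpy as np
    from skimage.transform import rescale  # any 2x downsampling routine would work here

    def align_pyramid(plate, ref, min_side=100, refine_radius=3):
        """Coarse-to-fine alignment: halve the plates until the shortest side is
        roughly `min_side` pixels, run the full +/-5% search there, then walk
        back up the pyramid, doubling the estimate and refining it locally."""
        levels = max(0, int(np.log2(min(plate.shape) / min_side)))
        # Full search at the coarsest level.
        dy, dx = best_shift(rescale(plate, 0.5 ** levels),
                            rescale(ref, 0.5 ** levels))
        # At each finer level, double the estimate and search only around it.
        for lvl in range(levels - 1, -1, -1):
            dy, dx = best_shift(rescale(plate, 0.5 ** lvl),
                                rescale(ref, 0.5 ** lvl),
                                center=(2 * dy, 2 * dx), radius=refine_radius)
        return dy, dx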

For an estimate of the speedup: exhaustively aligning a 3200x4000 image (2880x3600 after the border crop) would require checking 320 different X displacements ([-160, 159]) and 400 different Y displacements ([-200, 199]), i.e. 128,000 combinations per plate. With the pyramid, I instead compute the best displacement for a roughly 100x120 image, with about 120 displacement combinations, and then check only 9 displacement combinations at each larger level up to the original 3200x4000. This is several orders of magnitude faster.

The only image that proved difficult to align was the portrait of the Emir of Bukhara, shown below. This is most likely because the blue plate is used as the reference for alignment, and this scene contains a very large amount of blue but very little red (the plate that received the incorrect displacement). All other examples aligned well.

An image depicting the Emir of Bukhara, with three glass plates.
The colored image of the Emir of Bukhara, recreated by aligning the glass plates.
Left: Emir of Bukhara glass plates. On top is the blue plate, where the "bright" garment indicates a vivid blue. On the bottom is the red plate, where the garment appears "dark" (a lack of red).
Right: Reconstruction of colored image, with misalignment of red plate.

Results

Low-Quality Images (jpg)
Cathedral
Reconstructed image of a cathedral.
Displacement: G (5, 2); R (12, 3). Runtime: <1 sec
Monastery
Reconstructed image of a monastery.
Displacement: G (-3, 2); R (3, 2). Runtime: <1 sec
Tobolsk
Reconstructed image of Tobolsk.
Displacement: G (3, 3); R (6, 3). Runtime: <1 sec
High-Quality Images (tiff)
Church
Reconstructed image of a church.
Displacement: G (25,-4); R (58, -4). Runtime: 25.5 sec
Emir
Reconstructed image of the Emir.
Displacement: G (49, 24); R (305, -200). Runtime: 24.4 sec
Harvesters
Reconstructed image of harvesters.
Displacement: G (60, 17); R (124, 13). Runtime: 24.5 sec
Icon
Reconstructed image of an icon.
Displacement: G (41, 17); R (89, 23). Runtime: 25.1 sec
Lady
Reconstructed image of an unnamed lady.
Displacement: G (55, 8); R (117, 11). Runtime: 25.1 sec
Melons
Reconstructed image of a melon stand.
Displacement: G (82, 11); R (178, 13). Runtime: 26.2 sec
Onion Church
Reconstructed image of a church with an onion-shaped roof.
Displacement: G (51, 27); R (108, 36). Runtime: 26.2 sec
Sculpture
Reconstructed image of a sculpture.
Displacement: G (33, -11); R (140, -27). Runtime: 26.1 sec
Self-Portrait
Reconstructed image of Sergei's self-portrait.
Displacement: G (79, 29); R (176, 37). Runtime: 26.3 sec
Three Generations
Reconstructed image of a family with three generations.
Displacement: G (53, 14); R (112, 11). Runtime: 25.2 sec
Train
Reconstructed image of a train.
Displacement: G (43, 6); R (87, 32). Runtime: 25.9 sec
 
Self-Selections
Suna River
Reconstructed image of the Suna River.
Displacement: G (2, 0); R (10, -1). Runtime: 1.2 sec
Sunset
Reconstructed image of a sunset.
Displacement: G (5, 1); R (0, 2). Runtime: 1.2 sec
Cliffside
Reconstructed image of a cliffside.
Displacement: G (4, 0); R (15, -1). Runtime: 1.1 sec

Bells & Whistles

SSIM Metric

In order to successfully align the Emir image without hardcoding workarounds (such as using the green plate as the base for alignment), I switched the image-matching metric to SSIM, the structural similarity index. SSIM combines three comparison measurements between two images: luminance, contrast, and structure. It has been shown to outperform simpler metrics like MSE (and presumably NCC, which is what I had been using).
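As a minimal sketch of the metric swap, assuming scikit-image's SSIM implementation and plates scaled to [0, 1] (the exact SSIM parameters used may differ):

    from skimage.metrics import structural_similarity

    def ssim_score(a, b):
        """Structural similarity between two (cropped) plates with values in [0, 1].
        This is a drop-in replacement for the NCC score in the displacement
        search; everything else in the alignment pipeline stays the same."""
        return structural_similarity(a, b, data_range=1.0)

Because SSIM is computed from local windowed statistics rather than a single global dot product, each evaluation is considerably more expensive, which is consistent with the runtimes reported below.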

Using SSIM, the Emir image was successfully aligned (see below). The other images saw little to no change: displacements were identical for the smaller images and differed by at most one pixel for the larger ones (which are over 3000x9000 pixels). SSIM therefore outperforms NCC on images whose brightness differs greatly between plates; however, its runtime is much worse. The average runtime for the large images rose from 25.45 seconds to 136.19 seconds, and for the small images from 0.96 seconds to 5.43 seconds.

Overall, the SSIM metric was better at matching and aligning the images, succeeding on all of them, but it was roughly 5.4 times slower than NCC.

The misaligned Emir image, using NCC as the metric.
The Emir image, aligned using the NCC metric.
Displacement: G (49, 24); R (305, -200). Runtime: 24.4 sec
The correctly aligned Emir image, using SSIM as the metric.
The Emir image, aligned using the SSIM metric.
Displacement: G (50, 23); R (105, 40). Runtime: 132.1 sec

Contrast Stretching and Equalization

Several of the pictures have become washed out over time. To address this, I applied several types of contrast stretching, all operating on the value channel of each image (V in the HSV colorspace). The first is simple min-max stretching: the values are rescaled so that the minimum is 0 and the maximum is 1 (as opposed to, say, 0.1 to 0.8). This is done by subtracting the minimum value from every pixel and then dividing by the new maximum, so that the values span a range of exactly 1.

I also used contrast stretching with linear scaling, which takes the values within a certain range and stretches them to [0, 1], clipping anything outside the range to exactly 0 or 1. In this case, I stretched the 2nd-percentile and 98th-percentile values to 0 and 1, so the bottom 2% and top 2% of values in an image become 0 and 1 respectively.
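A sketch of both stretches applied to the V channel, assuming RGB images with values in [0, 1] and scikit-image's colorspace conversions (the helper and its defaults are my own):

    import numpy as np
    from skimage.color import rgb2hsv, hsv2rgb

    def stretch_value(rgb, low_pct=None, high_pct=None):
        """Stretch the V channel of an RGB image to span [0, 1]. With no
        percentiles given this is plain min-max stretching; with low_pct=2 and
        high_pct=98, the bottom and top 2% of values are clipped to 0 and 1."""
        hsv = rgb2hsv(rgb)
        v = hsv[..., 2]
        if low_pct is None:
            lo, hi = v.min(), v.max()
        else:
            lo, hi = np.percentile(v, [low_pct, high_pct])
        hsv[..., 2] = np.clip((v - lo) / (hi - lo), 0.0, 1.0)
        return hsv2rgb(hsv)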

Finally, I employed adaptive histogram equalization, specifically the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm. This algorithm equalizes using histograms computed over different regions of the image, so that details can be enhanced even in areas that are mostly very dark or very light. However, this can amplify noise in regions with very uniform values (e.g., a white wall might have its contrast boosted wherever there is noise or an imperfection). CLAHE therefore clips each local histogram at a specified limit to prevent that noise from being amplified.
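A sketch of CLAHE applied to the V channel, assuming scikit-image's equalize_adapthist (the clip limit shown is simply the library default, not a tuned value):

    from skimage.color import rgb2hsv, hsv2rgb
    from skimage.exposure import equalize_adapthist

    def clahe_value(rgb, clip_limit=0.01):
        """Apply CLAHE to the V channel only, leaving hue and saturation untouched.
        The clip limit caps each local histogram so that noise in flat regions
        is not over-amplified."""
        hsv = rgb2hsv(rgb)
        hsv[..., 2] = equalize_adapthist(hsv[..., 2], clip_limit=clip_limit)
        return hsv2rgb(hsv)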

Ultimately, min-max stretching made virtually no visible change to the majority of the images and is thus omitted. For contrast stretching and adaptive histogram equalization, a few selected examples below show where each method fails and where it succeeds.

Each method of contrast stretching/equalization worked well for different kinds of images and could likely be fine-tuned for each individual image. Applied generally, however, they did not work very well except in specific cases.

Church
In this case, the most obvious change is seen in the contrast stretched image, which is too dark.
The adaptive histogram equalization applied to the image looks slightly lighter but arguably not much better.
Church Original
Original church with histogram.
Church Contrast Stretched
Contrast stretched church (using linear scaling) with histogram.
Image looks darker; contrast is overdone. This can be seen in the very high frequency of values equal to 1.0.
Church Adaptive Hist Eq
Adaptive histogram equalization applied to church with histogram.
Image looks slightly more faded, and the bimodal distribution is less pronounced.
Onion Church
The contrast stretched image looks very similar, only very slightly more vivid.
The adaptive histogram equalization image has more defined edges and looks arguably better.
Onion Church Original
Original church with histogram.
Onion Church Contrast Stretched
Contrast stretched church with histogram.
Image looks slightly different, but not noticeably better.
Onion Church Adaptive Hist Eq
Adaptive histogram equalization applied to church with histogram.
The edges on the church in the image look more defined and detailed. Some colors are a tiny bit faded, but it arguably looks a bit better.
Self-Portrait
The contrast stretched image looks noticeably better, with more vivid greens. The river and shadows look a bit darker, perhaps too much so.
Adaptive histogram equalization gives the image too much contrast; the rocks in the river are jarring.
Self-Portrait Original
Original self-portrait with histogram.
Self-Portrait Contrast Stretched
Contrast stretched self-portrait with histogram.
Image looks more vibrant. However, certain parts are much darker, such as the shadows and the river. This can be seen in the large spike in the histogram.
Self-Portrait Adaptive Hist Eq
Adaptive histogram equalization applied to self-portrait with histogram.
Contrast is boosted too high; the rocks in particular are very jarring, likely due to the "local" histograms being used. This can also be seen in the low and high value sections of the histogram being boosted compared to the original.