Sergei Mikhailovich Prokudin-Gorskii was a Russian photographer who undertook a photographic survey of the early 20th-century Russian Empire for Tsar Nicholas II between 1905 and 1915. He is also known for his pioneering work in color photography: he captured color by taking three pictures of each scene, each through a different red, green, or blue filter. His unique images of Russia on the eve of revolution were purchased by the Library of Congress in 1948 from his heirs. Walter Frankhauser, a photographer contracted by the Library of Congress, manually registered and cleaned up about 120 of the original high-resolution scans, with breathtakingly beautiful results. The results of his effort can be seen at the online exhibit The Empire That Was Russia.

We can observe the three different lenses and filters for each RGB color.

Prokudin-Gorsky's camera (left) and projector (right)

The channels are arranged as BGR

Glass plate photograph
Problem Description & Goals

In this project, we are given a small set of low- and high-resolution images from the Prokudin-Gorskii collection. Each image contains three exposures arranged in a vertical stack. As shown in the picture above, the top image represents the Blue channel, the middle the Green channel, and the bottom the Red channel. Our goal is to take each of these glass-plate images as input and produce a colorized image as output. To achieve this, an alignment algorithm must be implemented to rearrange the channels and produce sharp images while reducing the color-fringing effect.



The given digitized photographs contain the three channels arranged from top to bottom, so our first task is to separate them. We assume that each channel has the same height, so we split the image vertically into three parts. Once the channels are separated, we align them using either of the two approaches explored for this project. The diagram below represents our goal more clearly.
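The splitting step described above can be sketched as a small helper (the function name `split_channels` is ours, not part of the original code):

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked glass-plate scan into B, G, R channels.

    The scan stacks the exposures top to bottom: Blue, Green, Red.
    Assumes the three exposures have (roughly) equal height.
    """
    height = plate.shape[0] // 3
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r
```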

Our goal is to produce a full color image.

Channel splitting and alignment for glass plate

Exhaustive search

The naive approach to this problem is to align the channels by brute force. We choose one channel as the base, say the Blue channel, and then align the Red and Green channels to it separately. For both the Red and Green channels, we do the following:

  1. Select a vertical and horizontal displacement window. For this project, the optimal window was found to be [-15, 15] horizontally, and [-20, 20] vertically.
  2. Compute the L2 norm, also known as the Sum of Squared Differences (SSD) distance, which can be represented as:

     SSD(A, B) = Σᵢⱼ (A(i, j) − B(i, j))²

     where A represents the displaced channel and B represents the base (fixed) channel.
  3. For every possible horizontal and vertical displacement combination, do the following:
    1. Compute the L2 norm between the base channel and the channel shifted by the current displacement.
    2. Compare the new L2 norm with the best one found so far.
    3. If the new L2 norm is smaller, store the current vertical and horizontal shifts together with that L2 norm.
  4. After all possible displacements are analyzed, return the vertical and horizontal displacement values that produced the smallest L2 norm (SSD).
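The steps above can be sketched as follows. The window defaults mirror the [-20, 20] vertical and [-15, 15] horizontal ranges found for this project; using `np.roll` (wrap-around shifting) is one simple way to realize the displacement, though the original implementation may differ:

```python
import numpy as np

def align_exhaustive(channel, base, v_window=(-20, 20), h_window=(-15, 15)):
    """Brute-force search for the (dy, dx) displacement that minimizes
    the SSD between the shifted channel and the fixed base channel."""
    best_shift, best_ssd = (0, 0), np.inf
    for dy in range(v_window[0], v_window[1] + 1):
        for dx in range(h_window[0], h_window[1] + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            ssd = np.sum((shifted.astype(float) - base) ** 2)
            if ssd < best_ssd:
                best_ssd, best_shift = ssd, (dy, dx)
    return best_shift
```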

Once we have computed the optimal shift for each channel, we apply it and combine the three channels using NumPy's dstack function to generate a single multidimensional array representing a color image. This approach was quite effective for small, low-resolution input images. However, when applied to higher-resolution images, the runtime was unacceptably slow. For larger images, a pyramid multi-scale signal representation with rescaling was used.
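A minimal sketch of the recombination step, assuming the optimal shifts were already computed and `np.roll` applies them (the helper name `combine_channels` is ours):

```python
import numpy as np

def combine_channels(b, g, r, g_shift, r_shift):
    """Apply each channel's optimal (dy, dx) shift and stack into one
    H x W x 3 RGB array; the plate order B, G, R is reversed for display."""
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```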

Pyramid multi-scale signal representation & Algorithm

For the high-resolution photographs, exhaustive search becomes unacceptably expensive, as the required displacement window must be large enough to account for the size of the picture. For these images, a multi-scale signal representation is needed. A pyramid representation is a simple yet powerful tool that represents an image at multiple scales, usually separated by a factor of two; processing is done top-down, starting from the smallest image and moving down the pyramid. To create a pyramid, we first define the pyramid's maximum height (number of scaling levels) and then iteratively scale down the image until reaching that level.
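Pyramid construction can be sketched as follows. For brevity this sketch uses naive 2× subsampling; in practice an anti-aliased rescale (e.g. `skimage.transform.rescale`) is preferable:

```python
import numpy as np

def build_pyramid(image, levels):
    """Build an image pyramid: index 0 holds the original image and the
    last entry the coarsest, with each level half the size of the last."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(pyramid[-1][::2, ::2])  # halve each dimension
    return pyramid
```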

The coarsest image is located at the top of the pyramid, while the original image is located at the bottom.

Visual representation of the Pyramid algorithm (Wikipedia)

Image pyramids allow for a much more efficient search by constraining the iterative refinement of the displacement estimate. The initial estimate comes from an exhaustive search over the lowest-resolution channels found at the top of the pyramid. Then, assuming that estimate is roughly accurate, at each step down to a higher-resolution level we rescale the displacement from the level above and re-align the channels within a small window to refine it. Once the final displacement has been computed and applied, we combine the aligned channels to produce the color image.

Enhancement & optimization

The pyramid algorithm worked well, but it took more time than expected and some images still showed color fringing. To improve both processing speed and alignment quality, the borders of every channel were cropped by 10% to 35%, where the smallest crop was applied at the top of the pyramid and the largest at its base. This reduced the size of each matrix representation, improving both the performance and the quality of the final result.
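The cropping step can be sketched as a small helper; the function name and the default fraction are ours, chosen within the 10–35% range described above:

```python
import numpy as np

def crop_borders(channel, fraction=0.1):
    """Trim `fraction` of each border so the plate's damaged edges (and
    np.roll's wrap-around pixels) don't dominate the SSD metric."""
    h, w = channel.shape
    dy, dx = int(h * fraction), int(w * fraction)
    return channel[dy:h - dy, dx:w - dx]
```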

Bells & Whistles: Edge Detection for alignment

Some images, such as the Emir portrait shown below, could not be successfully aligned since each RGB channel has very different brightness values. Instead of trying to align the channels based on these raw intensities, we apply the Canny edge detector, an edge-detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. The Canny edge detection algorithm is composed of 5 steps:

  1. Noise reduction.
  2. Gradient calculation.
  3. Non-maximum suppression.
  4. Double threshold.
  5. Edge Tracking by Hysteresis.

In this project, however, we do not derive and implement the algorithm from scratch. Instead, we import it from the scikit-image library. The Canny function has three adjustable parameters: the width of the Gaussian and the low and high thresholds for hysteresis thresholding. To extract the edges, we call the function with the default parameters and save the resulting binary image, in which edges are represented as white lines. Finally, we use these intermediate edge images as input for the pyramid algorithm, which ultimately produces the desired aligned image. Below we show two examples of the intermediate images with different settings.
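A minimal sketch of the edge-map step using scikit-image's `canny`; the wrapper name `edge_map` is ours, and the non-default parameter values shown mirror the second example below:

```python
from skimage import feature

def edge_map(channel, sigma=5, low_threshold=0, high_threshold=0.1):
    """Return a boolean edge image for a single channel; True pixels mark
    detected edges. These edge maps then feed the pyramid alignment."""
    return feature.canny(channel, sigma=sigma,
                         low_threshold=low_threshold,
                         high_threshold=high_threshold)
```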

Default settings detect a lot of fine lines.

Emir red channel with default Canny edge detection settings.

This set of parameters only detects high-contrast edges, thus reducing the number of edges.

Emir red channel with the following edge detection settings: sigma=5, low_threshold=0, high_threshold=0.1


The results for all the algorithms explained above are presented in the following section.

Exhaustive search results (Low-Resolution images)

Alignment time: 0.7894 seconds.

G layer shift: [5, 2]
R layer shift: [12, 3]

Alignment time: 0.8236 seconds.

G layer shift: [-3, 2]
R layer shift: [3, 2]

Alignment time: 0.8048 seconds.

G layer shift: [3, 3]
R layer shift: [6, 3]
Pyramid multi-scale signal Algorithm (High-Resolution images)

Alignment time: 48.3807 seconds.

G layer shift: [34, 2]
R layer shift: [98, 5]

Alignment time: 46.9097 seconds.

G layer shift: [48, 24]
R layer shift: [59, 11]

Alignment time: 46.4746 seconds.

G layer shift: [59, 17]
R layer shift: [123, 14]

Alignment time: 45.2032 seconds.

G layer shift: [41, 18]
R layer shift: [90, 23]

Alignment time: 43.5857 seconds.

G layer shift: [55, 8]
R layer shift: [112, 12]

Alignment time: 49.6610 seconds.

G layer shift: [82, 9]
R layer shift: [178, 13]

Alignment time: 45.9825 seconds.

Onion church
G layer shift: [50, 26]
R layer shift: [108, 37]

Alignment time: 49.3068 seconds.

Prokudin-Gorsky self portrait
G layer shift: [77, 29]
R layer shift: [175, 37]

Alignment time: 45.2813 seconds.

Three generations
G layer shift: [51, 14]
R layer shift: [110, 12]

Alignment time: 47.7680 seconds.

G layer shift: [43, 7]
R layer shift: [86, 33]

Alignment time: 47.1253 seconds.

G layer shift: [54, 0]
R layer shift: [106, -12]
Bells & Whistles: Edge Detection for alignment

Alignment time: 46.9097 seconds.

Emir (Pyramid)
G layer shift: [48, 24]
R layer shift: [59, 11]

Alignment time: 44.4718 seconds.

Emir (Pyramid + Canny edge detection)
G layer shift: [49, 24]
R layer shift: [106, 41]
Additional images using the Pyramid Algorithm with Edge Detection

Alignment time: 46.8696 seconds.

G layer shift: [12, -6]
R layer shift: [68, 7]

Alignment time: 46.8249 seconds.

Castle walls
G layer shift: [44, 2]
R layer shift: [107, 0]

Alignment time: 48.0639 seconds.

Big cat
G layer shift: [58, 21]
R layer shift: [129, 27]

Alignment time: 48.8679 seconds.

G layer shift: [22, -15]
R layer shift: [99, -31]

Alignment time: 49.2532 seconds.

G layer shift: [66, 15]
R layer shift: [137, 18]

Alignment time: 46.8966 seconds.

G layer shift: [59, -8]
R layer shift: [132, -27]

Alignment time: 49.7626 seconds.

G layer shift: [34, 9]
R layer shift: [117, 18]

Alignment time: 44.7922 seconds.

G layer shift: [34, 23]
R layer shift: [76, 29]

Alignment time: 46.8239 seconds.

G layer shift: [68, 2]
R layer shift: [145, -5]


Final thoughts:

The method of color photography used by Prokudin-Gorsky was very clever, but good results were not possible with the photographic materials available at the time. Even as late as the 1980s, making photographic color prints from Prokudin-Gorsky's negatives was a specialized and labor-intensive process. Only with the advent of digital image processing could multiple images be quickly and easily combined into one, as we demonstrated in this project.

Alignment time: 49.3930 seconds.

Old Emir
G layer shift: [69, 28]
R layer shift: [148, 38]