Images of the Russian Empire:

Colorizing the Prokudin-Gorskii photo collection

Ron Wang

CS 180, Fall 2023, UC Berkeley

Overview

In this assignment, I take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. The three color channel images are extracted, aligned, and stacked on top of each other to form a single RGB color image.

The end result: visually stunning color images.

Approach

Alignment Metrics

Naturally, we need some kind of metric to measure how well two color channels align with each other. The two metrics I explored in this project are:

  • Sum of Squared Differences (SSD)
  • Normalized Cross-Correlation (NCC)

I also wrote helper functions for chores such as cropping the images (to remove uninteresting borders) and applying the Sobel operator (to test whether edge detection helps alignment).
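As a rough illustration of the two metrics listed above, here is a minimal NumPy sketch. The function names `ssd` and `ncc` are hypothetical, and the zero-mean form of NCC is an assumption; the write-up does not say which variant was used.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: lower means better aligned (0 is a perfect match)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Zero-mean normalized cross-correlation: higher means better aligned (max 1.0)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # a constant image carries no structure to correlate
    return float(np.dot(a, b) / denom)
```

Note that SSD is minimized while NCC is maximized, so a search loop has to flip the comparison (or negate NCC) when switching between them. NCC is also unaffected by a brightness offset or gain between channels, which SSD is not.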

    Exhaustive Search

    Initially, I searched exhaustively over a window of possible displacements, using a window size of 15. For each candidate displacement, I computed the alignment score (using one of the metrics above) and kept track of the best alignment vector found so far. This naive implementation worked well for the small JPEG images (under 200 KB) but was computationally infeasible for the large TIFF images (around 70 MB).
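The exhaustive search can be sketched roughly as below. This is an illustration rather than the project's actual code: the name `exhaustive_align` is hypothetical, a window size of 15 is assumed to mean shifts in [-15, 15] on each axis, SSD is used as the score, and `np.roll` (which wraps pixels around the edges) stands in for whatever shifting the original used.

```python
import numpy as np

def exhaustive_align(moving, reference, window=15):
    """Try every (dy, dx) shift in [-window, window]^2 and return the one with the lowest SSD."""
    moving = np.asarray(moving, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll wraps around the edges; cropping borders first limits the effect.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The cost is (2 * window + 1)^2 full-image comparisons, which is why this approach stops being practical on the large TIFF scans.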

    Constructing an Image Pyramid

    A more efficient way to align the color channels is through the use of an image pyramid. An image pyramid is a hierarchical way of representing an image at different scales, where the top levels are smaller images that have been blurred and subsampled from the original.

    "In a Gaussian pyramid, subsequent images are weighted down using a Gaussian average (Gaussian blur) and scaled down. Each pixel containing a local average corresponds to a neighborhood pixel on a lower level of the pyramid." --Wikipedia

    As required by the project, I implemented an image pyramid without using existing high-level implementations. Specifically, I used recursion to reach the top level of the pyramid, where I would perform an exhaustive search on the image (which is a much smaller version of the original) and pass the alignment estimate one level down the pyramid.

    This greatly improved the runtime. With a window size of 5, the naive approach takes more than 2 minutes (and often much longer), whereas the image pyramid finishes within 15 seconds.
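A recursive coarse-to-fine search along these lines might look like the sketch below. Several details are assumptions: the names `downsample`, `search`, and `pyramid_align` are hypothetical, a 2x2 box average stands in for the Gaussian blur, the recursion stops at a made-up `min_size`, and the scale factor between levels is taken to be 2.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (a box filter, not a true Gaussian)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def search(moving, reference, center, window):
    """Exhaustive SSD search over shifts within `window` pixels of `center`."""
    best_score, best_shift = np.inf, center
    for dy in range(center[0] - window, center[0] + window + 1):
        for dx in range(center[1] - window, center[1] + window + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def pyramid_align(moving, reference, window=5, min_size=64):
    """Recurse to the coarsest level, then refine the estimate on the way back down."""
    moving = np.asarray(moving, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if min(moving.shape) <= min_size:
        return search(moving, reference, (0, 0), window)
    coarse = pyramid_align(downsample(moving), downsample(reference), window, min_size)
    # A shift of k pixels at the coarser level corresponds to 2k pixels at this level.
    return search(moving, reference, (2 * coarse[0], 2 * coarse[1]), window)
```

Because each level only searches a small window around the doubled estimate from the level above, the total work grows with the number of levels rather than with the square of the largest displacement.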

    (Auto-)Cropping is All You Need!

    Aligning some color channels was initially difficult, even when I set the window size to a high value, such as 15 pixels. Through experimentation, I soon realized that the major culprit was the borders surrounding the images. Since the images were scanned, many borders contained blemishes and handwriting, which effectively introduced noise into the alignment score. My intuition was therefore that removing the borders more accurately would improve color channel alignment.

    To this end, I implemented a default cropping function, as well as an automated version that searches for the largest connected region using Otsu's method. I used the default cropping function to roughly remove the borders and then employed auto-cropping to fine-tune the result.
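The default (rough) cropping step could be as simple as trimming a fixed fraction from every side, as in the sketch below. The name `crop_border` and the 10% default are assumptions for illustration; the write-up does not state how much the default crop removes.

```python
import numpy as np

def crop_border(img, fraction=0.1):
    """Trim `fraction` of the height and width from each side of the image."""
    h, w = img.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return img[dy:h - dy, dx:w - dx]
```

Applying the same crop to all three channels before scoring keeps border blemishes and handwriting out of the alignment metric.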

    Handling Edge Cases

    Although the pipeline now worked efficiently and achieved good results on most images, it struggled with three inputs: emir, melons, and sculpture. I investigated these images more closely and found that (r, b) was aligned differently from (g, b). To solve this, I made one simple fix: aligning the other color channels to the green channel instead of the blue one. This way, we compute the alignment vectors for (b, g) and (r, g).

    This turned out to be successful. Not only did it solve the edge cases, but it also worked well on all the other images. My guess is that if the reference channel's intensity distribution differs too much from that of the channel being aligned, it is very hard for the alignment to succeed.
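A minimal sketch of the green-as-reference pipeline is shown below. It assumes the scan stacks the blue, green, and red plates from top to bottom (the usual layout for this collection), and the names `split_plate`, `ssd_align`, and `colorize_green_base` are hypothetical. The small SSD aligner is only a stand-in so the example runs on its own.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked scan into (b, g, r) thirds, top to bottom (assumed order)."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def ssd_align(moving, reference, window=5):
    """Stand-in aligner: exhaustive SSD search over small wrap-around shifts."""
    shifts = [(dy, dx) for dy in range(-window, window + 1)
              for dx in range(-window, window + 1)]
    return min(shifts, key=lambda s: np.sum(
        (np.roll(moving, s, axis=(0, 1)) - reference) ** 2))

def colorize_green_base(plate, align=ssd_align):
    """Align blue and red to green, then stack the channels in RGB order."""
    b, g, r = (c.astype(np.float64) for c in split_plate(plate))
    b_shift = align(b, g)  # offsets for (b, g)
    r_shift = align(r, g)  # offsets for (r, g)
    rgb = np.dstack([np.roll(r, r_shift, axis=(0, 1)), g,
                     np.roll(b, b_shift, axis=(0, 1))])
    return rgb, b_shift, r_shift
```

Any aligner with the same `(moving, reference) -> (dy, dx)` signature can be passed in as `align`, for example a pyramid-based one.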

    Results

    Example Images

    Note: Offsets are listed as (y_offset, x_offset).

    Cathedral.jpg

    cathedral.jpg

    Offsets for (g, b): [-5, -2]

    Offsets for (r, b): [7, 1]

    Church.tif

    church.jpg

    Offsets for (g, b): [-25, -4]

    Offsets for (r, b): [33, -8]

    Emir.tif

    emir.jpg

    Offsets for (g, b): [-49, -24]

    Offsets for (r, b): [57, 17]

    Harvesters.tif

    harvesters.jpg

    Offsets for (g, b): [-60, -16]

    Offsets for (r, b): [65, -3]

    Icon.tif

    icon.jpg

    Offsets for (g, b): [-40, -17]

    Offsets for (r, b): [48, 5]

    Lady.tif

    lady.tif

    Offsets for (g, b): [-53, -8]

    Offsets for (r, b): [63, 3]

    Melons.tif

    melons.jpg

    Offsets for (g, b): [-82, -9]

    Offsets for (r, b): [96, 3]

    Monastery.jpg

    monastery.jpg

    Offsets for (g, b): [3, -2]

    Offsets for (r, b): [6, 1]

    Onion_Church.tif

    onion_church.jpg

    Offsets for (g, b): [-51, -26]

    Offsets for (r, b): [57, 10]

    Sculpture.tif

    sculpture.jpg

    Offsets for (g, b): [-33, 11]

    Offsets for (r, b): [107, -16]

    Self_Portrait.tif

    self_portrait.jpg

    Offsets for (g, b): [-79, -29]

    Offsets for (r, b): [98, 8]

    Three_Generations.tif

    three_generations.jpg

    Offsets for (g, b): [-54, -11]

    Offsets for (r, b): [58, -1]

    Tobolsk.jpg

    tobolsk.jpg

    Offsets for (g, b): [-3, -3]

    Offsets for (r, b): [4, 1]

    Train.tif

    train.jpg

    Offsets for (g, b): [-43, -5]

    Offsets for (r, b): [43, 27]




    Chosen Examples from Prokudin-Gorskii Collection

    School.tif

    School in the village of Pidma named after His Imperial Majesty, Sovereign, Heir Apparent, Crown Prince, Grand Duke Aleksei Nikolaevich. [Russian Empire]

    school.jpg

    Offsets for (g, b): [-26, -9]

    Offsets for (r, b): [36, 0]

    Monument.tif

    City of Lodeinoe Pole. Monument to Emperor Peter the Great. [Russian Empire]

    monument.jpg

    Offsets for (g, b): [-22, -21]

    Offsets for (r, b): [36, 8]

    Study.tif

    Ostrechiny. Study. [Russian Empire]

    study.jpg

    Offsets for (g, b): [-13, 5]

    Offsets for (r, b): [120, -6]

    Group.tif

    Group of children. [Russian Empire]

    group.jpg

    Offsets for (g, b): [-66, -35]

    Offsets for (r, b): [77, 17]

    Sawmill.tif

    View of the sawmill. Kovzha. [Russian Empire]

    sawmill.jpg

    Offsets for (g, b): [-15, -22]

    Offsets for (r, b): [42, 15]

    Machine.tif

    Stone-excavating machine of the multi-scoop type "Svirskaia pervaia." [Russian Empire]

    machine.jpg

    Offsets for (g, b): [-33, -2]

    Offsets for (r, b): [34, -21]

    Bells & Whistles

    Automatic Cropping

    I implemented an automatic cropping function that takes in the three color channels and returns cropped versions of them. The cropping is based on the thresholding of the red channel using Otsu's method to segment the image into significant regions, and then using the bounding boxes of the two largest regions to determine the cropping boundaries.
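The sketch below illustrates the thresholding idea only. It computes an Otsu threshold with plain NumPy and then, as a deliberate simplification, takes the bounding box of rows and columns that are mostly foreground instead of analyzing the two largest connected regions as described above. The names `otsu_threshold` and `auto_crop_box`, the `min_fraction` parameter, and the assumption that the border is darker than the picture are all illustrative.

```python
import numpy as np

def otsu_threshold(img, bins=256):
    """Return the bin center that maximizes the between-class variance (Otsu's method)."""
    hist, edges = np.histogram(np.asarray(img, dtype=np.float64).ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist).astype(np.float64)   # pixel count at or below each bin
    w1 = w0[-1] - w0                          # pixel count above each bin
    s0 = np.cumsum(hist * centers)            # intensity sum at or below each bin
    m0 = s0 / np.maximum(w0, 1)               # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)    # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    return centers[np.argmax(between)]

def auto_crop_box(channel, min_fraction=0.5):
    """Bounding box (top, bottom, left, right) of rows/columns that are mostly foreground."""
    mask = channel > otsu_threshold(channel)
    rows = np.flatnonzero(mask.mean(axis=1) >= min_fraction)
    cols = np.flatnonzero(mask.mean(axis=0) >= min_fraction)
    if rows.size == 0 or cols.size == 0:
        return 0, channel.shape[0], 0, channel.shape[1]  # nothing found: keep everything
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1
```

The box computed on one channel (the red one, in the description above) would then be applied to all three channels, for example `channel[top:bottom, left:right]`, so that they stay the same size.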

    This mechanism is combined with the default cropping function for the best results. In the following example, notice how automatic cropping leads to a better alignment vector for (r, b).

    Before

    cathedral_no_autocrop.jpg

    Offsets for (g, b): [-5, -2]

    Offsets for (r, b): [0, 1]

    After

    cathedral.jpg

    Offsets for (g, b): [-5, -2]

    Offsets for (r, b): [7, 1]

    Before

    cathedral detail before

    After

    cathedral detail after

    Automatic Contrasting

    I also implemented automatic contrasting, which maps the darkest pixel to zero and the brightest pixel (on its brightest color channel) to the maximum intensity. This serves as a gentle way to rescale image intensities and improve image quality. Overall, the images appear more natural and pleasant.
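One way to write such a rescaling is the linear stretch sketched below. It assumes floating-point images where the maximum intensity is 1.0 (the write-up does not state the target value), uses a single global minimum and maximum across all three channels, and the name `auto_contrast` is hypothetical.

```python
import numpy as np

def auto_contrast(rgb):
    """Linearly stretch intensities so the global minimum maps to 0 and the maximum to 1."""
    rgb = np.asarray(rgb, dtype=np.float64)
    lo, hi = rgb.min(), rgb.max()
    if hi == lo:
        return np.zeros_like(rgb)  # flat image: nothing to stretch
    return (rgb - lo) / (hi - lo)
```

Using one global range for all channels, rather than stretching each channel separately, keeps the color balance unchanged.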

    Before

    church without contrast

    After

    church with contrast

    Before

    onion church without contrast

    After

    onion church with contrast