CS 194-26: Image Manipulation and Computational Photography

Project 1: Colorizing Images of the Russian Empire

William Tait

Fall 2017



Overview

In the early 1900s, Sergei Mikhailovich Prokudin-Gorskii traveled across Russia taking pictures of various lifestyle and landscape scenes. In one of the first attempts at color photography, he captured each scene three times on a glass plate: once through a red filter, once through a green filter, and once through a blue filter. These plates were later acquired by the Library of Congress, and were laboriously aligned and touched up by hand to re-create the original color scenes. This project attempts to speed up that process by automatically aligning the color channels using various computational methods.

Approach

Alignment

The naive “single-scale” approach compares the color channels pairwise and searches a square window of displacements (e.g. [-15, 15] pixels in each dimension) for the one that best aligns one channel with the other, as judged by a distance function. The Blue channel was held as the reference, against which the Green and Red channels were tested for displacement. The base approach uses the Sum of Squared Differences (SSD) metric to measure image similarity. If the two channels overlap perfectly, this sum is zero, so the objective becomes finding the displacement vector within the specified window that minimizes this “score” for the test image overlaid at that displacement on the reference image. Here is the equation for SSD for reference, where (u, v) is the displacement applied to the test image:

$$\mathrm{SSD}(u, v) = \sum_{x, y} \left( I_{\text{test}}(x + u,\, y + v) - I_{\text{ref}}(x, y) \right)^2$$
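The single-scale search described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used for this report: the function names `ssd` and `align_single_scale` are made up here, displacements are expressed as (row, column) shifts, and `np.roll` is used for shifting, which wraps pixels around the image edge.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))

def align_single_scale(channel, reference, window=15):
    """Return the (row, col) shift of `channel` that minimizes SSD
    against `reference`, searching [-window, window] in each dimension.

    Assumption: shifting is done with np.roll, so pixels wrap around
    the border instead of being discarded.
    """
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The cost grows with the square of the window size, since every candidate displacement requires a full pass over the image.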

Image Pyramid

Each image is stored as a 2D matrix of doubles that is shifted and summed for the SSD calculations. This is fast for small .jpg inputs (under 1 second), but many of the input images are large .tif files (over 70 MB), and the exhaustive window search on a single full-size image quickly becomes impractical. To address this issue, large images are aligned recursively using an image pyramid. At the top of the pyramid, the image is scaled down several times, usually by powers of 2, so that a rough alignment can be found quickly on the smaller, less detailed image. As the image is scaled back up level by level, the cumulative alignment is carried over and refined.

The main advantage of an image pyramid is that a much smaller alignment window can be used, provided the pyramid contains enough layers. The suggested window for the single-scale approach was [-15, 15] pixels, but choosing the right window is a balancing act: if it is too large, the computation time increases rapidly; if it is too small, the space of alignments searched may not include the optimal alignment. After some experimenting, I found that a window size of 7 pixels allowed my image pyramid to align all but one input image. The window size was one of two free parameters in my algorithm, the other being the depth of the pyramid (the alignments in this report were generated using a 4-layer pyramid, but to generalize the procedure, some type of mapping could be made from image size to number of layers).
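A recursive coarse-to-fine search along these lines might look like the sketch below. It is a hedged illustration rather than the implementation behind the results: `search`, `downsample`, and `align_pyramid` are hypothetical names, downsampling is done by averaging 2x2 blocks (one simple choice among many), and shifts wrap around via `np.roll`. The defaults of 4 levels and a 7-pixel window mirror the parameters mentioned above.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    return float(np.sum((a - b) ** 2))

def search(channel, reference, center, window):
    """Exhaustively test shifts within `window` pixels of `center`."""
    best_score, best_shift = np.inf, center
    for dy in range(center[0] - window, center[0] + window + 1):
        for dx in range(center[1] - window, center[1] + window + 1):
            score = ssd(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (assumed rescaling)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(channel, reference, levels=4, window=7):
    """Return the (row, col) shift found by a coarse-to-fine search."""
    if levels <= 1:
        # Top of the pyramid: search around zero displacement.
        return search(channel, reference, (0, 0), window)
    coarse = align_pyramid(downsample(channel), downsample(reference),
                           levels - 1, window)
    # One pixel at the coarser level is two pixels here, so double the
    # estimate and refine it within the small window.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(channel, reference, center, window)
```

With `levels` layers, the largest recoverable displacement is roughly `window * (2**levels - 1)` pixels, which is why a small window is sufficient for the large .tif inputs.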

Results

cathedral.jpg:
G(5, 2) R(12, 3)
monastery.jpg:
G(-3, 2) R(3, 2)
nativity.jpg:
G(3, 1) R(7, 0)
settlers.jpg:
G(7, 0) R(15, -1)
emir.tif:
G(49, 24) R(0, -171)
harvesters.tif:
G(60, 17) R(124, 14)
icon.tif:
G(41, 17) R(89, 23)
lady.tif:
G(53, 9) R(114, 12)
self_portrait.tif:
G(53, 9) R(114, 12)
three_generations.tif:
G(52, 14) R(112, 12)
train.tif:
G(42, 6) R(87, 32)
turkmen.tif:
G(56, 21) R(116, 28)
village.tif:
G(64, 12) R(138, 22)

Additional Images

In addition to the images required for the project, here are some other images from the Prokudin-Gorskii collection that I chose to showcase my algorithm.

woodshop.tif:
G(47, 21) R(107, 33)
flowers.tif:
G(16, -3) R(118, -14)
balcony.tif:
G(38, 21) R(76, 35)
courtyard.tif:
G(2, 10) R(40, 16)
rooftop.tif:
G(48, 10) R(103, -6)
shoreline.tif:
G(10, 15) R(131, 28)

Problems and Failures

Because of the way the color channels were captured on glass plates, the greyscale images have uneven borders that distort the SSD calculations. A quick fix for this issue was to crop the outer pixels and use only the inner 2/3 of the image for distance calculations (see below for auto cropping, if I get to it). In fact, for a while I was not cropping enough and my alignment results suffered, so I began experimenting with different alignment metrics before simply cropping more pixels to ensure the borders were not being considered in the alignment process.
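The cropping fix can be sketched as follows. The helper names `crop_interior` and `ssd_interior` are hypothetical, and interpreting "the inner 2/3" as the central two thirds along each dimension is an assumption made for this example.

```python
import numpy as np

def crop_interior(img, keep=2 / 3):
    """Return the central `keep` fraction of `img` along each dimension."""
    h, w = img.shape[:2]
    dy = int(round(h * (1 - keep) / 2))  # rows trimmed from top and bottom
    dx = int(round(w * (1 - keep) / 2))  # columns trimmed from left and right
    return img[dy:h - dy, dx:w - dx]

def ssd_interior(a, b, keep=2 / 3):
    """SSD computed only over the interior, ignoring the plate borders."""
    diff = crop_interior(a, keep) - crop_interior(b, keep)
    return float(np.sum(diff ** 2))
```

Only the scoring uses the cropped view; the displacement that is found would still be applied to the full channel.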

Bells and Whistles

Problem: Brightness Differences

The only image in the test set that did not align properly was emir.tif. Looking at the individual color channels, the man’s robe is clearly blue: it is all white in the blue channel and all black in the red channel. This discrepancy between the channels makes alignment with SSD on raw intensities infeasible, because even after cropping the border defects there are still major dissimilarities in the center of the picture, where the channels actually should align.

emir.tif blue channel
emir.tif red channel

Solution: Align with Image Gradient

The image gradient quantifies the change in intensity across the image. This representation shows fewer dissimilarities between color channels, which makes it easier to compute alignments with SSD. Interestingly, this issue was also fixed by choosing the Green channel as the "reference" image in the alignment process rather than the Blue. For this image that makes complete sense: the robe is all white in the blue channel and all black in the red channel, but the green channel serves as a middle ground between the two.
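To illustrate why gradients help, the sketch below compares a region that is bright in one channel and dark in another. It assumes the gradient magnitude (computed here with `np.gradient`) is the feature being compared; the report does not specify which gradient operator was used, and `gradient_magnitude` is a made-up name.

```python
import numpy as np

def gradient_magnitude(img):
    """Per-pixel gradient magnitude using central differences."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)

# Toy example: the same edge seen with opposite brightness, similar to
# the robe being white in one channel and black in the other.
blue = np.zeros((8, 8))
blue[:, 4:] = 1.0
red = 1.0 - blue

raw_ssd = float(np.sum((blue - red) ** 2))
grad_ssd = float(np.sum((gradient_magnitude(blue) - gradient_magnitude(red)) ** 2))
```

Here the raw intensities disagree at every pixel, while the gradient magnitudes agree, because the edge sits in the same place in both channels regardless of which side is brighter.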