CS 194-26 proj1: Images of the Russian Empire

Ellen Hong

Overview

Sergei Mikhailovich Prokudin-Gorskii was a Russian photographer and a pioneer of color photography. Starting in 1907, he set out to document the Russian Empire, photographing everything he saw. For each scene, he captured three glass-plate images, one each through a red, a green, and a blue filter.

Unfortunately, Prokudin-Gorskii never got to see his images in color, but his glass plate negatives have survived. In this assignment, we attempt to reproduce the color photographs he intended to create by aligning the three color channel images on top of one another so that together they form a color photo.
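Once the displacements are known, forming a color image comes down to shifting two channels and stacking all three. Below is a minimal sketch, assuming that each scanned plate stacks the B, G, and R exposures vertically (top to bottom) in equal thirds. `split_plate` and `compose` are hypothetical helper names, and shifts are applied with `np.roll`:

```python
import numpy as np


def split_plate(plate):
    """Cut a scanned plate into three equal-height channel images.

    Assumption: the exposures are stacked top to bottom in B, G, R order.
    """
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]


def compose(b, g, r, g_shift, r_shift):
    """Shift G and R onto the fixed B channel and stack them into one RGB image."""
    g = np.roll(g, g_shift, axis=(0, 1))
    r = np.roll(r, r_shift, axis=(0, 1))
    return np.dstack([r, g, b])  # last axis ordered R, G, B
```

The result has shape `(height, width, 3)`, the layout most Python image libraries expect for RGB data.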

Single-Scale Alignment

Perhaps the most essential step in producing these color images is aligning the R, G, and B channel negatives on top of one another. I started with the naive solution: an exhaustive search over every displacement vector in the range [-15, 15] pixels. In my implementation, I kept the blue (B) channel fixed and aligned the G and R channels to it. I used two image-matching metrics to quantify how well the channels were aligned:

Sum of Squared Differences (SSD)

Here I chose the displacement vector with the smallest SSD. The SSD is the sum, over all pixels, of the squared difference between corresponding pixel values in the two channels being compared (for example, the fixed blue channel and the shifted red channel).
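As a minimal sketch, assuming each channel is a 2-D NumPy array of the same shape (the name `ssd` is my own), the SSD score can be computed as:

```python
import numpy as np


def ssd(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences between two equally sized channels (lower is better)."""
    # Convert to float first so unsigned integer images cannot wrap around
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))
```

For example, comparing `[[1, 2], [3, 4]]` with `[[1, 0], [3, 1]]` gives 0 + 4 + 0 + 9 = 13.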

Normalized cross-correlation (NCC)

With NCC, I instead chose the displacement vector with the largest score. NCC is the dot product of two normalized vectors: each channel's pixel values are flattened into a vector and scaled to unit length, so the score does not depend on either channel's overall intensity scale.
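The exhaustive search with either metric can be sketched as follows. This illustrates the idea under stated assumptions rather than reproducing the exact code behind the results. The helper names are hypothetical, shifts are applied with `np.roll` (which wraps pixels around the image border), and SSD is negated so that a larger score is better for both metrics. Following the definition above, `ncc` does not subtract the mean first.

```python
import numpy as np


def ncc(a, b):
    """Dot product of the two channels, each flattened and scaled to unit length."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))


def neg_ssd(a, b):
    """Negated SSD, so that for both metrics a larger score means a better match."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return -float(np.sum(diff ** 2))


def align_exhaustive(channel, reference, metric=ncc, radius=15):
    """Try every (dy, dx) in [-radius, radius] x [-radius, radius]; return the best.

    The returned shift is the one that, applied to `channel` with np.roll,
    makes it best match the fixed `reference` (the blue channel here).
    """
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around the border: a simplifying assumption
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = metric(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

For example, `align_exhaustive(green, blue)` searches with NCC, while `align_exhaustive(green, blue, metric=neg_ssd)` uses SSD. Both return a `(dy, dx)` tuple and test 31 x 31 = 961 candidate shifts.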

Results

Both metrics produced exactly the same images and displacement values for every smaller (.jpg) image. The results are:

cathedral
G: (5, 2), R: (12, 3)
monastery
G: (-3, 2), R: (3, 2)
nativity
G: (3, 1), R: (7, 0)
settlers
G: (7, 0), R: (15, -1)

Multi-Scale Alignment

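Single-scale alignment works well on small images (the images above are under 400 px wide), but exhaustive search becomes very slow on the large .tif scans (over 3000 px wide). Every candidate displacement compares millions of pixels, and the true displacements at this resolution are far larger than 15 px (up to 176 px in the results below), so the search window would have to grow as well. To speed this up, we use an image pyramid.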

Image Pyramid

Rather than exhaustively searching every displacement on the full-size image, an image pyramid reduces the amount of computation. We first find the best alignment on a heavily downscaled copy of the image (around 256 px), then use that displacement, scaled up to the next resolution, as the starting point for alignment on the next larger copy, and so on until we reach the full-size image. This way, only the smallest scale needs the full [-15, 15] search; each larger scale only searches a small window around the estimate from the level below. I set the displacement limit at each level to (2 + current level), where level 0 is the full-size image, level 1 is half the full size, and so on.
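A recursive sketch of this coarse-to-fine scheme is below. It assumes float-valued 2-D NumPy channels of equal shape, a coarsest size of 256 px, downscaling by averaging 2x2 blocks, and NCC as the metric. The function names and the block-averaging choice are my own assumptions, not necessarily what the original implementation used.

```python
import numpy as np


def ncc(a, b):
    # Dot product of the flattened channels after scaling each to unit length
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))


def search(channel, reference, center, radius):
    """Exhaustively test shifts within `radius` of `center`; return the best (dy, dx)."""
    cy, cx = center
    best_score, best_shift = -np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


def downscale(img):
    # Halve the resolution by averaging 2x2 blocks (an odd last row/column is dropped)
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def align_pyramid(channel, reference, level=0, min_size=256):
    """Coarse-to-fine alignment; level 0 is full size, level 1 is half size, etc."""
    if max(channel.shape) <= min_size:
        # Smallest scale: full [-15, 15] exhaustive search
        return search(channel, reference, (0, 0), 15)
    # Estimate the shift on the half-size images...
    dy, dx = align_pyramid(downscale(channel), downscale(reference), level + 1, min_size)
    # ...double it for this resolution, then refine within +/-(2 + level)
    return search(channel, reference, (2 * dy, 2 * dx), 2 + level)
```

Only the deepest call pays for the 961-shift search; every finer level tests (2(2 + level) + 1)^2 shifts, for example 25 at full size.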

Note: Although this method worked for most of the images, emir.tif and village.tif were not aligned correctly, as is visible in the results below. The R, G, and B channels of these two plates have particularly divergent intensity values for the same regions: e.g., in emir.tif, the man's robe is dark in the blue channel and light in the red channel. Because SSD and NCC both compare raw pixel intensities, these differences misled both metrics, so neither produced a correct alignment.
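To make this failure mode concrete, here is a small synthetic illustration (hypothetical data, not taken from the plates). The "red" channel is the exact brightness inverse of the "blue" one, an extreme version of the robe example, and both metrics then rate a wrong shift above the true alignment:

```python
import numpy as np


def ncc(a, b):
    # Dot product of the flattened channels after scaling each to unit length
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))


def ssd(a, b):
    # Sum of squared pixel differences
    return float(np.sum((a - b) ** 2))


rng = np.random.default_rng(0)
blue = rng.random((64, 64))       # synthetic "blue" channel
red = 1.0 - blue                  # hypothetical "red" channel: same scene, inverted brightness
wrong = np.roll(red, 5, axis=0)   # red channel shifted 5 rows away from the true alignment

print("SSD  true:", ssd(red, blue), " wrong:", ssd(wrong, blue))
print("NCC  true:", ncc(red, blue), " wrong:", ncc(wrong, blue))
```

For uniformly random pixel values, NCC comes out near 0.5 at the true alignment and near 0.75 at an unrelated shift, and SSD roughly halves at the wrong shift. A search driven by either metric is therefore pulled away from the correct answer.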

emir
G: (49, 24), R: (0, 43)
harvesters
G: (59, 17), R: (123, 14)
icon
G: (41, 17), R: (89, 23)
lady
G: (53, 9), R: (114, 11)
self portrait
G: (78, 29), R: (176, 37)
three generations
G: (52, 14), R: (110, 11)
train
G: (42, 6), R: (87, 32)
turkmen
G: (55, 20), R: (116, 28)
village
G: (0, -43), R: (137, 22)