CS 194-26: Intro to Computer Vision and Computational Photography

Project 1: Images of the Russian Empire -- Colorizing the Prokudin-Gorskii Photo Collection

Yukai Luo

3034106222



Background

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) [Сергей Михайлович Прокудин-Горский, to his Russian friends] was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won Tzar's special permission to travel across the vast Russian Empire and take color photographs of everything he saw including the only color portrait of Leo Tolstoy. And he really photographed everything: people, buildings, landscapes, railroads, bridges... thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. Never mind that there was no way to print color photographs until much later -- he envisioned special projectors to be installed in "multimedia" classrooms all across Russia where the children would be able to learn about their vast country. Alas, his plans never materialized: he left Russia in 1918, right after the revolution, never to return again. Luckily, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available on-line.

Overview

The goal of this project is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. In order to do this, we will need to extract the three color channel images, place them on top of each other, and align them so that they form a single RGB color image.

Part 1: Exhaustive search

The easiest way to align the parts is to exhaustively search over a window of possible displacements (we used [-15,15] pixels in this project), score each one using the L2 norm, which is also known as the Sum of Squared Differences (SSD) distance. The formula is simply sum(sum((image1-image2).^2)) where the sum is taken over the pixel values. Another is normalized cross-correlation (NCC), which is simply a dot product between two normalized vectors: (image1./||image1|| and image2./||image2||).

I used green channel as the reference and align red and blue channel agaisnt it because human eyes are more sensitive to green. This gives better results, especially with `emir.tif` file in multiscale processing below.

Shown below are the results of aligning the jpg files using the displacement found through exhaustive search:


cathedral offsets: R: [1 7] B: [-2 -5]
monastery offsets: R: [1 6] B: [-2 3]
tobolsk offsets: R: [1 4] B: [-3 -3]

Part 2: Image Pyramid

Exhaustive search will become prohibitively expensive if the pixel displacement is too large (which will be the case for high-resolution glass plate scans). In this case, we implemented a faster search procedure called image pyramid. An image pyramid represents the image at multiple scales (by a factor of 2 for this project) and the processing is done sequentially starting from the coarsest scale (smallest image) and going down the pyramid, updating your estimate as you go. The image pyramid is generated by iteratively downscaling the image until the total amount of pixel is lower than an artificial threshold. We set this threshold to be about the size of a jpg image. Because the exhaustive search method works pretty well on the jpg files, having the first image to be a jpg file size guarantees the performance of the algorithm. We implemented by adding recursive calls to the original single-scale implementation, with small changes to accommodate different starting index of search window.

Shown below are the tif files after aligning using image pyramid technique:






church offsets: R: [-8 33] B: [ -4 -25]
emir offsets: R: [17 57] B: [-24 -48]
harvesters offsets: R: [-3 65] B: [-17 -59]
icon offsets R: [5 49] B: [-17 -41]
lady offsets: R: [4 61] B: [-8 -55]
melons offsets: R: [3 96] B: [-10 -82]
onion_church offsets: R: [10 57] B: [-27 -51]
self_portrait offsets: R: [8 98] B: [-29 -78]
three_generations offsets: R: [-3 59] B: [-14 -52]
train offsets: R: [26 43] B: [-6 -43]
workshop offsets: R: [-11 52] B: [1 -53]