Project 1 - Image Alignment

Images of the Russian Empire:
Colorizing the Prokudin-Gorskii photo collection

Background

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) [Сергей Михайлович Прокудин-Горский, to his Russian friends] was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won the Tzar’s special permission to travel across the vast Russian Empire and take color photographs of everything he saw, including the only color portrait of Leo Tolstoy. And he really photographed everything: people, buildings, landscapes, railroads, bridges… thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. Never mind that there was no way to print color photographs until much later; he envisioned special projectors to be installed in “multimedia” classrooms all across Russia where the children would be able to learn about their vast country. Alas, his plans never materialized: he left Russia in 1918, right after the revolution, never to return. Luckily, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available online.


Overview

For this project, I had to align the three color channels of each scanned plate, which differ from one another by slight translations. The algorithm scans a range of candidate offsets and selects the one with the best similarity score. The project spec suggested two metrics: the L2 norm (where a lower value means a better match) and the normalized cross-correlation (where a higher value means a better match). I ended up using the L2 norm because it was easier to implement, and it generated solid results for the smaller JPG files.
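The exhaustive search described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `radius` parameter, the `(dy, dx)` offset convention, and the use of `np.roll` (which wraps pixels around the edges) are all assumptions made for the example.

```python
import numpy as np

def l2_score(a, b):
    """Sum of squared differences between two images; lower is more similar."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))

def best_offset(moving, reference, radius=15):
    """Try every (dy, dx) shift in [-radius, radius] and keep the lowest score.

    `radius` is a hypothetical search-window size, not a value from the write-up.
    """
    best, best_score = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around the border (a simplification).
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = l2_score(shifted, reference)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best
```

The cost grows with the square of `radius`, which is why this brute-force search is only practical on the small JPG files.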

I aligned the red and blue channels to the green channel, instead of aligning the other two channels to blue. My intuition was that green is the middle channel: on the negative (and thus in the lens/filter arrangement), the green exposure sits between the other two. The hope is that aligning the other channels to green leads to smaller (and more accurate) offsets, and in practice it generated good results.
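As a rough sketch of what aligning to green looks like once the offsets are known, the example below splits a plate into thirds and shifts red and blue onto green. The helper names are hypothetical, and the sketch assumes the plate is stacked blue, green, red from top to bottom and that offsets are `(row, column)` shifts.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate into (blue, green, red) thirds.

    Assumes blue on top, green in the middle, red at the bottom.
    """
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def compose_on_green(plate, rg_offset, bg_offset):
    """Shift red and blue by their offsets and stack with green as RGB."""
    b, g, r = split_plate(plate)
    # Green is the fixed reference; only red and blue are moved.
    r_aligned = np.roll(r, tuple(rg_offset), axis=(0, 1))
    b_aligned = np.roll(b, tuple(bg_offset), axis=(0, 1))
    return np.dstack([r_aligned, g, b_aligned])
```

For instance, `compose_on_green(plate, [6, 1], [3, -2])` would apply the offsets reported for monastery.jpg under these assumptions.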

To handle larger files (e.g., the ~70 MB TIFF files), I implemented an image pyramid to iteratively refine the offset. I recursively call the offset-finding function, downscaling each time, until the two images being compared (e.g., red and green) reach a minimum width. Then, as the recursion unwinds, I scale the optimal offset found at each level up to the next finer level, until the offset is computed for the full-resolution image. This finds reasonable offsets (for most images) and aligns all three channels in about 50 seconds per 70 MB file.
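A coarse-to-fine search of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the code used here: downscaling is done by plain striding (`img[::factor, ::factor]`) instead of a proper resize, and `min_width`, `coarse_radius`, and the refinement window of `factor` pixels are hypothetical choices.

```python
import numpy as np

def ssd_search(moving, reference, center=(0, 0), radius=2):
    """Search a window around `center` for the shift with the lowest SSD."""
    ref = reference.astype(np.float64)
    mov = moving.astype(np.float64)
    best, best_score = center, float("inf")
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            diff = np.roll(mov, (dy, dx), axis=(0, 1)) - ref
            score = float(np.sum(diff * diff))
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def pyramid_offset(moving, reference, factor=2, min_width=32, coarse_radius=4):
    """Recursively estimate the offset, coarsest level first."""
    if reference.shape[1] <= min_width:
        # Base case: the images are small, so a wide search is cheap.
        return ssd_search(moving, reference, radius=coarse_radius)
    # Recurse on downscaled copies (striding is a crude stand-in for resizing).
    coarse = pyramid_offset(moving[::factor, ::factor],
                            reference[::factor, ::factor],
                            factor, min_width, coarse_radius)
    # Scale the coarse estimate up and refine it in a small window.
    center = (coarse[0] * factor, coarse[1] * factor)
    return ssd_search(moving, reference, center=center, radius=factor)
```

Each level only searches a small window, so the total work is dominated by a handful of comparisons at full resolution rather than a full brute-force scan.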

To keep the runtime under 1 minute while retaining good results, I made a few simplifications. First, instead of scaling the image dimensions down by 2 at every pyramid level, I scaled by 3, which reduced the number of recursive calls to the alignment algorithm. Second, I cropped 28% off each edge of the image at the base of the pyramid. This means aligning on smaller images and also reduces noise from the borders. This crop parameter seemed to affect the speed most directly, so I tuned it until a run took ~50 s. Finally, I crop the images after aligning them to remove the messy black border; a simple crop proportional to the image size (9% from each side) works well to clean them up.
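Both crops described above (28% before scoring, 9% after alignment) are the same operation with a different fraction. A minimal sketch, with a hypothetical helper name:

```python
import numpy as np

def crop_border(img, frac):
    """Remove `frac` of the height and width from each of the four edges."""
    if not 0 <= frac < 0.5:
        raise ValueError("frac must be in [0, 0.5)")
    h, w = img.shape[:2]
    dy, dx = int(h * frac), int(w * frac)
    # Slicing returns a view, so no pixel data is copied.
    return img[dy:h - dy, dx:w - dx]
```

Cropping 28% from each edge leaves 44% of each dimension, so the scored region holds roughly a fifth of the original pixels, which is consistent with this parameter having the largest effect on speed.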


Another reflection was the BGR/RGB tedium: I used OpenCV (cv2) to read and write images, and OpenCV stores color images in BGR order rather than RGB. It took a fair amount of trial and error to write correct methods for organizing the channel data before I eventually got it right.
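One way to keep the channel order straight is to isolate it in two small helpers, sketched below with NumPy only. The function names are hypothetical; the one fact relied on is that OpenCV's `cv2.imread` and `cv2.imwrite` use B, G, R channel order.

```python
import numpy as np

def stack_bgr(r, g, b):
    """Stack single-channel planes in the B, G, R order OpenCV uses."""
    return np.dstack([b, g, r])

def swap_rgb_bgr(img):
    """Reverse the channel axis: RGB -> BGR, or BGR -> RGB.

    The copy makes the result contiguous, which some library calls require.
    """
    return np.ascontiguousarray(img[..., ::-1])
```

An image built with `stack_bgr` can be passed to `cv2.imwrite` directly, while `swap_rgb_bgr` converts it for libraries that expect RGB.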


Results

JPG Results

input: monastery.jpg
output: rg_offset=[6, 1], bg_offset=[3, -2]
input: tobolsk.jpg
output: rg_offset=[4, 1], bg_offset=[-3, -3]
input: cathedral.jpg
output: rg_offset=[7, 1], bg_offset=[-5, -2]

Speed

Images and offsets

input: emir.tiff
output: rg_offset=[57, 17], bg_offset=[-48, -24]
input: melons.tiff
output: rg_offset=[82, 4], bg_offset=[-75, -8]
input: three_generations.tiff
output: rg_offset=[59, -3], bg_offset=[-50, -14]
input: onion_church.tiff
output: rg_offset=[58, 10], bg_offset=[-51, -27]
input: lady.tiff
output: rg_offset=[59, 4], bg_offset=[-51, -8]
input: workshop.tiff
output: rg_offset=[52, -11], bg_offset=[-53, 1]
input: self_portrait.tiff
output: rg_offset=[83, 7], bg_offset=[-72, -27]
input: icon.tiff
output: rg_offset=[49, 5], bg_offset=[-41, -17]
input: train.tiff
output: rg_offset=[43, 26], bg_offset=[-41, -6]
input: harvesters.tiff
output: rg_offset=[63, -3], bg_offset=[-59, -17]
input: church.tiff
output: rg_offset=[33, -8], bg_offset=[-25, -4]

Additional LoC Images

References: duomo woman tolstoy grass

input: loc_woman.tiff
output: rg_offset=[82, 1], bg_offset=[-26, -3]
input: loc_tolstoy.tiff
output: rg_offset=[36, 1], bg_offset=[-37, -13]
input: loc_duomo.tiff
output: rg_offset=[1, -39], bg_offset=[10, 8]
input: loc_grass.tiff
output: rg_offset=[55, 10], bg_offset=[-39, 1]