Images of the Russian Empire

CS194-26: Intro to Computer Vision and Computational Photography

Author: Michael Sparre

Project Overview

Using the famous digitized collection of Sergei Mikhailovich Prokudin-Gorskii's glass plate images of Russia, this project uses image processing techniques to produce vibrant color images. Each of Prokudin-Gorskii's plates holds three separate exposures, one for each of the R, G, and B channels, which must be properly aligned and stacked to recreate the color photograph he took.
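A minimal sketch of the split-and-stack step, assuming each scanned plate is a 2-D numpy array whose three exposures are stacked vertically in B, G, R order from top to bottom, each taking one third of the plate's height. The function names `split_plate` and `stack_rgb` are hypothetical, not taken from the project code.

```python
import numpy as np

def split_plate(plate):
    """Split a scanned plate into (b, g, r) channels.

    Assumption: exposures are stacked top to bottom in B, G, R order,
    each one third of the plate height (any leftover rows are dropped).
    """
    height = plate.shape[0] // 3
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r

def stack_rgb(r, g, b):
    """Stack aligned channels into an H x W x 3 color image in RGB order."""
    return np.dstack([r, g, b])
```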

Approach

The images we worked with came in two file types: .jpg and .tif. The .jpg files were much smaller than the .tif files (roughly 170 KB versus 70 MB). For all photos, I started by aligning the R and G channels to the B channel, which served as the anchor.

My alignment method for .jpg files went as follows: take a smaller area of each channel (I used the central third), and shift the R or G channel by (i, j) pixels for every displacement within a search radius (15 pixels by default). For each shift, subtract the cropped channel from the cropped B channel and compute the Sum of Squared Differences (SSD). The shift with the lowest SSD gives the best alignment of that channel on top of B, so I translate the whole R or G channel by that offset.
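The exhaustive search above can be sketched in numpy as follows. This is an illustration under assumptions, not the project's actual code: the helper names are hypothetical, shifts are applied with `np.roll` (which wraps pixels around the border), and shifts are returned as (dy, dx) in numpy's row-then-column order.

```python
import numpy as np

def central_third(img):
    """Crop the middle third of the image along both axes."""
    h, w = img.shape
    return img[h // 3: 2 * h // 3, w // 3: 2 * w // 3]

def ssd(a, b):
    """Sum of Squared Differences between two equally sized arrays."""
    return np.sum((a - b) ** 2)

def align_exhaustive(channel, reference, radius=15, init=(0, 0)):
    """Return the (dy, dx) shift of `channel` that minimizes SSD against `reference`.

    Every shift in [init - radius, init + radius] is tried on each axis.
    np.roll wraps pixels around the border; scoring only the central third
    keeps wrapped pixels out of the score while the shift is smaller than
    a third of the image size.
    """
    chan = channel.astype(np.float64)
    ref = central_third(reference.astype(np.float64))
    best_score, best_shift = np.inf, init
    for dy in range(init[0] - radius, init[0] + radius + 1):
        for dx in range(init[1] - radius, init[1] + radius + 1):
            shifted = np.roll(chan, (dy, dx), axis=(0, 1))
            score = ssd(central_third(shifted), ref)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Once a shift is found, the full channel can be translated with the same call, `np.roll(channel, shift, axis=(0, 1))`.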

My alignment method for .tif files was slightly different. Because these images have so many pixels, an exhaustive search over the same displacement window would have been too slow. Instead, I implemented a multiscale image pyramid: I downscaled the original image by a factor of 2, five times. At the lowest level (only a couple hundred pixels in width and height), I use the .jpg alignment procedure above to find the best alignment vector for the downscaled images. From there, I move one level up the pyramid and center the search where the previous level left off, so the lower levels give estimates of where the upper levels should look. Before each new search I multiply the previous level's estimate by 2, because each level has twice the resolution of the one below it: a 1-pixel shift at the coarser level corresponds to a 2-pixel shift at the finer level. This scaling is what lets the search build up the large offsets these plates need (150+ pixels in the x or y direction). The image pyramid approach still takes longer than the exhaustive search on .jpg files, but each .tif is aligned in well under a minute on my computer.
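A sketch of the coarse-to-fine search, assuming simple 2x2 block averaging for downscaling (smoothing with a Gaussian before subsampling is a common alternative) and the same central-third SSD score as the .jpg search. The name `pyramid_align` echoes the function mentioned later in this write-up, but its signature and body here are assumptions, not the project's code.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging each 2x2 block (an odd last row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def search(channel, reference, radius, init):
    """Exhaustive SSD search on the central third, centered on `init` (dy, dx)."""
    h, w = reference.shape
    crop = (slice(h // 3, 2 * h // 3), slice(w // 3, 2 * w // 3))
    ref = reference[crop]
    best_score, best = np.inf, init
    for dy in range(init[0] - radius, init[0] + radius + 1):
        for dx in range(init[1] - radius, init[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted[crop] - ref) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def pyramid_align(channel, reference, levels=5, radius=15):
    """Coarse-to-fine alignment; returns the (dy, dx) shift at full resolution."""
    chans = [channel.astype(np.float64)]
    refs = [reference.astype(np.float64)]
    for _ in range(levels):
        chans.append(downsample(chans[-1]))
        refs.append(downsample(refs[-1]))
    shift = (0, 0)
    # Walk from the coarsest level up to the original image.
    for c, r in zip(reversed(chans), reversed(refs)):
        # Each level has twice the resolution of the one below, so the
        # previous estimate is doubled before it becomes the search center.
        shift = search(c, r, radius, (2 * shift[0], 2 * shift[1]))
    return shift
```

The coarsest level should stay well above the search radius in size; on a very small image, `np.roll` wraps large shifts around and the coarse estimate becomes unreliable.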

Problems I faced

The first problem I faced, and probably the one that took longest, was realizing that numpy arrays are indexed vertically first (row, y) and then horizontally (column, x). It had been a while since I worked with Python, and while it makes complete sense now, I probably spent 10+ hours confused about why my displacement vector (and the resulting color image) was off every time, even though I was getting what seemed to be the best offset. Once I swapped the x and y variables in my indexing, I could finally see the beautiful color images in their surprisingly great quality.
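A small illustration of numpy's row-first indexing; the array and values are made up for the example.

```python
import numpy as np

img = np.zeros((4, 6))   # shape is (rows, cols), i.e. (height, width)
img[1, 4] = 1.0          # row 1 (y), column 4 (x): the first index is vertical

# np.roll with axis=0 moves content vertically, axis=1 moves it horizontally.
down_one = np.roll(img, 1, axis=0)    # the marked pixel moves to row 2
right_two = np.roll(img, 2, axis=1)   # the marked pixel wraps to column 0
```

So a displacement reported as (x, y) has to be passed to numpy as (y, x).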

Another problem I faced in the image pyramid portion of the project was not being able to reach large displacements. The first version of my program could not reach displacements of 100+ pixels, because the estimate could not drift that far in only 5 steps (and the first two steps only produced deviations of 1 to 6 pixels). I fixed this by multiplying the estimate by 2 after completing each level of the pyramid, which converts it into the next level's pixel units so the estimates build up toward the best alignment vector on the original image. After this change, I could shrink my search window from (-30, 30) to (-15, 15) and still get the best alignments.
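To see why the doubling matters, here is a small calculation of the largest full-resolution offset a coarse-to-fine search can reach, assuming 5 downscaled levels plus the original image and a fixed search radius at every level. `max_reach` is a hypothetical helper written only for this illustration.

```python
def max_reach(radius, levels, doubling):
    """Upper bound on the full-resolution offset a pyramid search can reach.

    `levels` counts the downscaled levels below the original image.
    With doubling, each level computes d = 2 * d_prev + delta, |delta| <= radius;
    without it, d = d_prev + delta.
    """
    reach = 0
    for _ in range(levels + 1):  # the original image plus each downscaled level
        reach = (2 * reach if doubling else reach) + radius
    return reach

with_doubling = max_reach(15, 5, doubling=True)      # 15 * (1 + 2 + ... + 32) = 945
without_doubling = max_reach(30, 5, doubling=False)  # 30 * 6 = 180
```

With doubling and a radius of 15 the bound is 945 pixels, far above the largest offset in the results below (178 pixels for melons). Without doubling, even a radius of 30 is capped at 180 pixels, and in practice the search falls shorter because each coarse estimate is measured in that level's smaller pixel units.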

Additionally, aligning R directly to B failed for some photos, so for the default examples I found that aligning R to the newly aligned G channel worked best. For the extra input photos I processed, however, aligning both R and G to B worked best, so I switched between the two manually by changing the reference argument passed to pyramid_align (the B channel or the newly aligned G channel).
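A sketch of the two reference choices, assuming an `align(channel, reference)` function that returns a (dy, dx) shift, such as the pyramid search described above. `align_rgb` and its `r_reference` parameter are hypothetical. Because G is already in B's frame after alignment, the shift found for R against the aligned G is also relative to B.

```python
import numpy as np

def align_rgb(b, g, r, align, r_reference="b"):
    """Align G to B, then align R to either B or the aligned G.

    `align(channel, reference)` must return a (dy, dx) shift.
    Returns the stacked RGB image and the G and R shifts.
    """
    g_shift = align(g, b)
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    # Choose the reference for R: the anchor B or the already-aligned G.
    ref = g_aligned if r_reference == "g" else b
    r_shift = align(r, ref)
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b]), g_shift, r_shift
```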

Final Photos

The final processed photos are shown below, each with its title above and the offsets applied to its G, R, and B channels captioned below. Here is a link to a .txt file listing each image's name, the offsets used, and the execution time needed to process it.

jpg photos

cathedral

g: (2, 5) r: (3, 12) b: (0, 0)

monastery

g: (2, -3) r: (2, 3) b: (0, 0)

tobolsk

g: (3, 3) r: (3, 7) b: (0, 0)

tif photos

church

g: (4, 25) r: (-4, 58) b: (0, 0)

emir

g: (24, 48) r: (41, 106) b: (0, 0)

harvesters

g: (17, 59) r: (15, 123) b: (0, 0)

icon

g: (18, 41) r: (23, 90) b: (0, 0)

lady

g: (8, 53) r: (12, 114) b: (0, 0)

melons

g: (9, 82) r: (13, 178) b: (0, 0)

onion church

g: (27, 50) r: (37, 108) b: (0, 0)

self portrait

g: (29, 78) r: (37, 176) b: (0, 0)

three generations

g: (14, 51) r: (11, 110) b: (0, 0)

train

g: (7, 43) r: (33, 86) b: (0, 0)

workshop

g: (-1, 53) r: (-12, 105) b: (0, 0)

extra tif photos

armenian women

bukhara

fish

flower vase

gospel vessels

iconostasis

railroad participants

steamboat fire

venerable monastery

white fox