CS 194-26 Project 1

Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection

Overview

Our goal is to align the three color channels of an image to produce a colorized image. Because the images in Prokudin-Gorskii's collection were recorded on glass plates through three different filters, the channels are slightly offset from one another, so naive alignment (just stacking the three channels on top of each other) does not produce good results.

naively aligned cathedral, created by stacking all three channels
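For reference, naive stacking can be sketched in a few lines. This is a minimal sketch, not the code used for this project: it assumes the scanned plate is a single grayscale array with the blue, green, and red exposures stacked top to bottom in equal thirds, and `split_and_stack` is a hypothetical helper name.

```python
import numpy as np

def split_and_stack(plate):
    """Split a scanned plate into thirds and stack them as an RGB image.

    Assumes the plate holds the B, G, R exposures from top to bottom.
    No alignment is performed, so the result shows color fringing.
    """
    h = plate.shape[0] // 3  # height of one channel; leftover rows are dropped
    b = plate[:h]
    g = plate[h:2 * h]
    r = plate[2 * h:3 * h]
    return np.dstack([r, g, b])  # shape (h, width, 3), channels in RGB order
```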

Instead of simply stacking all three channels on top of each other, we need to find the displacement for each channel that leaves the fewest artifacts in the resulting image.

Approach

For small images, I simply searched over a [-15, 15] by [-15, 15] window of displacements to find the best one. The metric I ended up using to score displacements was normalized cross-correlation; I also tried the sum of squared differences, which yielded similar results.
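The exhaustive search described above might look roughly like the following. This is a sketch under assumptions rather than the project's actual code: it uses a zero-mean variant of normalized cross-correlation, applies shifts circularly with `np.roll`, and returns displacements as (row shift, column shift). The function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def align_exhaustive(channel, reference, radius=15):
    """Return the (dy, dx) shift of `channel` that best matches `reference`.

    Every shift in [-radius, radius] x [-radius, radius] is scored with NCC.
    Shifts wrap around the image edges (np.roll), which is an assumption.
    """
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Swapping `ncc` for a negated sum of squared differences would give the alternative metric mentioned above without changing the search loop.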

For larger images, I built an image pyramid by repeatedly downscaling the original image by a factor of 2. I searched for the best displacement at the coarsest level and used that coarse estimate to guide the search for a finer displacement at each higher-resolution level of the pyramid. This speeds up computation on large images whose displacements may fall outside the [-15, 15] window.
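A coarse-to-fine search of this kind can be sketched recursively. The details below are assumptions for illustration, not the project's implementation: downsampling is done by averaging 2x2 blocks, the refinement window `fine_radius` is a hypothetical parameter, and shifts wrap around via `np.roll`.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(channel, reference, center, radius):
    """Score every shift within `radius` of `center`; return the best (dy, dx)."""
    cy, cx = center
    best_score, best_shift = -np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def align_pyramid(channel, reference, min_size=500,
                  coarse_radius=15, fine_radius=2):
    """Coarse-to-fine alignment; returns the (dy, dx) shift for `channel`."""
    # Base case: the image is small enough for a full window search.
    if min(channel.shape) <= min_size:
        return search(channel, reference, (0, 0), coarse_radius)
    # Recurse on the half-resolution images, then refine the doubled estimate.
    dy, dx = align_pyramid(downsample(channel), downsample(reference),
                           min_size, coarse_radius, fine_radius)
    return search(channel, reference, (2 * dy, 2 * dx), fine_radius)
```

Because each level only refines the doubled estimate within a small window, the full-resolution image is scored a handful of times instead of once per candidate in a very large window.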

Problems I ran into, and my solutions

The approach described above yields decent results but fails on some images. Some images, like monastery.jpg, seemed to fail because the borders were interfering with the computation of the normalized cross-correlation. I fixed this issue by pre-cropping 10% off of each image's border so that the correlation would not be skewed by the borders.
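The pre-cropping step is simple to express. The sketch below assumes the 10% is removed from each of the four sides; `crop_border` is a hypothetical helper name.

```python
import numpy as np

def crop_border(img, fraction=0.10):
    """Remove `fraction` of the height and width from each side of `img`."""
    h, w = img.shape[:2]
    dy = int(round(h * fraction))
    dx = int(round(w * fraction))
    return img[dy:h - dy, dx:w - dx]
```

One option is to compute the score only on the cropped channels while applying the resulting displacement to the full, uncropped channels.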

For the image pyramid, I also had to tinker with some parameters to make sure I wasn't searching too large a section of the image. I ended up using 500 pixels as my threshold for the smallest layer of the pyramid, and recursed down until I reached 3000 pixels.

Results

Small images

cathedral
G: (5, 2), R: (12, 3)

monastery
G: (-3, 2), R: (3, 2)

nativity
G: (3, 1), R: (7, 0)

settlers
G: (0, 7), R: (-1, 14)

Large images

Using the image pyramid yielded decent results for all of the images except emir.tif. This is probably because the three channels have very different brightness values, which distorts the correlation. To fix the emir image, I aligned the other channels to the green channel instead of the blue one. Another way to solve this is to align on detected features such as edges, which I implemented in the Bells and Whistles section.

emir (alignment based on blue channel)
G: (48, 24), R: (72, 42)

emir (alignment based on green channel)
R: (16, 56), B: (-24, -48)

harvesters
G: (16, 60), R: (14, 124)

icon
G: (40, 16), R: (90, 22)

lady
G: (52, 8), R: (112, 12)

self portrait
G: (28, 78), R: (36, 176)

three_generations
G: (52, 14), R: (112, 10)

train
G: (42, 4), R: (86, 32)

turkmen
G: (20, 56), R: (28, 116)

village
G: (64, 12), R: (138, 22)

Other Images

Here are some other images I selected from the Prokudin-Gorskii photo collection.

crew
G: (2, 2), R: (5, 3)

onion_church
G: (52, -26), R: (108, 36)

melons
G: (82, 10), R: (178, 12)

workshop
G: (52, 0), R: (104, -12)

monument
G: (18, -12), R: (114, -28)

boat
G: (12, -6), R: (132, -12)

Bells and Whistles

Edge detection

I used a Sobel filter for edge detection, which gave good results on the emir image without needing to switch the color channel I used as a base. Here is the Sobel edge detection output on several photos:
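A Sobel edge map can be computed with NumPy alone, as in the sketch below. This is an illustration rather than the project's code: it assumes edge-replicated padding and returns the gradient magnitude, and `sobel_magnitude` is a hypothetical name. The idea is that the edge maps, rather than the raw intensities, are passed to the alignment search, so that differences in per-channel brightness matter less.

```python
import numpy as np

def sobel_magnitude(img):
    """Return the Sobel gradient magnitude of a 2D grayscale image."""
    img = np.asarray(img, dtype=float)
    p = np.pad(img, 1, mode="edge")  # replicate border pixels (an assumption)
    # Horizontal gradient: right column minus left column, weights 1-2-1.
    gx = ((p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]))
    # Vertical gradient: bottom row minus top row, weights 1-2-1.
    gy = ((p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]))
    return np.hypot(gx, gy)
```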

emir edges

emir without edge detection

emir with edge detection
G: (24, 48), R: (42, 106)

harvesters edges

harvesters without edge detection

harvesters with edge detection
G: (16, 62), R: (12, 124)

cathedral edges

cathedral without edge detection

cathedral with edge detection
G: (2, 5), R: (3, 12)