Project 1 - Images of the Russian Empire

CS 194-26 Fall 2018

By Tao Ong

Project description

In this project, we take digitized grayscale Prokudin-Gorskii glass plate images and produce colorized versions of them. Each plate is split into its three color channel images, which are then placed on top of each other to form a single RGB color image.
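As a rough sketch of this step (not the project's actual code), the plate can be cut into thirds and stacked with NumPy. The helper names `split_plate` and `stack_rgb` are hypothetical, and the blue, green, red top-to-bottom ordering is an assumption about how the plates are laid out.

```python
import numpy as np

def split_plate(plate):
    """Split a stacked glass-plate scan into three equal-height channels.

    Assumes the plate is ordered blue, green, red from top to bottom.
    """
    height = plate.shape[0] // 3  # leftover rows at the bottom are dropped
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red

def stack_rgb(red, green, blue):
    """Stack three equally sized channels into an H x W x 3 RGB image."""
    return np.dstack([red, green, blue])

# Synthetic 10 x 4 "plate"; the last row is dropped so the thirds are equal.
plate = np.arange(40, dtype=float).reshape(10, 4)
b, g, r = split_plate(plate)
rgb = stack_rgb(r, g, b)
```

Stacking the raw thirds like this is what produces the misaligned result that the alignment steps are meant to fix.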

Naive Image Alignment

After dividing the image into three equal parts, the parts cannot simply be placed on top of each other, as they would be misaligned. To tackle this, my initial approach was to align two channels against the third (in my case, green and red against blue): exhaustively search over a window of possible displacements (say [-15, 15] pixels), score each displacement using some image matching metric, and take the displacement with the best score.
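The exhaustive search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `align_exhaustive` is a hypothetical name, shifts are expressed as (row, column) offsets, SSD is used as the score for simplicity, and `np.roll` wraps pixels around the border instead of discarding them.

```python
import numpy as np

def align_exhaustive(channel, reference, window=15):
    """Return the (dy, dx) shift of `channel` that best matches `reference`.

    Every shift in [-window, window] x [-window, window] is scored with SSD
    (lower is better). np.roll wraps around the border, a simplification.
    """
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift

# Synthetic check: misalign a random image by a known amount and recover it.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
channel = np.roll(reference, (-5, 3), axis=(0, 1))
shift = align_exhaustive(channel, reference)
```

The cost grows with the square of the window size times the number of pixels, which is why this approach only suits the smaller images.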

There were two options for the image matching metric: sum of squared differences (SSD) and normalized cross-correlation (NCC). I tried both but stuck with the latter, as it produced slightly better alignment. One thing to look out for is that a lower SSD means a better match, whereas a higher NCC means a better match.
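For illustration, the two metrics might be written as below. The zero-mean form of NCC is an assumption (a plain normalized dot product is another common definition), and the function names are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: 0 for identical images, lower is better."""
    return np.sum((a - b) ** 2)

def ncc(a, b):
    """Zero-mean normalized cross-correlation: 1 is a perfect match."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

# The same pattern at a different brightness and contrast.
rng = np.random.default_rng(0)
a = rng.random((8, 8))
brighter = 2.0 * a + 0.5

ssd_score = ssd(a, brighter)  # large, even though the pattern is identical
ncc_score = ncc(a, brighter)  # close to 1, unaffected by gain and offset
```

This insensitivity to overall brightness is one plausible reason NCC could align channels slightly better than SSD, since the three channels of a plate rarely share the same exposure. The opposite score directions also mean the search must minimize SSD but maximize NCC.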

Here are the results of this naive alignment solution on the smaller images, along with the displacements used for the green and red channels:

R: (12, 3), G: (5, 2)

R: (3, 2), G: (-3, 2)

R: (7, 0), G: (3, 1)

R: (14, -1), G: (7, 0)

Alignment for Larger Images

The naive image alignment process shown above worked well for the smaller images, but for the larger .tif files it was far too slow to be feasible. To tackle this, I adopted the image pyramid method: repeatedly shrink the image by a factor of 2 until it is small enough, apply the same naive alignment method described above (using NCC as the metric) at that coarsest level, then work back up to the original size, refining the displacement found at the coarser level at each step with a small [-1, 1] displacement window.
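A coarse-to-fine search along these lines can be sketched recursively. This is a simplified illustration under assumptions, not the project's code: the names `search` and `align_pyramid` and the `min_size` threshold are hypothetical, downscaling is done by keeping every second pixel (a real implementation would typically smooth or use a proper rescale function), and `np.roll` wraps around the border.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation (higher is better)."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

def search(channel, reference, center, radius):
    """Best (dy, dx) within `radius` of `center`, scored by NCC."""
    best_shift, best_score = center, -np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift

def align_pyramid(channel, reference, min_size=64, window=15):
    """Coarse-to-fine alignment over an image pyramid."""
    if min(channel.shape) <= min_size:
        # Coarsest level: full exhaustive search.
        return search(channel, reference, (0, 0), window)
    # Halve the resolution by keeping every second pixel (no smoothing).
    coarse = align_pyramid(channel[::2, ::2], reference[::2, ::2],
                           min_size, window)
    # One pixel at the coarser level corresponds to two pixels here.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(channel, reference, center, 1)

# Synthetic check. The shift is a multiple of 4 so that plain subsampling
# reproduces it exactly at both coarser levels of this 256 x 256 image.
rng = np.random.default_rng(0)
reference = rng.random((256, 256))
channel = np.roll(reference, (-40, 24), axis=(0, 1))
shift = align_pyramid(channel, reference)
```

In this sketch the full 31 x 31 window is only searched on a 64 x 64 image, and each finer level evaluates just 9 candidates, so a shift of 40 pixels is found even though it lies well outside a [-15, 15] window at full resolution.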

Here are the results of the image pyramid method on the larger provided images, along with the displacements used for the green and red channels:

R: (107, 40), G: (49, 24)

R: (124, 13), G: (60, 18)

R: (90, 23), G: (41, 17)

R: (120, 13), G: (57, 9)

R: (175, 37), G: (78, 29)

R: (111, 8), G: (54, 12)

R: (85, 29), G: (41, 1)

R: (117, 28), G: (57, 22)

R: (137, 21), G: (64, 10)

Additional Large Images

I also ran the image pyramid method on a few extra large .tif images:

R: (33, -68), G: (10, -31)

R: (32, 29), G: (20, 24)

R: (90, 23), G: (41, 17)

Bells and Whistles

Despite the improved efficiency of the image pyramid method, its results on images with a dominant color channel were still subpar. I resolved this alignment bias by using edge detection, with help from the roberts function from the skimage library.
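To show why edges can help here, the sketch below applies a hand-written Roberts cross operator in NumPy. It is an illustrative stand-in for skimage's roberts function, so its scaling and border handling may differ, and the toy "blue object" scene is made up for the example.

```python
import numpy as np

def roberts_edges(image):
    """Roberts cross edge magnitude from the two diagonal differences."""
    d1 = image[:-1, :-1] - image[1:, 1:]   # main-diagonal difference
    d2 = image[:-1, 1:] - image[1:, :-1]   # anti-diagonal difference
    return np.sqrt(d1 ** 2 + d2 ** 2)

# A strongly blue object: bright in the blue channel, dark in the red one.
scene = np.zeros((32, 32))
scene[8:24, 8:24] = 1.0
blue = scene
red = 1.0 - scene

# The raw intensities disagree everywhere (correlation of -1) ...
raw_corr = np.corrcoef(blue.ravel(), red.ravel())[0, 1]

# ... but the edge maps are identical, so they align cleanly.
edges_blue = roberts_edges(blue)
edges_red = roberts_edges(red)
```

Matching the edge maps instead of the raw channels means the metric rewards agreeing outlines rather than agreeing brightness, which is the situation where intensity-based matching tends to go wrong.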

I also found that I could achieve better results by cropping out around 10% of the image before displacing the color channels, since otherwise the border would interfere with the edge detection.
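A cropping helper along these lines might look like the following. Trimming the fraction from each of the four sides is an assumption about what the 0.1 refers to, and `crop_border` is a hypothetical name.

```python
import numpy as np

def crop_border(image, fraction=0.1):
    """Trim `fraction` of the height and width from each side."""
    dy = int(image.shape[0] * fraction)
    dx = int(image.shape[1] * fraction)
    return image[dy:image.shape[0] - dy, dx:image.shape[1] - dx]

channel = np.ones((100, 80))
inner = crop_border(channel)  # 10 rows and 8 columns removed per side
```

Because the same margin is removed from every channel, a displacement estimated on the cropped interiors can be applied unchanged to the full-size channels.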

To illustrate the effects of this edge detection + cropping approach, here are the results with and without it. I used the extra image as an example because of its predominantly blue hues:

R: (52, -6), G: (0, -5)

R: (58, -4), G: (25, 4)