Project 1 - Images of the Russian Empire

By Nithin Chalapathi

Project Overview - The goal of this project is to colorize photographs of Russia taken in the early 20th century by Sergey Prokudin-Gorsky. His conviction that color images were the future led him to photograph each scene three times, through red, green, and blue filters. Given the three resulting grayscale images, the project combines them into a single color image. My approach involves:

Using the normalized cross-correlation to align color channels.
Using an image pyramid for larger images instead of an exhaustive search.
Ignoring the borders of each color channel.

For each implementation point, I first describe the approach and thought process. I conclude with some closing remarks.

1. Normalized Cross Correlation - For the single-scale implementation, I tried both the sum of squared differences (SSD) and the normalized cross-correlation (NCC) as metrics to see which performed better. Visually, NCC seemed to perform slightly better when the borders of each channel were included in the similarity computation. Once I implemented the multi-scale search and border exclusion, the two performed almost identically in my testing. For simplicity, I only use NCC, but the SSD functionality is still in the code. The single-scale implementation uses an exhaustive search over displacements of [-15, 15] pixels in both height and width.
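The two metrics and the exhaustive search described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (ncc, ssd, align_exhaustive) are hypothetical, and the NCC variant shown (mean-centered, unit-norm) is an assumption, since NCC is sometimes defined without mean subtraction.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation (higher is better).

    Assumed variant: each image is mean-centered and scaled to unit norm
    before taking the dot product.
    """
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def ssd(a, b):
    """Sum of squared differences (lower is better)."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float((diff ** 2).sum())

def align_exhaustive(channel, reference, radius=15):
    """Try every (dy, dx) in [-radius, radius]^2 and keep the best NCC score."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll shifts circularly: rows by dy, columns by dx.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With radius=15 this evaluates 31 x 31 = 961 candidate shifts per channel, which is why the approach is only practical on small images.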

2. Image pyramid - As expected, an exhaustive single-scale search is far too slow on the large TIFF images, so I implemented an image pyramid. Each successive layer is downscaled by a factor of 2. In my brief testing, going beyond a certain resolution didn't yield much visual difference. Thus, the base case for my pyramid is reached when one dimension (height or width) of the image falls below 600 pixels. At that point the program computes the best channel alignment and then recurses back up through the finer levels. As in the single-scale implementation, each level of the pyramid performs an exhaustive search over [-15, 15] pixels in both height and width.

3. Ignoring borders - As I was testing my code, I noticed that some images aligned poorly (notably emir.tif). I hypothesized that this was due to the borders of the image. For each image, I ignore a 10% border on every side. I found that this offset was sufficient to exclude the artifacts on the edges of the color negatives.
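As a rough illustration of the border exclusion, the following sketch trims a fixed fraction from every side of a channel. The function name crop_border and its fraction parameter are hypothetical; only the 10% figure comes from the description above.

```python
import numpy as np

def crop_border(img, fraction=0.10):
    """Return a view of img with `fraction` of the height and width
    removed from each of the four sides."""
    h, w = img.shape[:2]
    dh, dw = int(h * fraction), int(w * fraction)
    return img[dh:h - dh, dw:w - dw]
```

One plausible way to use it is to score alignments on crop_border(channel) and crop_border(reference), then apply the winning shift to the uncropped channel.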

Implementation notes - As I developed the multi-scale representation, I scrapped my single-scale implementation. The only images I had used with the single-scale implementation were the small JPEG images, all of which have dimensions smaller than 600 x 600, so they reach the pyramid's base case immediately. As a result, a single code path handles every image. To shift the channels, I used numpy.roll.
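For reference, shifting with numpy.roll and stacking the channels could look like the sketch below. The names shift_channel and compose_rgb are hypothetical, and treating green as the fixed reference channel is an assumption based on the results listing displacements for red and blue only. Offsets are given as (width offset, height offset) to match the results.

```python
import numpy as np

def shift_channel(channel, dx, dy):
    """Circularly shift a 2-D channel by dx columns and dy rows."""
    return np.roll(channel, shift=(dy, dx), axis=(0, 1))

def compose_rgb(red, green, blue, red_offset, blue_offset):
    """Shift red and blue by their (width, height) offsets and stack
    the three channels into an H x W x 3 array (green stays fixed)."""
    r = shift_channel(red, *red_offset)
    b = shift_channel(blue, *blue_offset)
    return np.dstack([r, green, b])
```

Note that numpy.roll wraps pixels around to the opposite side of the image rather than discarding them, so the rows and columns nearest the border of a shifted channel contain wrapped-around content.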

Further Work - One place for improvement is the performance of the program. While the average time per image on my computer is roughly 1 minute, I believe the program could be accelerated. It might help to have the image pyramid go to a much finer level and search over a smaller displacement window. It may also help to use a smarter scaling factor (e.g. one adjusted based on the current level's size).

Results - The naive image is on the top and the output of my program is on the bottom. The displacement vectors for red and blue are below each image (width offset, height offset).

emir.tif

Unedited Emir