CS194 Project 1:

Colorizing the Prokudin-Gorskii Photo Collection

Ahmed Malik || Spring 2020

Overview

This project explores the challenges of, and (some) solutions to, colorizing an image from its individual red, green, and blue channels. The input is a single scan containing three photographs stacked vertically, each a separate grayscale exposure taken through a red, green, or blue filter. Aligning these color channels on top of each other produces a single color image.
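A minimal NumPy sketch of splitting a plate scan into its three channels and recombining them after alignment. It assumes the channels appear top to bottom as blue, green, red and that the plate height divides into three equal parts; `split_plate` and `stack_rgb` are hypothetical names, not functions from main.py.

```python
import numpy as np


def split_plate(plate: np.ndarray):
    """Split a vertically stacked plate scan into three equal-height channels.

    Assumption: channel order from top to bottom is blue, green, red.
    Leftover rows (when the height is not divisible by 3) are dropped.
    """
    height = plate.shape[0] // 3
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red


def stack_rgb(red: np.ndarray, green: np.ndarray, blue: np.ndarray) -> np.ndarray:
    """Combine three aligned grayscale channels into an H x W x 3 RGB image."""
    return np.dstack([red, green, blue])
```

After alignment, `stack_rgb(red_aligned, green_aligned, blue)` yields the color image; the channel order inside `np.dstack` is what makes the result RGB.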

Unaligned color channels

All 3 color channels aligned for a colorized image!

The stacked images come from the Prokudin-Gorskii photo collection. In the early 1900s, Prokudin-Gorskii pursued color photography by photographing the same scene three times, each time through a different colored glass filter: red, green, and blue. Because the three exposures were taken separately, the negatives are not perfectly aligned. This, together with other factors such as possible differences between the three glass filters themselves, means that simply stacking the negatives does not produce a clean image. The rest of this page describes how I aligned them.

Approach: Single-Scale

My solution uses the Sum of Squared Differences (SSD) as a measure of how well two color channels are aligned; a lower SSD means a better match. Using the blue channel as the base, I compute the SSD between blue and red, and between blue and green, for every combination of horizontal and vertical shifts of the red or green image within a displacement range. For the smaller images, a range of +/-15 pixels worked well. I keep the displacement that produces the minimum SSD and return it after the search. Finally, I shift the green and red images by their SSD-minimizing displacements using np.roll().
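The single-scale search can be sketched as follows, where SSD(A, B) is the sum over all pixels of (A - B)^2. This is a minimal version assuming two grayscale NumPy arrays of equal shape; `ssd` and `align_exhaustive` are hypothetical names, and this sketch scores the full image with no border cropping.

```python
import numpy as np


def ssd(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences; cast to float64 so integer images do not overflow."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.sum(diff * diff))


def align_exhaustive(base: np.ndarray, moving: np.ndarray, radius: int = 15):
    """Return the (dy, dx) roll of `moving` that minimizes SSD against `base`.

    Every integer shift in [-radius, radius] on both axes is tried. np.roll wraps
    pixels around the edges, so wrapped-in rows/columns are part of the score.
    """
    best_score = np.inf
    best_shift = (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ssd(base, shifted)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The returned shift is applied the same way it was scored, e.g. `np.roll(red, align_exhaustive(blue, red), axis=(0, 1))`. With a radius of 15 this evaluates (2*15+1)^2 = 961 shifts per channel, which is why the cost grows quickly with both image size and search range.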

Initially, only the tobolsk image aligned decently; cathedral was passable, and monastery showed a ghosting effect on the buildings. After digging through Piazza and talking with GSIs, I was advised to crop a percentage of each image to remove the borders. This focuses the alignment on the scene itself rather than on artifacts at the plate edges, which stem from Prokudin-Gorskii's imperfect (but understandably so) method of capturing the same scene three times through different filters. After experimenting with a few values, I found that cropping 5% from each edge worked very well for the smaller images, and it carried over well to the larger images too.
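One way to apply the 5% crop is to score each candidate shift only on the interior of both images while still shifting the full image. This is an assumption about how the crop is used, and `crop_borders` and `interior_ssd` are hypothetical helpers.

```python
import numpy as np


def crop_borders(img: np.ndarray, frac: float = 0.05) -> np.ndarray:
    """Drop `frac` of the height and of the width from each edge."""
    h, w = img.shape[:2]
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]


def interior_ssd(base: np.ndarray, shifted: np.ndarray, frac: float = 0.05) -> float:
    """SSD over the central region only, so plate borders do not affect the score."""
    a = crop_borders(base, frac).astype(np.float64)
    b = crop_borders(shifted, frac).astype(np.float64)
    return float(np.sum((a - b) ** 2))
```

Swapping `interior_ssd` in for a full-image SSD inside the shift search keeps the dark, uneven plate edges (and any pixels wrapped in by np.roll near the border) from dominating the score.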

With a simple SSD minimizer plus cropping, cathedral, tobolsk, and monastery all looked very good. Running this single-scale method on the larger .tif files was impractical, though: it took far too long, and a displacement range of +/-15 pixels was too small for images that large. So, as recommended by the project spec, I moved to a multi-scale approach built on an image pyramid.

Approach: Multi-Scale (Image Pyramid)

An image pyramid starts from the original-size base image and the image to be aligned, and downscales both at each level until they fall below a certain size. This lets us run an exhaustive SSD search on a small, coarse image at the top of the pyramid and only small refinements at each finer level, significantly reducing alignment time without noticeably hurting the results.
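A rough accounting of why the pyramid helps, assuming each level halves the resolution (the scale factor is an assumption here, not stated above). Let B_k and M_k be the base and moving images at level k, with k = 0 at full resolution. The displacement d found one level coarser is doubled and then refined within a small radius r; at the coarsest level the search starts from zero with a large radius.

```latex
\[
\operatorname{SSD}(A, B) = \sum_{x,y} \bigl(A(x,y) - B(x,y)\bigr)^2
\]
\[
\mathbf{d}_k = 2\,\mathbf{d}_{k+1} + \boldsymbol{\delta}_k,
\qquad
\boldsymbol{\delta}_k = \operatorname*{arg\,min}_{\boldsymbol{\delta} \in \{-r,\dots,r\}^2}
\operatorname{SSD}\bigl(B_k,\ \operatorname{shift}(M_k,\ 2\,\mathbf{d}_{k+1} + \boldsymbol{\delta})\bigr)
\]
\[
\#\text{shifts}(R) = (2R+1)^2:
\quad R = 15 \Rightarrow 961,
\quad R = 40 \Rightarrow 6561,
\quad R = 2 \Rightarrow 25
\]
```

So the expensive 6561-shift search runs only on the small top image, and every larger level costs just 25 SSD evaluations.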

I implemented the image pyramid recursively, downscaling the images until they were smaller than 400x400 pixels, roughly the size of the smaller .jpg example images we were given. At this coarsest level, the pyramid performs an alignment with a displacement range of +/-40 pixels. Increasing the range from +/-15 to +/-40 helped with the larger images and did not noticeably hurt performance, because the search runs only once, on a small image. At every other level, I shift the images within +/-2 pixels horizontally and vertically. Combined with the same 5% edge crop, this produced (mostly) good alignments.
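Putting the pieces together, a recursive pyramid might look like the sketch below. The defaults mirror the numbers above (stop below 400 pixels, +/-40 at the top, +/-2 per level, 5% crop), but the 2x2 block-average downscaling is a swapped-in assumption since the writeup does not say how the images were rescaled, and all function names are hypothetical.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences between two same-shaped images."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.sum(diff * diff))


def crop_borders(img, frac=0.05):
    """Drop `frac` of the height and width from each edge."""
    h, w = img.shape[:2]
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]


def search(base, moving, center, radius, frac):
    """Try every shift within `radius` of `center`, scoring on the cropped interior."""
    base_c = crop_borders(base, frac)
    best_score, best_shift = np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ssd(base_c, crop_borders(shifted, frac))
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


def downscale2(img):
    """Halve resolution by averaging 2x2 blocks (an odd last row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = np.asarray(img[:h, :w], dtype=np.float64)
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def align_pyramid(base, moving, max_size=400, base_radius=40,
                  refine_radius=2, frac=0.05):
    """Coarse-to-fine alignment: exhaustive search at the top, small refinements below."""
    if max(base.shape) < max_size:
        # Base case: the image is small enough for a wide exhaustive search.
        return search(base, moving, (0, 0), base_radius, frac)
    # Align half-resolution copies, then double that shift and refine it here.
    dy, dx = align_pyramid(downscale2(base), downscale2(moving),
                           max_size, base_radius, refine_radius, frac)
    return search(base, moving, (2 * dy, 2 * dx), refine_radius, frac)
```

For a blue base and a red channel, `np.roll(red, align_pyramid(blue, red), axis=(0, 1))` gives the aligned red channel. Because each level starts from twice the coarser level's shift, a +/-2 window only has to absorb the rounding error introduced by halving.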

Note that the base case of my image pyramid is the same as the single-scale approach, except for the displacement range. I also ran the smaller images through the image pyramid and they aligned well; no rescaling happens in those cases because the images are already smaller than 400x400 pixels. For my submission, to showcase both methods separately, my code uses the single-scale approach for the smaller images and the image pyramid for the larger ones.

Also, for both the single-scale and multi-scale approaches, I converted my images to float32 rather than float64 to speed up processing slightly. This cut roughly 10 seconds from processing the .tif images, with no visible change in the aligned results.
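Converting to float32 is essentially a one-line cast; the sketch below also rescales integer scans to [0, 1]. The helper name `to_float32` is hypothetical, and scikit-image's `img_as_float32` performs a similar conversion. A float32 array occupies half the memory of a float64 array, which plausibly accounts for part of the speed-up.

```python
import numpy as np


def to_float32(img: np.ndarray) -> np.ndarray:
    """Return a float32 copy; integer images are rescaled to [0, 1] by their dtype max."""
    if np.issubdtype(img.dtype, np.integer):
        return img.astype(np.float32) / np.iinfo(img.dtype).max
    return img.astype(np.float32)
```

For a 16-bit .tif scan, `to_float32(plate)` maps 0 to 0.0 and 65535 to 1.0, and every later SSD computation then runs on half as many bytes.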

Results From Example Images

Here are the results of my alignment on the example images provided with the project. The first three (cathedral, monastery, and tobolsk) are aligned with the single-scale method, though the image pyramid would give the same result. All other images are aligned with the image pyramid. See main.py for further details.

cathedral.jpg