Image Alignment and Colorizing Project

Colorizing the Prokudin-Gorskii photo collection

CS 194-26: Computational Photography, Fall 2020

Avni Prasad, cs194-26-aej

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a daring photographer ahead of his time who envisioned ways to capture the color of the world before color photography was practical. Receiving the Tzar’s special permission to travel across the Russian Empire, he photographed every scene as three greyscale exposures on a glass plate, taken through red, green, and blue filters. Although there was no way to see the color in those photos at the time, today we can use the plates preserved by the Library of Congress to reconstruct the colors behind these beautiful photos.

Overview

This project stitches and aligns the red, green and blue exposures together to re-create the scene in color. We start off with an image like this:

Here, the three separate images were captured on a glass plate through red, green, and blue filters. By overlaying these images on top of each other as their respective RGB channels, we can bring back the color in the photo.

However, although these three photos capture the same scene, they were exposed one after another rather than simultaneously, and not from exactly the same position. We must therefore re-align the photos so that the result sharply represents the scene as one's eyes would have seen it. Once aligned, the channels can be combined into the end product: a color version of the photo.

Strategy

To walk through my strategy, I will be using the lady.jpg image as an example.

Step 1: Separate color channels

Starting from the original image, which contains the three color-channel exposures stacked vertically, the first step is to split it into 3 separate images corresponding to the red, green, and blue channels. Dividing the height of the original image by 3 gives the height of each channel, so we can split the image into 3 equal parts to get the 3 color channels.

Red channel

Green channel

Blue channel
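A minimal NumPy sketch of this split, assuming the scan has already been loaded as a 2-D greyscale array and that the exposures are stacked blue, green, red from top to bottom (the order described in Step 2). The function name `split_channels` is hypothetical, not taken from the project code.

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked plate into (blue, green, red) channels.

    Assumes `plate` is a 2-D greyscale array with the exposures in
    B, G, R order from top to bottom.
    """
    height = plate.shape[0] // 3          # leftover rows (height % 3) are dropped
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red

# Synthetic 9x4 "plate": each third is filled with its own constant value.
plate = np.concatenate([np.full((3, 4), v) for v in (0.1, 0.5, 0.9)])
b, g, r = split_channels(plate)
print(b.shape, g.shape, r.shape)  # (3, 4) (3, 4) (3, 4)
```

Integer division keeps the three channels the same size even when the scan's height is not an exact multiple of 3.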

Step 2: Crop base color-channel

I chose green as my base color-channel because it is typically the middle channel on the plate (blue on top, followed by green, then red), so the shifts needed to align the other two channels to it should be smaller. I cropped the green channel by 15% on every side. This crop was useful for 2 main reasons:

The edges of the image contain more garbage pixels (e.g. the plate border), which may throw off the calculations for estimating alignment with the other channels
Cropping by 15% on every side removes 30% of the height and of the width in total, which gives me wiggle room to find a good alignment between this base channel and the other color channels. I can slide the cropped base color-channel (about 70% of the original width and height) around the other color-channels to detect where the alignment is best
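A sketch of the crop, assuming each channel is a 2-D NumPy array; `crop_border` is a hypothetical helper name. Besides the cropped view, it returns how many rows and columns were trimmed from the top and left, which is useful later for relating a window position in another channel back to the base channel.

```python
import numpy as np

def crop_border(channel, frac=0.15):
    """Trim `frac` of the height and width from every side of a 2-D channel.

    Returns the cropped view and (top, left), the number of rows and columns
    removed from the top and left edges.
    """
    h, w = channel.shape
    top, left = round(h * frac), round(w * frac)
    return channel[top:h - top, left:w - left], (top, left)

cropped, (top, left) = crop_border(np.zeros((100, 200)))
print(cropped.shape, top, left)  # (70, 140) 15 30
```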

Cropped green base color-channel:

Step 3: Align remaining color-channels with base color-channel

Approach 1: Naive Offset Alignment (effective for smaller JPEG images)

With the cropped base green-channel in hand, I then sought the appropriate blue and red offsets that best match the base channel. To do this for small images, I iterated through every possible x value (from 0 to the non-cropped width minus the cropped width) and every possible y value (from 0 to the non-cropped height minus the cropped height). At each (x, y) combination, I cropped the non-base channel to a window of the same size starting at that position and compared it with the cropped base green-channel. I used Mean Squared Error (MSE) to compare these alignments:
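Written out, the score is the standard mean squared error. The symbols are my own labels rather than the author's: B is the cropped base (green) channel of size h × w, C is the full non-base channel of size H × W, and (x, y) is the candidate top-left position of the comparison window in C.

```latex
\mathrm{MSE}(x, y) = \frac{1}{hw} \sum_{i=0}^{h-1} \sum_{j=0}^{w-1} \bigl( B_{i,j} - C_{y+i,\,x+j} \bigr)^2,
\qquad 0 \le x \le W - w, \quad 0 \le y \le H - h

(x^*, y^*) = \operatorname*{arg\,min}_{x,\,y} \; \mathrm{MSE}(x, y)
```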

The smaller the MSE, the better the alignment. In this naive approach, I iterated through every (x, y) possibility and took the one with the lowest MSE as the best alignment. This worked well for smaller JPEG images like the ones below:

Monastery

Cathedral

Tobolsk
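The exhaustive search of Approach 1 might look like this in NumPy. It is a sketch, not the author's code: `naive_align` is a hypothetical name, and it returns the (y, x) top-left position of the best-matching window in the non-base channel. Comparing that position with the base crop's own top-left offset (the amount trimmed in Step 2) gives the relative shift between the two channels.

```python
import numpy as np

def naive_align(base_crop, other):
    """Slide a window the size of `base_crop` over every position in `other`
    and return the (y, x) top-left position with the lowest mean squared error."""
    # Work in float so that subtracting uint8 images cannot wrap around.
    base_crop = np.asarray(base_crop, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    h, w = base_crop.shape
    H, W = other.shape
    best_pos, best_mse = (0, 0), np.inf
    for y in range(H - h + 1):        # y from 0 to H - h inclusive
        for x in range(W - w + 1):    # x from 0 to W - w inclusive
            window = other[y:y + h, x:x + w]
            mse = np.mean((base_crop - window) ** 2)
            if mse < best_mse:
                best_pos, best_mse = (y, x), mse
    return best_pos
```

The two nested loops are exactly what makes this approach slow: every candidate position requires a full comparison over the cropped base.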

However, for larger images like lady.tif, this approach does not finish in a reasonable amount of time: the number of candidate offsets grows with the image size, and each candidate requires comparing every pixel of the cropped channel. Approach 2 was particularly useful in reducing this computation.
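A rough operation count makes the scaling concrete. As an illustration (not a measurement from this project), assume a channel of H × W pixels and the 15% crop from Step 2: the cropped base is about 0.7H × 0.7W pixels, and there are about 0.3H × 0.3W candidate positions, each requiring one pixel-by-pixel comparison.

```latex
\text{work} \approx \underbrace{(0.3H)(0.3W)}_{\text{candidate positions}} \cdot \underbrace{(0.7H)(0.7W)}_{\text{pixels compared}} = 0.0441\,H^2 W^2
```

The cost therefore grows with the fourth power of the side length: doubling both H and W multiplies the work by 16.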

Approach 2: Pyramid Image Processing (effective for all, even larger TIFF images)

The problem with the naive approach is that it iterates through every possible (x, y), which is computationally expensive. Using image pyramid processing, we can restrict the range of x and y values we explore and so limit the number of computations. An image pyramid is a collection of the same image at different resolutions. For example, for an image pyramid with a scale factor of 4, the resolution of lady.tif is scaled down by a factor of 4 at each level, i.e. each level describes the same image with a quarter as many pixels as the level below it.

At level 0

At level 1

At level 2

At level 3
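One simple way to build such a pyramid, assuming 2x2 block averaging as the downscaling step (the writeup does not say which resampling method was used), so that each level has a quarter as many pixels as the one below it. `downsample` and `build_pyramid` are hypothetical names.

```python
import numpy as np

def downsample(img):
    """Average each 2x2 block, giving a quarter as many pixels."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2   # drop an odd last row/column
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def build_pyramid(img, levels):
    """Return [level 0 (full resolution), level 1, ..., level `levels`]."""
    pyramid = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

shapes = [p.shape for p in build_pyramid(np.zeros((400, 360)), 3)]
print(shapes)  # [(400, 360), (200, 180), (100, 90), (50, 45)]
```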

With this pyramid, our base case is at the top level (level 3 in the example above). In the base case, we can compute the best offsets for the low-resolution image using the naive approach, since at such a low resolution there are only a few possibilities to iterate over. From the base case, we then move down the pyramid and, at each level, search only a small range around the best offset received from the level above (scaled up to the current resolution). This approach dramatically improved the performance of aligning the color channels for large TIFF images.
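Below is a coarse-to-fine sketch of this idea. It is not the author's implementation and makes several assumptions of its own: offsets are expressed as `np.roll` displacements scored on a 15%-trimmed interior (a common alternative to sliding a cropped window as in Approach 1), each level halves both dimensions via 2x2 averaging, the coarsest level searches ±15 pixels, and each finer level only re-checks ±2 pixels around the doubled estimate from the level above. All function names are hypothetical. Because `np.roll` wraps pixels around the edges, the trimmed interior keeps wrapped pixels out of the score as long as the shift is smaller than the trimmed margin.

```python
import numpy as np

def downsample(img):
    """Average each 2x2 block (a quarter as many pixels per level)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def interior_mse(base, other, dy, dx, frac=0.15):
    """MSE between `base` and `other` rolled by (dy, dx), scored only on the
    interior left after trimming `frac` from every side."""
    shifted = np.roll(other, (dy, dx), axis=(0, 1))
    h, w = base.shape
    ch, cw = int(h * frac), int(w * frac)
    diff = base[ch:h - ch, cw:w - cw] - shifted[ch:h - ch, cw:w - cw]
    return np.mean(diff ** 2)

def search(base, other, center, radius):
    """Exhaustively try every displacement within `radius` of `center`."""
    best, best_err = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            err = interior_mse(base, other, dy, dx)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def pyramid_align(base, other, levels=4, coarse_radius=15, refine_radius=2):
    """Return (dy, dx) such that np.roll(other, (dy, dx), axis=(0, 1)) best
    matches `base`. Choose `levels` so the coarsest image is still a few
    dozen pixels across."""
    base = np.asarray(base, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    if levels == 0:
        # Base case: the coarsest level is small, so a wide naive search is cheap.
        return search(base, other, (0, 0), coarse_radius)
    dy, dx = pyramid_align(downsample(base), downsample(other), levels - 1,
                           coarse_radius, refine_radius)
    # One level finer, pixel offsets double; only refine around that estimate.
    return search(base, other, (2 * dy, 2 * dx), refine_radius)
```

For example, `pyramid_align(green, blue)` would return the `(dy, dx)` roll to apply to the blue channel so that it lines up with green. Apart from the coarsest level, each level evaluates only (2·2 + 1)² = 25 candidates, which is why the total work stays small even for large scans.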

Step 4: Stitch color-channels together based on new alignment

Finally, now that we have found the best alignment in the previous steps, we can overlay the color channels. Stacking the aligned color channels together results in the color version of the image:
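A sketch of the final stacking step, assuming each non-base channel's alignment has been expressed as a `(dy, dx)` shift to apply with `np.roll`, while the green base channel stays in place. `stack_rgb` is a hypothetical name. The output is ordered R, G, B along the last axis, the layout most image libraries expect for an H × W × 3 color array.

```python
import numpy as np

def stack_rgb(red, green, blue, red_shift=(0, 0), blue_shift=(0, 0)):
    """Roll red and blue by their (dy, dx) shifts, leave green (the base)
    untouched, and stack the three as an H x W x 3 RGB image."""
    r = np.roll(red, red_shift, axis=(0, 1))
    b = np.roll(blue, blue_shift, axis=(0, 1))
    return np.dstack([r, green, b])
```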