CS194-26: Nikhil Patel

Overview

The purpose of this project was to colorize glass negatives from the Prokudin-Gorskii Collection. Each negative is split into three panes, each of which was photographed through a different color filter (blue, green, or red).

Basic Approach

The most basic approach to colorizing the images is to compute the offsets between the three channels and then shift the channels into alignment. We can brute-force this by computing a scoring metric (I chose sum of squared differences, but normalized cross-correlation produced equally good results) over a grid of possible pixel displacements. Because the borders are not part of the image and often contain just one color, we only want to consider interior pixels when computing the score. This has the added benefit that we don't need to worry about np.roll wrapping rows from the very top of the image around to the bottom, etc., provided the ignored margin is wider than the largest displacement tried.
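The brute-force search described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the 10% border crop, and the 15-pixel search radius are all assumptions chosen for the example.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means a better match."""
    # Cast to float so uint8 inputs do not overflow on subtraction.
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))

def crop_interior(img, frac=0.1):
    """Drop a border of `frac` of each dimension (assumed value, not from the write-up)."""
    h, w = img.shape
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]

def align_single_scale(channel, reference, radius=15):
    """Try every (dy, dx) in a square window and return the shift with the lowest SSD."""
    ref = crop_interior(reference)
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # Score interior pixels only, so borders and wrapped rows are ignored.
            score = ssd(crop_interior(shifted), ref)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The returned pair is the shift to pass to np.roll to bring the channel onto the reference. Swapping ssd for a normalized cross-correlation score (and maximizing instead of minimizing) would leave the rest of the loop unchanged.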

This approach worked well for all of the smaller jpg images. However, to align the larger tif images, we needed a more sophisticated algorithm.

Image Pyramid Alignment

It is much more expensive to brute-force the scoring of all possible pixel displacements for larger images. Using a coarse-to-fine image pyramid, we can estimate an offset on a downscaled copy of the image, then repeatedly scale the image back up to compute finer and more accurate offsets. I chose an arbitrary size of 500px as the downscaling ceiling: any image smaller than 500px does not need to be downscaled to be aligned, and larger images are downscaled until their dimensions are smaller than 500x500.
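A coarse-to-fine search along these lines might look like the sketch below. It is an illustration under assumptions rather than the project's implementation: the factor-of-two downscaling by 2x2 averaging, the search radii, the 10% border crop, and all function names are hypothetical. Only the 500px ceiling comes from the text.

```python
import numpy as np

def downscale2(img):
    """Halve each dimension by averaging 2x2 blocks (a simple stand-in for a real resize)."""
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2  # trim an odd row/column if present
    img = img[:h2, :w2].astype(np.float64)
    return img.reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))

def best_shift(channel, reference, center, radius, frac=0.1):
    """SSD search over a window of shifts around `center`, scoring interior pixels only."""
    h, w = reference.shape
    by, bx = int(h * frac), int(w * frac)
    ref = reference[by:h - by, bx:w - bx].astype(np.float64)
    best, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            diff = shifted[by:h - by, bx:w - bx].astype(np.float64) - ref
            score = np.sum(diff * diff)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def pyramid_align(channel, reference, max_size=500, coarse_radius=15, refine_radius=2):
    """Estimate the shift at the coarsest level, then double and refine it at each finer level."""
    if max(reference.shape) <= max_size:
        # Small enough: do the full brute-force search here.
        return best_shift(channel, reference, (0, 0), coarse_radius)
    dy, dx = pyramid_align(downscale2(channel), downscale2(reference),
                           max_size, coarse_radius, refine_radius)
    # One pixel at the coarser level corresponds to two pixels at this level.
    return best_shift(channel, reference, (2 * dy, 2 * dx), refine_radius)
```

Because each finer level only searches a small window around the doubled estimate, the expensive wide search happens once, on the smallest image.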

While this approach worked well for almost all of the images in the starter collection, emir.tif proved difficult to align because of the large amount of blue in the center of the image, which threw off the red-blue alignment. Using green as the reference channel instead fixed the misalignment of the red channel in emir.tif, producing a well-aligned color image. It also continued to produce well-aligned images for the rest of the starter set and for the additional images I tested. A more sophisticated method could be developed in the future that compares the image channels to find the optimal reference channel automatically.
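Using green as the reference amounts to aligning blue and red to the green pane and then stacking the result, which could be sketched as below. The function name colorize is hypothetical, and the sketch assumes the scanned plate is a 2-D array with the blue, green, and red panes stacked top to bottom. The align argument stands in for any alignment routine that returns a (dy, dx) shift.

```python
import numpy as np

def colorize(plate, align):
    """Split a stacked plate into B, G, R panes and align B and R to G.

    plate: 2-D array, panes assumed stacked top to bottom as blue, green, red.
    align: callable (channel, reference) -> (dy, dx) shift for np.roll.
    """
    h = plate.shape[0] // 3
    b, g, r = plate[:h], plate[h:2 * h], plate[2 * h:3 * h]
    # Green is the fixed reference; the other two channels are moved onto it.
    b_shift = align(b, g)
    r_shift = align(r, g)
    b_aligned = np.roll(b, b_shift, axis=(0, 1))
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    # Stack in RGB order for display.
    rgb = np.dstack([r_aligned, g, b_aligned])
    return rgb, {"blue": b_shift, "red": r_shift}
```

Passing the reference in this way makes it straightforward to try a different reference channel, since only the two align calls and the final stacking depend on that choice.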