CS 194-026 Project 1: "Colorizing the Prokudin-Gorskii photo collection"

Author: Joshua Fajardo

Project Overview

In the early 20th century, Sergei Mikhailovich Prokudin-Gorskii was able to create a large amount of some of the earliest color photographs in human history, thanks to a special device that would allow him to create three separate, differently filtered exposures of the same subject at the same time (https://inst.eecs.berkeley.edu/~cs194-26/fa21/hw/proj1/). This project aims to combine these sets of RGB exposures by determining the alignment offsets for each filtered image and creating color images from these aligned exposures.

Approach

Base Design

Excluding all of the bells and whistles, the design of this project is as follows:

For smaller, compressed images (.jpg files), an exhaustive approach is followed.

The scoring metric used to determine how "good" an alignment fits is the normalized cross-correlation (NCC) equation. The offsets for the red and blue channels are chosen to be aligned with the green channel. The red and blue images are displaced/rolled for each pair of vertical and horizontal displacements where each displacement value ranges from -15 to +15 pixels. This range was arbitrarily chosen based on a Piazza suggestion, and seems to work well on the images provided.

For larger images (.tif files), an exhaustive approach would be largely inefficient at finding the best alignment. Therefore, a pyramid search method was implemented. Each image is scaled down by a factor of 2, starting at 2^(-5), with the scaling factor doubling at each level until reaching a scaling factor of 1. At each level, the best displacement is found using the exhaustive approach on a slightly smaller window than in the exhaustive method, where the vertical and horizontal displacements are 10 by 10 windows, centered at the best displacement found at the previous scale (starting at (0, 0)).

Bells and Whistles

While the base design had some pretty good performance to begin with, I wanted to go a few steps further to get the best possible alignment.

The first enhancement that I added was cropping the images prior to calculating the NCCs. This allowed for a more fair comparison between different displacements, since only the common set of pixels (i.e. the ones that are not rolled over in any displacement) were considered for comparison. This enhancement led to a somewhat marginal (pun not intended) improvement in the alignment of the images.

The second enhancement that I added was a pre-processing stage before running the pyramid search. In this pre-processing stage, I have the program extract the edges from each RGB exposure and perform the same pyramid search based on those new stripped down images. The performance gain from this change was great; most notably, it allowed my algorithm to do a much better job at aligning the photo of the Emir of Bukhara.

The final images are purposely left uncropped, as I feel that it adds a bit to the story of how each image was made.

Problems and Solutions

One of the earlier problems that was encountered was that images with prominent patches where one or more color channels dominated the others (e.g. images with very blue skies or green greenery) tended not to align properly, often so as to increase the NCC by having some crossover across such patches as shown in the following images:

As mentioned in the Bells and Whistles section, by calculating the NCC based on stripped down images made using edge detectors instead of relying on the brightness of each RGB channel, I was able to get much cleaner images:

Despite all of the enhancements, there was still one image that appeared to have a good amount of room for improvement:

I suspect that the NCC may have been slightly thrown off by the strands in the lady's hair; there was likely a lot of variance across the edge detector images due to the hair's reddish hue, causing some brightness in the red channel thus making it more difficult to detect edges within the hair, and between the red hair and the white blouse.

Images

For the following images, the left is the result of the basic algorithm, and the right (if there) is the result of the upgraded algorithm. Below each image are the offsets used for the red and blue exposures, formatted: (vertical shift, horizontal shift)