Colorizing the Prokudin-Gorskii Photo Collection

Jingwei Kang, cs194-26-abr


I. Project Overview

The goal of this project was to automatically colorize images from the Prokudin-Gorskii Photo Collection. Each scene was captured as three glass plate exposures, each taken through a different color filter (red, green, and blue). Because the three exposures were not taken identically, the images had to be displaced relative to one another before being overlaid, so that they align properly and produce a color photo.

II. Approach

There are numerous metrics for measuring the mismatch between the three filtered images; simple ones include the Sum of Squared Differences (SSD) and Normalized Cross-Correlation (NCC). With my search parameters, I didn't notice a significant difference between the displacements found with SSD and with NCC, so I stuck with SSD for the images.
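As a rough illustration of the two metrics, here is a minimal sketch in Python with NumPy. The function names `ssd` and `ncc` are illustrative, and the zero-mean form of NCC used here is one common variant, assumed for this example rather than taken from the project code.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: lower means a better match."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.sum(diff ** 2))

def ncc(a, b):
    """Zero-mean normalized cross-correlation: higher means a better match."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Guard against constant images, whose norm is zero after mean removal.
    return float(a @ b / denom) if denom > 0 else 0.0
```

Note that SSD is minimized while NCC is maximized, so a search loop has to flip the comparison when switching between them. This form of NCC ignores a uniform gain or offset between the two images, whereas SSD does not.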

For the lower-resolution .jpg images, a brute-force search over candidate displacements was effective, but the higher-resolution .tif images required an image pyramid search to reduce the runtime. This entailed recursively rescaling the image to decrease the resolution (and therefore the number of pixels to search over). At the base case, the brute-force search was performed and the best displacement was returned to the previous depth, where it was refined with additional searching.
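The recursive search described above can be sketched as follows. This is a minimal version under stated assumptions, not the project's actual code: the scale factor of 2, the 2x2 block averaging, the wrap-around shift via `np.roll`, and the values of `min_size`, `base_radius`, and `refine_radius` are all choices made for the example.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized float images."""
    return float(np.sum((a - b) ** 2))

def brute_force(moving, fixed, center=(0, 0), radius=15):
    """Try every shift within `radius` of `center`; return the best (dy, dx)."""
    best, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            # np.roll wraps pixels around the border (an assumption of this sketch).
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ssd(shifted, fixed)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def downscale(img):
    """Halve the resolution by averaging 2x2 blocks (odd edges are trimmed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(moving, fixed, min_size=64, base_radius=15, refine_radius=2):
    """Return the (dy, dx) shift of `moving` that best matches `fixed`."""
    # Base case: the image is small enough for a full brute-force search.
    if min(moving.shape) <= min_size:
        return brute_force(moving, fixed, radius=base_radius)
    # Recurse on half-resolution copies, then scale the estimate back up.
    coarse = pyramid_align(downscale(moving), downscale(fixed),
                           min_size, base_radius, refine_radius)
    center = (2 * coarse[0], 2 * coarse[1])
    # Refine with a small search around the upscaled estimate.
    return brute_force(moving, fixed, center=center, radius=refine_radius)
```

With these settings, each level above the base case evaluates only a 5x5 window of shifts, which is where the runtime saving over a single full-resolution search comes from.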

Inherent to this procedure are a couple of hyperparameters. For example, by what factor should we scale down the image at each additional depth of the image pyramid? What range of displacement should we consider at the base case (both in terms of absolute pixels and fraction of the image)? How much should we crop the image by to minimize edge effects?
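One way to picture the cropping hyperparameter is to score only the interior of each image, so that borders and damaged edges do not influence the metric. The sketch below assumes a single fraction `frac` removed from every side; the names `crop_interior` and `interior_ssd` and the default of 0.1 are hypothetical.

```python
import numpy as np

def crop_interior(img, frac=0.1):
    """Drop `frac` of the height and width from each side, keeping the center."""
    h, w = img.shape[:2]
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]

def interior_ssd(a, b, frac=0.1):
    """SSD computed on the cropped interiors only, ignoring the borders."""
    diff = crop_interior(a, frac).astype(np.float64) - crop_interior(b, frac)
    return float(np.sum(diff ** 2))
```

A larger `frac` also reduces the number of pixels compared per candidate shift, at the cost of discarding features that sit close to the border.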

III. Results

Low resolution images

As shown below, the brute force offset search works well for the .jpg files.

File/Parameters SSD

cathedral.jpg
G[5, 2]
R[12, 3]

monastery.jpg
G[-3, 2]
R[3, 2]

tobolsk.jpg
G[3, 2]
R[6, 3]
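To show how displacements like those listed above would be used, here is a minimal sketch that shifts the green and red channels and stacks all three into a color image. It assumes that the blue channel is the fixed reference and that each pair is ordered [rows, columns]; neither convention is stated explicitly in this report, and the function name `colorize` is illustrative.

```python
import numpy as np

def colorize(b, g, r, g_shift, r_shift):
    """Shift G and R onto B, then stack into an (H, W, 3) RGB image."""
    # Assumed convention: each shift is (rows, columns), applied with wrap-around.
    g_aligned = np.roll(g, tuple(g_shift), axis=(0, 1))
    r_aligned = np.roll(r, tuple(r_shift), axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```

Under those assumptions, the cathedral.jpg entry would correspond to a call such as `colorize(b, g, r, (5, 2), (12, 3))`.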

High resolution images

The following images required an image pyramid search procedure to finish in reasonable time. I attempted to find parameters that worked well with the Prokudin-Gorskii images, but these don't necessarily generalize well to other data sets. For example, many of our images had borders or were weathered on the edges, and some had defining features right inside the borders, so I selected a crop parameter that removed the damaged edges while preserving as much useful information as possible. Cropping the image also helped to reduce computation and improve runtime.

Another example is the displacement range at the base case. While all the .tif images were high resolution, some (e.g. lady.tif) appeared more hazy and blurred than others (e.g. onion_church.tif). The alignment of these images suffered when I allowed for a greater image pyramid depth (i.e. searched through a smaller space at the base case), perhaps because the search fell prematurely into a local minimum. However, decreasing the image pyramid depth increases the runtime, so I chose a parameter that balanced both factors.

This procedure worked reasonably well on most pictures. Exceptions include emir.tif, which fails to align well with simple metrics because the three glass plate exposures differ in brightness. melons.tif potentially suffers from the same problem. Upon examination of the original image, I believe self_portrait.tif did not fare well because the image was focused on the man, who constituted a small part of the picture. The bulk of the image was (sometimes out of focus) shrubbery, which likely trapped the alignment in local minima at the lower-resolution layers of the image pyramid.

File/Parameters SSD

emir.tif
G[54, 34]
R[74, 50]

harvesters.tif
G[66, 22]
R[130, 11]

icon.tif
G[46, 22]
R[94, 30]

lady.tif
G[62, 14]
R[122, 3]

melons.tif
G[90, 18]
R[130, 2]

onion_church.tif
G[58, 34]
R[114, 42]

self_portrait.tif
G[86, 34]
R[130, 42]

three_generations.tif
G[58, 18]
R[118, 18]

train.tif
G[50, 10]
R[94, 38]

village.tif
G[70, 3]
R[130, 11]

workshop.tif
G[58, 4]
R[110, -21]

Additional images from the Prokudin-Gorskii Collection

File/Parameters SSD

raft.jpg
G[46, 8]
R[86, 2]

guardhouse.jpg
G[18, 22]
R[34, 34]

IV. Bells and Whistles

To further improve the colorization, I tested 1. better features, 2. automatic contrast adjustment, and 3. automatic cropping.

Because the brightness values did not match for emir.tif, metrics that are not affected by the raw color values were needed. Using Canny edge detection to generate outlines of the images, then aligning those outlines, proved to be effective. This improved the alignment in emir.tif and lady.tif, as shown by the sharper edges. Surprisingly, this method did not improve the alignment in melons.tif. The results of the Canny edge detection in the skimage package showed that the edges around the melons were not captured, and I was unable to produce a clean outline despite modifying the parameters.

With this Canny edge detection procedure, one could hypothetically identify and remove borders from these images. I opted for a simpler method where I cropped some amount of each edge plus the offsets I identified from the alignment.

Lastly, I referenced Szeliski's textbook for improving contrast, and utilized the histogram equalization methods in skimage. This resulted in drastically improved contrast in workshop.tif. Ultimately, this method did seem to make the photos "grayish" and "flat", but there are additional methods in the textbook and available in the skimage package.
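The idea of aligning on edges rather than raw brightness can be sketched as follows. To keep the example dependent on NumPy only, it uses the gradient magnitude as a simple stand-in for the Canny edge map; this is a swapped-in technique, not the skimage Canny detector used in this project. The names `edge_map`, `align`, and `align_on_edges`, and the search radius, are illustrative.

```python
import numpy as np

def edge_map(img):
    """Gradient magnitude: a simple stand-in for a Canny edge map."""
    gy, gx = np.gradient(np.asarray(img, dtype=np.float64))
    return np.hypot(gy, gx)

def align(moving, fixed, radius=8):
    """Brute-force SSD search; returns the best (dy, dx) shift for `moving`."""
    best, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - fixed) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def align_on_edges(moving, fixed, radius=8):
    """Align the edge maps instead of the raw pixel intensities."""
    return align(edge_map(moving), edge_map(fixed), radius)
```

Because the gradient magnitude is unchanged when an image's brightness is offset or inverted, this search can still recover the shift in cases where comparing raw intensities is misled by channels that differ in brightness.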

Simple Bells and Whistles