CS 194-26 Project 1

Colorizing the Prokudin-Gorskii Photo Collection

Regina Xu


Overview

An innovator of color photography, Sergei Mikhailovich Prokudin-Gorskii (1863-1944) amassed a collection of glass plate negatives that recorded scenes from his travels through the Russian Empire, each scene captured three times through red, green, and blue filters. The purpose of this project is to produce color images from digitized scans of these glass plates by aligning their R, G, and B channels with image processing techniques.


Approach

The alignment process keeps the B channel fixed and aligns R to B, then G to B. Aligning a channel means searching over a range of displacements in both x and y. For every (x, y) in this range, the non-blue channel is shifted along both axes, the borders are cropped (since they can skew the score), and the sum of squared differences (SSD) between the shifted channel and the B channel is computed. SSD measures how well two channels match: both channels are flattened and the squared differences between corresponding pixels are summed, so a lower SSD means a better match. The best displacement is the one with the minimum SSD. Finally, the original R and G channels are shifted by their respective displacements and stacked with the B channel to create the color image.
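A minimal NumPy sketch of the scoring and stacking steps described above. It assumes the channels are 2-D arrays of equal shape, that displacements are (dy, dx) row/column shifts applied circularly with np.roll, and that 10% of each edge is cropped (the writeup does not give a crop size). The names crop_borders, ssd, ssd_at and stack_aligned are hypothetical.

```python
import numpy as np


def crop_borders(channel, frac=0.1):
    """Drop `frac` of the height and width from each edge (frac=0.1 is an assumed value)."""
    h, w = channel.shape
    dh, dw = int(h * frac), int(w * frac)
    return channel[dh:h - dh, dw:w - dw]


def ssd(a, b):
    """Sum of squared differences between two equally sized channels, on flattened float copies."""
    diff = a.astype(np.float64).ravel() - b.astype(np.float64).ravel()
    return float(np.dot(diff, diff))


def ssd_at(moving, fixed, dy, dx, frac=0.1):
    """Shift `moving` by (dy, dx), crop both channels, and score the match against `fixed`."""
    shifted = np.roll(moving, (dy, dx), axis=(0, 1))
    return ssd(crop_borders(shifted, frac), crop_borders(fixed, frac))


def stack_aligned(r, g, b, r_disp, g_disp):
    """Shift the original R and G channels by their best (dy, dx) and stack them with the fixed B."""
    r_aligned = np.roll(r, r_disp, axis=(0, 1))
    g_aligned = np.roll(g, g_disp, axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```

Casting to float64 before subtracting matters for uint8 scans, where `a - b` would wrap around modulo 256 instead of going negative.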

Single-scale Implementation

For the .jpg images, the R and G channels are rolled by every x and y displacement in the range [-15, 15]. In this naive implementation, all combinations of shifts are tried in a nested loop, and each non-blue channel is shifted by the displacement that gives the minimum SSD.
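The single-scale search could look roughly like this. It is a sketch assuming NumPy, circular shifts via np.roll, (dy, dx) ordering, and the same assumed 10% border crop; align_exhaustive is an illustrative name, not the project's actual function.

```python
import numpy as np


def align_exhaustive(moving, fixed, max_shift=15, frac=0.1):
    """Try every (dy, dx) in [-max_shift, max_shift]^2 and return the one with minimum SSD."""
    moving = moving.astype(np.float64)
    fixed = fixed.astype(np.float64)
    h, w = fixed.shape
    dh, dw = int(h * frac), int(w * frac)
    fixed_crop = fixed[dh:h - dh, dw:w - dw]  # the reference channel is cropped once, outside the loop

    best_disp, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps pixels around; they fall outside the scored region
            # as long as the crop margin is at least max_shift.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted[dh:h - dh, dw:w - dw] - fixed_crop) ** 2)
            if score < best_score:
                best_disp, best_score = (dy, dx), score
    return best_disp
```

The 31 × 31 = 961 candidate shifts are cheap on a small .jpg, but the same loop on a full-resolution .tif is what makes the naive approach slow.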

Multi-scale Implementation

For images larger than 400px along either axis, a 4-level image pyramid is constructed, with the lowest-resolution level resized to 1/16 of the original channel. The pyramid is stored as a list of resized channels, ordered from the lowest resolution to the original channel. At each level, the same alignment procedure as in the naive implementation is used, but with a different search range: [-15, 15] at the first (lowest-resolution) level and [-2, 2] at every remaining level. Because each level has twice the resolution of the one before it, a displacement found at one level corresponds to twice that displacement at the next, so the updated displacement is 2 * previous displacement + new displacement for both the x and y axes. The final displacement is used to roll the original R or G channel before all three channels are stacked into a color image. Because only a small window is searched at the expensive high-resolution levels, the pyramid greatly reduces the search space, cutting runtime from around 5 minutes to a matter of seconds.
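A coarse-to-fine sketch of the pyramid search, under stated assumptions: each level halves the previous one by 2x2 block averaging (the writeup does not say how channels are resized), the search ranges follow the text ([-15, 15] at the coarsest level, [-2, 2] elsewhere), and the function names are hypothetical. With levels=4 and halving, the coarsest level in this sketch is 1/8 of the original size per side.

```python
import numpy as np


def downsample2(channel):
    """Halve the resolution by averaging 2x2 blocks (an odd trailing row/column is dropped)."""
    h, w = channel.shape
    h2, w2 = h - h % 2, w - w % 2
    return channel[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))


def search(moving, fixed, center, radius, frac=0.1):
    """Exhaustive SSD search over displacements within `radius` of `center` (dy, dx)."""
    h, w = fixed.shape
    dh, dw = int(h * frac), int(w * frac)
    fixed_crop = fixed[dh:h - dh, dw:w - dw]
    best_disp, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted[dh:h - dh, dw:w - dw] - fixed_crop) ** 2)
            if score < best_score:
                best_disp, best_score = (dy, dx), score
    return best_disp


def align_pyramid(moving, fixed, levels=4, coarse_radius=15, fine_radius=2):
    """Wide search at the coarsest level, then +/-fine_radius refinements at each finer level."""
    pyr_moving = [moving.astype(np.float64)]
    pyr_fixed = [fixed.astype(np.float64)]
    for _ in range(levels - 1):
        pyr_moving.append(downsample2(pyr_moving[-1]))
        pyr_fixed.append(downsample2(pyr_fixed[-1]))
    # Reorder so the lists run from lowest resolution to the original channel.
    pyr_moving.reverse()
    pyr_fixed.reverse()

    dy, dx = search(pyr_moving[0], pyr_fixed[0], (0, 0), coarse_radius)
    for m, f in zip(pyr_moving[1:], pyr_fixed[1:]):
        # Searching +/-fine_radius around (2*dy, 2*dx) is 2 * previous + new, with new in [-r, r].
        dy, dx = search(m, f, (2 * dy, 2 * dx), fine_radius)
    return dy, dx
```

Centering the small window on the doubled estimate is the update rule from the text written as a window center, so each finer level evaluates only 5 × 5 = 25 candidate shifts.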


Bells & Whistles

For emir.tif and icon.tif, aligning R onto B failed because the intense red brightness values on the robe and curtains skewed the SSD calculations. One workaround is to first align and shift the G channel, then align R with the shifted G channel (see middle column below). Another approach is to preprocess all three channels with a Canny edge detector before alignment (see last column below).
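The first workaround, aligning R against the already-aligned G, might be sketched as follows. For brevity it uses a single-scale exhaustive search (the .tif images would go through the pyramid in practice), assumes (dy, dx) shifts via np.roll and a 10% border crop, and best_shift and align_via_green are hypothetical names.

```python
import numpy as np


def best_shift(moving, fixed, max_shift=15, frac=0.1):
    """Exhaustive SSD search; returns the (dy, dx) that best aligns `moving` onto `fixed`."""
    moving = moving.astype(np.float64)
    fixed = fixed.astype(np.float64)
    h, w = fixed.shape
    dh, dw = int(h * frac), int(w * frac)
    fixed_crop = fixed[dh:h - dh, dw:w - dw]
    best_disp, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted[dh:h - dh, dw:w - dw] - fixed_crop) ** 2)
            if score < best_score:
                best_disp, best_score = (dy, dx), score
    return best_disp


def align_via_green(r, g, b, max_shift=15):
    """Align G onto B, then align R onto the already-shifted G instead of onto B."""
    g_disp = best_shift(g, b, max_shift)
    g_aligned = np.roll(g, g_disp, axis=(0, 1))
    # g_aligned now sits in B's coordinate frame, so r_disp is also relative to B.
    r_disp = best_shift(r, g_aligned, max_shift)
    r_aligned = np.roll(r, r_disp, axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b]), r_disp, g_disp
```

Because the reference for R is the shifted G rather than the raw G, the displacement found for R needs no further composition with g_disp before it is applied.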

Displacements for emir.tif and icon.tif under each method:

        Original                  Align R with G           Using edge detection
Emir    R(-181, 17), G(49, 22)    R(105, 37), G(49, 22)    R(107, 40), G(49, 23)
Icon    R(-242, -8), G(42, 16)    R(90, 21), G(42, 16)     R(90, 22), G(41, 16)


Challenges

One challenge was working out how to update the displacement at each level of the image pyramid; I started by tweaking the per-level shift ranges before settling on a smaller, constant range of [-2, 2]. I also initially used the wrong library functions for edge detection, trying cv2's GaussianBlur and Canny before getting it to work by following a scikit-image example.


Results

Cathedral: R(12, -1), G(5, -1)

Monastery: R(3, 2), G(-3, 1)

Nativity: R(8, 0), G(3, 1)

Settlers: R(14, -1), G(7, 0)

Emir: R(105, 37), G(49, 22)

Harvesters: R(123, 10), G(60, 14)

Icon: R(90, 21), G(42, 16)

Village: R(137, -15), G(64, -7)

Lady: R(112, -14), G(56, -1)

Self Portrait: R(171, -2), G(77, 0)

Three Generations: R(112, 10), G(52, 9)

Train: R(84, 27), G(41, 0)

Additional images

Turkmen: R(113, 23), G(54, 15)

Children: R(141, 51), G(64, 31)

2 Prisoners: R(8, 2), G(4, 1)

Karlinskii: R(91, -14), G(23, -8)

Karlinskii with edge detection: R(108, -56), G(54, -23)