Project 1. Colorizing the Prokudin-Gorskii Photo Collection

Michael Wan, SID: 3034012128


Project Overview

In the early 1900s, the Russian chemist and photographer Sergei Mikhaylovich Prokudin-Gorskii pioneered color photography with a novel three-image technique. For each scene, he recorded three exposures onto a single glass plate, one each through a red, a green, and a blue filter. Overlaying the three exposures produces a color photograph. This project takes his original three-exposure glass plates and reconstructs the color photos.

        
Three-picture glass plate and the respective reconstructed color photo.


Problem Approach

The high-level approach to this problem is to try various displacements of the channels and see which alignment is optimal. To do this, we fix one of the channels and try horizontal and vertical displacements on the other two. Performance becomes an issue when the images are large, because the displacement search is then computationally expensive. As such, we implement a "pyramid search". This lets us run cheap displacement searches on scaled-down, lower-resolution images first, then iteratively zoom in and adjust the search window at each finer scale.
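As a rough illustration of the single-scale search described above, here is a minimal sketch. The names `ncc` and `exhaustive_align` and the `radius` parameter are hypothetical, not taken from the project code. It also assumes a joint search over both axes and uses `np.roll`, which wraps pixels around the image border instead of cropping them.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def exhaustive_align(fixed, moving, radius=15):
    """Try every (dx, dy) in [-radius, radius]^2 and return the best one."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps around the border; this is a simplification.
            shifted = np.roll(moving, shift=(dy, dx), axis=(0, 1))
            score = ncc(fixed, shifted)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

The cost grows with the number of pixels times the number of candidate shifts, which is why a search like this becomes slow on the large scans.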

Pyramid Search Algorithm

We can first define the following values:

Variable / Constant: Definition
$\text{im}_\text{fixed}$: The fixed image that we are attempting to align against.
$\text{im}$: The image that we are attempting to align.
$s$: The factor by which we scale up the image in each iteration of the pyramid search.
$k$: The number of iterations in the pyramid search.
$(l_x, l_y)$: At each iteration of the pyramid search, we scan across $l_x$ horizontal displacement values and $l_y$ vertical displacement values to check for the optimal alignment.
$(w_{x_i}^s, w_{x_i}^e)$: The starting and ending horizontal displacement values to search at the $i$th iteration of the pyramid search.
$(w_{y_i}^s, w_{y_i}^e)$: The starting and ending vertical displacement values to search at the $i$th iteration of the pyramid search.
$(d_{x_i}, d_{y_i})$: The optimal horizontal and vertical displacements found at the $i$th iteration. $(d_{x_{-1}}, d_{y_{-1}})$ is initialized to $(0, 0)$.


We also have the following functions:

Function: Definition
$\text{NCC}(\cdot)$: Calculates the normalized cross-correlation of two images.
$\text{scale}(\cdot)$: Scales an image by a given factor.
$\text{displace}(\cdot)$: Applies a horizontal and vertical displacement to an image.
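To make the three functions concrete, here is a minimal NumPy sketch. It is one possible reading, not the project's actual implementation: `scale` uses nearest-neighbour sampling without anti-aliasing, `displace` wraps around the border via `np.roll`, and the `(dx, dy)` argument order is an assumption.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation: dot product of the mean-centred,
    unit-normalized images. Returns a value in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def scale(im, factor):
    """Rescale a 2-D image by `factor` with nearest-neighbour sampling
    (a simplification; a real pipeline would likely anti-alias)."""
    h, w = im.shape
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    return im[np.ix_(rows, cols)]

def displace(im, shift):
    """Shift an image by (dx, dy) pixels, wrapping around the border."""
    dx, dy = shift
    return np.roll(im, shift=(dy, dx), axis=(0, 1))
```

NCC is insensitive to overall brightness and contrast differences between the two images, which is useful here because each channel was exposed through a different filter.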

Using these definitions, we can establish the following recurrence relation for $i = 0, \dots, k-1$, where at iteration $i$ both images are scaled by $s^{i-k+1}$ so that the final iteration runs at full resolution:

$$(w_{x_{i}}^s, w_{x_{i}}^e) = \Big(sd_{x_{i-1}} - \left \lfloor{\frac{l_x}{2}}\right \rfloor, \ sd_{x_{i-1}} + \left \lfloor{\frac{l_x}{2}}\right \rfloor\Big)$$

$$(w_{y_{i}}^s, w_{y_{i}}^e) = \Big(sd_{y_{i-1}} - \left \lfloor{\frac{l_y}{2}}\right \rfloor, \ sd_{y_{i-1}} + \left \lfloor{\frac{l_y}{2}}\right \rfloor\Big)$$

$$d_{x_{i}} = \text{argmax}_{w_{x_{i}}^s \leq x \leq w_{x_{i}}^e} \text{NCC}(\text{scale}(\text{im}_\text{fixed}, s^{i-k+1}), \text{displace}(\text{scale}(\text{im}, s^{i-k+1}), (x, 0)))$$

$$d_{y_{i}} = \text{argmax}_{w_{y_{i}}^s \leq y \leq w_{y_{i}}^e} \text{NCC}(\text{scale}(\text{im}_\text{fixed}, s^{i-k+1}), \text{displace}(\text{scale}(\text{im}, s^{i-k+1}), (d_{x_{i}}, y)))$$

We continue this iterative process until we find $(d_{x_{k-1}}, d_{y_{k-1}})$, which gives us the optimal displacement on the original, full-resolution image. Interestingly, fixing the green channel and finding the displacements of the blue and red channels seemed to work better than fixing the blue channel. One possible explanation is that the green channel has the greatest variance, making it the best reference image to compare against.
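The coarse-to-fine procedure can be sketched as follows. This is an illustrative version under several assumptions, not the project's code: `pyramid_align` and `downsample` are hypothetical names, $s$ is an integer, downscaling is done by block averaging, shifts wrap around via `np.roll`, and each level searches $x$ and $y$ jointly instead of one axis at a time as in the recurrence. Level $i$ works at $1/s^{k-1-i}$ of full resolution, so the last level is the original image.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def downsample(im, factor):
    """Shrink a 2-D image by an integer factor using block averaging."""
    if factor == 1:
        return im
    h = (im.shape[0] // factor) * factor
    w = (im.shape[1] // factor) * factor
    blocks = im[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def pyramid_align(im_fixed, im, s=2, k=3, lx=9, ly=9):
    """Return (dx, dy) aligning `im` to `im_fixed`, coarse to fine."""
    dx, dy = 0, 0  # (d_x, d_y) at iteration -1
    for i in range(k):
        factor = s ** (k - 1 - i)  # coarsest level first, factor 1 last
        f = downsample(im_fixed, factor)
        m = downsample(im, factor)
        cx, cy = s * dx, s * dy  # window centre: previous estimate, scaled up
        best = (-np.inf, cx, cy)
        for y in range(cy - ly // 2, cy + ly // 2 + 1):
            for x in range(cx - lx // 2, cx + lx // 2 + 1):
                score = ncc(f, np.roll(m, shift=(y, x), axis=(0, 1)))
                if score > best[0]:
                    best = (score, x, y)
        _, dx, dy = best
    return dx, dy
```

With these defaults each level evaluates only $9 \times 9$ candidate shifts, yet the reachable displacement at full resolution is much larger because every coarse-level pixel corresponds to $s^{k-1-i}$ original pixels.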


Bells and Whistles

After implementing the naive pyramid search, the resulting pictures were still slightly noisy and hazy, with the channels slightly offset from one another. It seemed like the algorithm could use better features for alignment. Specifically, aligning on the edges in the image makes sense: these edges exist across all three channels and are very strong signals for how the channels should line up. As such, I used skimage's Canny edge detector to preprocess the images, and aligned the plates using the Canny-filtered images.
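A minimal sketch of this preprocessing step is shown below. `skimage.feature.canny` is the real scikit-image function, but the wrapper names `edge_map` and `align_on_edges`, the `sigma` value, and the `align_fn` callback (any function that returns a `(dx, dy)` shift for two images) are assumptions made for illustration.

```python
import numpy as np
from skimage import feature

def edge_map(channel, sigma=2.0):
    """Binary Canny edge map as floats (1.0 on edges, 0.0 elsewhere).
    The sigma value is an assumed setting, not the project's."""
    return feature.canny(channel, sigma=sigma).astype(np.float64)

def align_on_edges(fixed, moving, align_fn, **kwargs):
    """Estimate the shift on the edge maps, then apply it to the raw channel.

    `align_fn` is a hypothetical aligner that takes two images and
    returns a (dx, dy) displacement.
    """
    dx, dy = align_fn(edge_map(fixed), edge_map(moving), **kwargs)
    aligned = np.roll(moving, shift=(dy, dx), axis=(0, 1))
    return aligned, (dx, dy)
```

The displacement is estimated on the edge maps only; the shift is then applied to the unfiltered channel so the final color image keeps its original intensities.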

Here are some examples of before filtering versus after filtering:

[Figure columns: Before (without Canny filtering); After (with Canny filtering); Aligned Canny images]


Overall Image Results

Here are the colorized results on all of the example images. The offsets listed are those of the blue and red channels relative to the fixed green channel:
1. Cathedral
Blue offset = (-8, 0), Red offset = (7, -1)
2. Church
Blue offset = (-24, -4), Red offset = (31, -8)
3. Emir
Blue offset = (-46, -19), Red offset = (56, 17)
4. Harvesters
Blue offset = (-60, -18), Red offset = (64, -3)
5. Icon
Blue offset = (-41, -16), Red offset = (48, 5)
6. Lady
Blue offset = (-58, 0), Red offset = (63, 3)
7. Melons
Blue offset = (-86, -4), Red offset = (96, 4)
8. Onion Church
Blue offset = (-46, -36), Red offset = (61, 0)
9. Self Portrait
Blue offset = (-81, -31), Red offset = (96, 9)
10. Three Generations
Blue offset = (-57, -17), Red offset = (58, -6)
11. Train
Blue offset = (-40, 1), Red offset = (39, 26)
12. Workshop
Blue offset = (-56, 1), Red offset = (52, -10)
Here are the results on a few examples of my own choosing:
1. Cheremukha
Blue offset = (26, -5), Red offset = (-12, 2)
2. Suna River
Blue offset = (-25, 6), Red offset = (77, -6)
3. Turkmenistan Mosque
Blue offset = (-33, -1), Red offset = (48, 2)