CS 194-26: Project 1

Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection
Frederick Kim, CS194-26-ABO

Overview

This project introduces basic image processing by colorizing digitized versions of Prokudin-Gorskii’s glass plate images. Traveling across the Russian Empire between 1909 and 1915, Prokudin-Gorskii captured color information by photographing each scene three times, once each through a red, a green, and a blue filter. We can therefore take the three exposures corresponding to the RGB colors, find the overlay that matches most closely, and align the three color channels to create a colorized photo. We implemented two ways of aligning the channels: single-scale and multi-scale.

Single-scale Implementation

The single-scale implementation is a simple way to align the channels using a heuristic. First, one of the three channels is selected as the target, and the other two are treated as sample channels. Given a range of displacement values, each sample channel is shifted in the $x$ and $y$ directions, and a heuristic metric scores how closely each shifted sample matches the target channel. In this project, I used the sum of squared differences (SSD) as the metric: $\sum (\text{image1} - \text{image2})^2$, where the sum is taken over all pixel values. The displacement with the smallest SSD is taken as the best alignment. Once the best alignment is found for both sample channels, the three images are stacked with their respective colors to create a colorized image.
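Written out more explicitly, the search can be stated as follows. The symbols are introduced here only for illustration: $T$ is the target channel, $S$ is a sample channel, and $(d_x, d_y)$ is a candidate displacement in the $[-15, 15]$ window used in the pseudocode.

$$\mathrm{SSD}(d_x, d_y) = \sum_{x, y} \big( S(x - d_x,\, y - d_y) - T(x, y) \big)^2, \qquad (d_x^*, d_y^*) = \arg\min_{d_x,\, d_y \in [-15,\, 15]} \mathrm{SSD}(d_x, d_y)$$

A lower SSD means the shifted sample and the target agree more closely pixel by pixel, so the minimizing pair $(d_x^*, d_y^*)$ is the displacement reported for that channel.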

Pseudocode:

split the original image into evenly sized b, g, r channels
crop 20 pixels from each side for each channel

choose one channel to be the target while the other two are the sample

for each displacement of the sample over [-15, 15] over x and y:
    calculate the sum of squared differences (SSD) between the sample and target channels
    store displacement with the smallest SSD
    
stack the three channels with the sample channels displaced
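The pseudocode above could be sketched in Python with NumPy roughly as follows. This is a minimal sketch, not the code used to produce the results: the function names are hypothetical, the plate is assumed to stack the B, G, and R exposures from top to bottom, and `np.roll` (which wraps pixels around the image edges) is assumed as the way to apply a displacement. Blue is used as the target, as in the examples.

```python
import numpy as np

def split_channels(plate):
    """Split a plate into three equal-height channels (assumed order: B, G, R)."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def crop(channel, border=20):
    """Remove `border` pixels from every side of a channel."""
    return channel[border:-border, border:-border]

def ssd(a, b):
    """Sum of squared differences over all pixels."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return np.sum(diff ** 2)

def align(sample, target, radius=15):
    """Exhaustively test shifts in [-radius, radius]; return the best (dx, dy)."""
    best, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps around the edges (an assumption of this sketch).
            shifted = np.roll(sample, (dy, dx), axis=(0, 1))
            score = ssd(shifted, target)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best

def colorize(plate, border=20, radius=15):
    """Align G and R to B, then stack the channels in RGB order."""
    b, g, r = (crop(c, border) for c in split_channels(plate))
    gx, gy = align(g, b, radius)
    rx, ry = align(r, b, radius)
    g = np.roll(g, (gy, gx), axis=(0, 1))
    r = np.roll(r, (ry, rx), axis=(0, 1))
    return np.dstack([r, g, b]), (gx, gy), (rx, ry)
```

The search evaluates 31 x 31 = 961 candidate shifts per sample channel, and each evaluation touches every pixel, which is affordable for the small .jpg plates.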

tobolsk.jpg

g: (x=3, y=3) | r: (x=3, y=6)

cathedral.jpg

g: (x=2, y=5) | r: (x=3, y=12)

monastery.jpg

g: (x=2, y=-3) | r: (x=2, y=3)

Multi-scale Implementation

While the single-scale implementation works well with small images, it is too slow for large images such as the .tif examples. Therefore, a multi-scale approach was taken.

The multi-scale implementation uses an image pyramid, in which the top level has the lowest resolution and the bottom level has the highest. An exhaustive search (the same method used in the single-scale implementation) is run only at the top level. Its result is then scaled up and used to center the search on the next level. Each finer level therefore only needs to test a small range of displacements around the estimate it inherits, which speeds up the search for the best alignment.

Pseudocode:

split the original image into evenly sized b, g, r channels
crop 200 pixels from each side for each channel

choose one channel to be the target while the other two are the sample

func multiscale:
    if width and height of the sample <= 400:
        return displacement with best alignment using exhaustive search
    else:
        recursively find scaled displacement with best alignment from downscaled images
        find displacement with best alignment in current scale offset by previously found displacement
        return the combined displacement
    
stack the three channels with the sample channels displaced
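A recursive version of `multiscale` might look like the sketch below. Several details are assumptions, since the write-up does not specify them: each pyramid level halves the resolution, downscaling averages 2x2 pixel blocks, the refinement window is a few pixels wide, and shifts are applied with `np.roll` (which wraps around the edges). All function and parameter names are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences over all pixels."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return np.sum(diff ** 2)

def search(sample, target, center=(0, 0), radius=15):
    """Exhaustive search in a (2*radius+1)^2 window centered on `center`."""
    cx, cy = center
    best, best_score = center, np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(sample, (dy, dx), axis=(0, 1))
            score = ssd(shifted, target)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best

def downscale(img):
    """Halve the resolution by averaging 2x2 blocks (assumed resampling method)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def multiscale_align(sample, target, min_size=400,
                     coarse_radius=15, refine_radius=2):
    """Return the (dx, dy) shift that best aligns `sample` to `target`."""
    # Base case: the image is small enough for a full exhaustive search.
    if max(sample.shape) <= min_size:
        return search(sample, target, (0, 0), coarse_radius)
    # Recursive case: estimate the shift on the half-resolution images.
    cx, cy = multiscale_align(downscale(sample), downscale(target),
                              min_size, coarse_radius, refine_radius)
    # A shift of (cx, cy) at half resolution corresponds to (2*cx, 2*cy) here;
    # refine with a small search around that scaled estimate.
    return search(sample, target, (2 * cx, 2 * cy), refine_radius)
```

With these assumed parameters, only the coarsest level pays for the full 31 x 31 window. Every finer level tests just 5 x 5 = 25 shifts, so the cost at full resolution no longer grows with the size of the displacement being recovered.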

Using the image pyramid to reduce the range of displacements significantly reduced the runtime, as seen below:

melons.tif

g: (x=9, y=82) | r: (x=12, y=178)

time: 22.6s

onion_church.tif

g: (x=26, y=51) | r: (x=36, y=108)

time: 31.2s

icon.tif

g: (x=17, y=40) | r: (x=23, y=89)

time: 24.4s

lady.tif

g: (x=8, y=47) | r: (x=11, y=113)

time: 26.5s

harvesters.tif

g: (x=16, y=59) | r: (x=13, y=124)

time: 23.6s

church.tif

g: (x=4, y=25) | r: (x=-4, y=58)

time: 23.7s

self_portrait.tif

g: (x=28, y=78) | r: (x=36, y=176)

time: 26.4s

three_generations.tif

g: (x=14, y=53) | r: (x=10, y=111)

time: 22.7s

train.tif

g: (x=5, y=42) | r: (x=31, y=87)

time: 23.6s

workshop.tif

g: (x=0, y=52) | r: (x=-12, y=104)

time: 23.9s

Emir

One example that did not produce a properly colored image was the image of the Emir of Bukhara (emir.tif). The channels in this image do not have the same brightness levels, which confused the heuristic when aligning to the blue channel, as was done for all previous examples. A workaround was to align to the green channel instead of the blue channel, which resulted in a properly colored image.
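One way to make the target channel a parameter is sketched below, so that the same pipeline can align to green for emir.tif. This is an illustrative sketch only: `align` and `colorize` are hypothetical names, a small single-scale SSD search stands in for whichever alignment routine is used, and shifts are applied with `np.roll`, which wraps around the image edges.

```python
import numpy as np

def align(sample, target, radius=15):
    """Exhaustive SSD search; returns the best (dx, dy) for `sample`."""
    best, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(sample, (dy, dx), axis=(0, 1))
            score = np.sum((shifted.astype(np.float64) - target) ** 2)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best

def colorize(b, g, r, target="b", radius=15):
    """Align the two non-target channels to `target` and stack as RGB."""
    channels = {"b": b, "g": g, "r": r}
    reference = channels[target].astype(np.float64)
    aligned = {}
    for name, channel in channels.items():
        if name == target:
            aligned[name] = channel  # the target channel stays in place
        else:
            dx, dy = align(channel, reference, radius)
            aligned[name] = np.roll(channel, (dy, dx), axis=(0, 1))
    return np.dstack([aligned["r"], aligned["g"], aligned["b"]])
```

Calling `colorize(b, g, r, target="g")` leaves the green channel fixed and moves the blue and red channels onto it, so the reported displacements are then relative to green rather than blue.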