Project 1: Images of the Russian Empire - Coloring the Prokudin-Gorskii Photo Collection

CS 294-26: Introduction to Computer Vision & Computational Photography, Fa21

Jaeyun Stella Seo, CS194-26-abt



Overview

In this project, we colorized images from the early 1900s, bringing to fruition the dreams of Sergei Prokudin-Gorskii. Prokudin-Gorskii photographed each scene three times: once with a red filter, once with a green filter, and once with a blue filter. Though each of these captures is only a black and white image, we can now combine each trio of images to produce a full color image of the scene. To do so, we must first align the three images, since they are slightly offset from one another. We then assign each aligned image to its color channel and stack the three to produce the whole colorized image.

Part 1: JPG Images

JPG Approach

The images are provided by the Library of Congress as three stacked images: blue, green, then red channels. They are not perfectly aligned, and they have obvious differences in intensity in each channel.

Figure 1: The cathedral is captured three times in this scan. The top channel is blue, then green, then red. Note that there are slight offsets between the images, as there was probably some jostling and adjustment as the filters were swapped out between takes.

Because the JPG images are fairly small, we can take a fairly intuitive approach to the scans. We want to align the images while they are still in greyscale. To do this, we first split the scan into three separate images: blue, green, and red.
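As a minimal sketch of the split described above, assuming the scan is already loaded as a 2D NumPy array with the blue, green, and red plates stacked top to bottom (the function name `split_channels` is hypothetical, not from the original code):

```python
import numpy as np

def split_channels(stacked):
    """Split a vertically stacked B/G/R scan into three equal-height plates."""
    height = stacked.shape[0] // 3           # any leftover rows at the bottom are dropped
    blue = stacked[:height]
    green = stacked[height:2 * height]
    red = stacked[2 * height:3 * height]
    return blue, green, red

# Tiny synthetic "scan" with 7 rows, so one leftover row is discarded.
scan = np.arange(7 * 4, dtype=np.float64).reshape(7, 4)
blue, green, red = split_channels(scan)
print(blue.shape, green.shape, red.shape)
```

Dropping the leftover rows is one simple way to guarantee that all three plates have the same shape, which the later pixel-wise comparison needs.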

Figure 2A: Cathedral blue channel
Figure 2B: Cathedral green channel
Figure 2C: Cathedral red channel

To align the images, we choose one to be a stationary template. We can then choose another one of the images and start shifting it around until it lines up well. We know that the images are only slightly misaligned, so we do not have to shift it very far. For our purposes, we allow shifts of -8 to 8 pixels in both the x and y directions. This is to say it can move up to 8 pixels up, down, to the left, and to the right. For each of these candidate alignments, we check how well the image lines up with the template.

But how do we measure how "well" something lines up? We use the SSD, or sum of squared differences. At each alignment, we take the difference between each pair of corresponding pixels, square it (so every term is positive), and then sum across every pixel. In other words, we apply sum(sum((image1-image2).^2)). It's important to note that the borders of each plate are unreliable: they are included in our RGB channels but are not meaningful for alignment. Thus we crop out 1/16th of the image on all sides before computing the SSD; we are only interested in the real meat of the image at its center.
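The metric above can be sketched in NumPy as follows. This is an illustration rather than the original code: the helper names `crop_border` and `ssd` are hypothetical, and I am assuming "1/16th on all sides" means trimming 1/16 of the height from the top and bottom and 1/16 of the width from the left and right.

```python
import numpy as np

def crop_border(image, fraction=1 / 16):
    """Trim `fraction` of the height/width from each side (assumed interpretation)."""
    rows, cols = image.shape
    dr, dc = int(rows * fraction), int(cols * fraction)
    return image[dr:rows - dr, dc:cols - dc]

def ssd(template, candidate, fraction=1 / 16):
    """Sum of squared differences over the cropped interior of two same-size images."""
    a = crop_border(template, fraction).astype(np.float64)
    b = crop_border(candidate, fraction).astype(np.float64)
    return float(np.sum((a - b) ** 2))

# Differences confined to the border do not affect the score.
template = np.zeros((32, 32))
noisy_border = template.copy()
noisy_border[:2, :] = 1.0                    # top 2 rows fall inside the cropped margin
print(ssd(template, noisy_border))
```

Casting to float before subtracting avoids wrap-around if the plates are stored as unsigned integers.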

Figure 3: (This image is taken from the CS 194-26 lecture notes from Fa21) We apply the SSD formula, where (u,v) are the offsets in the x, y directions; (x,y) are the pixel coordinates, P is our template image, and I is the image being shifted around. For my approach, I always chose P to be the red channel so all the images are aligned to the red as a reference. We choose N to be the neighborhood of search. Since we know that the images are not terribly misaligned, we choose this to be [-8, 8] in our approach.
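Since the figure itself is not reproduced in this text, here is one common way to write the SSD using the caption's symbols. This is my reconstruction under the caption's definitions, not a copy of the lecture slide; in particular, I am assuming the sum runs over the pixel coordinates (x, y) of the cropped comparison region and that N is the set of candidate offsets.

```latex
\mathrm{SSD}(u, v) = \sum_{(x, y)} \bigl[\, I(x + u,\; y + v) - P(x, y) \,\bigr]^2,
\qquad (u, v) \in N = [-8, 8] \times [-8, 8]

(u^*, v^*) = \operatorname*{arg\,min}_{(u, v) \in N} \; \mathrm{SSD}(u, v)
```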

We calculate this SSD for every allowed alignment and take the alignment with the minimum SSD to be the "best aligned". This is to say that the brightness in all channels is the most similar at this alignment. While this may not necessarily be true based on the colors present in the image, it is a good base metric since all pixels in the scene are receiving similar amounts of lighting in similar conditions. As you can see below, the results are rather good.
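The exhaustive search described above might look like the sketch below. The names `ssd_cropped` and `align_exhaustive` are hypothetical, and shifting with `np.roll` (which wraps pixels around the edges) is my assumption; the original code may shift differently. Offsets are returned as (x, y) to match the convention used in this writeup.

```python
import numpy as np

def ssd_cropped(a, b, fraction=1 / 16):
    """SSD over the interior of two same-size images, ignoring a border margin."""
    rows, cols = a.shape
    dr, dc = int(rows * fraction), int(cols * fraction)
    diff = a[dr:rows - dr, dc:cols - dc].astype(np.float64) \
         - b[dr:rows - dr, dc:cols - dc].astype(np.float64)
    return float(np.sum(diff ** 2))

def align_exhaustive(template, moving, radius=8):
    """Try every shift in [-radius, radius]^2 and return the (x, y) offset with minimum SSD."""
    best_offset, best_score = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))   # rows shift by dy, columns by dx
            score = ssd_cropped(template, shifted)
            if score < best_score:
                best_score, best_offset = score, (dx, dy)
    return best_offset

# Synthetic check: displace a random image, then recover the shift that undoes it.
rng = np.random.default_rng(0)
template = rng.random((64, 64))
moving = np.roll(template, (-3, 2), axis=(0, 1))   # moved 3 rows up, 2 columns right
print(align_exhaustive(template, moving))
```

With a radius of 8 this evaluates 17 x 17 = 289 candidate shifts, which is cheap for small images but grows quickly with both image size and search radius.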

JPG Results

Course Examples

Note that in all images, red is used as the reference, so only the blue and green offsets are provided. Offsets are given as (n, m), where n is the offset in x and m is the offset in y.

(I apologize to course staff, as I realized too late that the project spec requests us to use blue as the template.)
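To illustrate how offsets in this (x, y) convention could be applied, here is a small sketch that shifts the green and blue plates onto the red reference and stacks the three into an RGB image. The function name `compose_rgb` is hypothetical, `np.roll` is an assumed way of shifting (it wraps pixels around the edges), and the plates below are synthetic stand-ins rather than the real scans.

```python
import numpy as np

def compose_rgb(red, green, blue, green_offset, blue_offset):
    """Shift green/blue by their (x, y) offsets and stack with red as an H x W x 3 image."""
    gx, gy = green_offset
    bx, by = blue_offset
    green_aligned = np.roll(green, (gy, gx), axis=(0, 1))   # y moves rows, x moves columns
    blue_aligned = np.roll(blue, (by, bx), axis=(0, 1))
    return np.dstack([red, green_aligned, blue_aligned])

# Synthetic plates displaced so that offsets like those reported for Figure 4A realign them.
rng = np.random.default_rng(0)
red = rng.random((16, 16))
green = np.roll(red, (7, 1), axis=(0, 1))
blue = np.roll(red, (7, 6), axis=(0, 1))
rgb = compose_rgb(red, green, blue, green_offset=(-1, -7), blue_offset=(-6, -7))
print(rgb.shape)
```

Because red is the reference here, it is stacked unshifted; if blue were used as the template instead, the same function shape would apply with the roles swapped.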

Figure 4A: Cathedral, colorized. Green offset: (-1, -7). Blue offset: (-6, -7).
Figure 4B: Monastery, colorized. Green offset: (-1, -6). Blue offset: (-2, -3).
Figure 4C: Tobolsk, colorized. Green offset: (-1, -4). Blue offset: (-3, -6).

Extra Examples

Below are some additional examples. Our colorization is on the left, and there are references from the LoC on the right that are more expertly restored with better adjustments. However, you will note, especially for Little Russia, that our basic approach works reasonably well anyway.



Figure 5A: Denmark Estate, colored. Green offset: (2, -5). Blue offset: (5, -7).
Figure 5B: Denmark Estate, colored reference from LoC.
Figure 6A: Conservatory, colored. Green offset: (0, -4). Blue offset: (-2, -7).
Figure 6B: Conservatory, colored reference from LoC.
Figure 7A: Little Russia, colored. Green offset: (0, 2). Blue offset: (-1, 4).
Figure 7B: Little Russia, colored reference from LoC.

Part 2: TIF Images

2.1: TIF Approach

Our JPG approach only works in a reasonable amount of time because JPG images are compressed and reasonably small. However, for large image formats, like TIF, we cannot feasibly test every possible alignment pixel by pixel. Because there are so many pixels in a high resolution scan, it's also hard to say what a reasonable neighborhood of search is. Consequently, we must apply a different approach. Thankfully it's not too different from what we've already done!

In order to recycle some of what we've done, we create a "pyramid" of images. We take the original image and downscale it by factors of 2 repeatedly. At each step the image becomes coarser, which is to say, lower resolution. We build the image pyramid and then start at the top of it with the coarsest image. We can then run the JPG approach since the image is small. Once we've aligned this coarse image, we have a better idea of where to look in the next image down the pyramid: twice the offsets that minimize the SSD. (We need the factor of 2 because each level is scaled by a factor of 2 relative to the next.) So instead of searching exhaustively, we have a smart "starting point" and only examine the neighborhood around it. Basically, we take "guesses" from coarser images and then update our guesses at higher resolutions. The whole process takes roughly 1 minute per TIF image. For an old machine like mine, that's pretty good!
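The coarse-to-fine idea above can be sketched recursively. Everything named here is hypothetical (`downscale2`, `search`, `align_pyramid`, and the `min_size`, `coarse_radius`, and `refine_radius` parameters), and several details are assumptions rather than the original implementation: downscaling by 2 x 2 block averaging, shifting with `np.roll`, and a refinement window of 2 pixels at each finer level.

```python
import numpy as np

def ssd_cropped(a, b, fraction=1 / 16):
    """SSD over the interior of two same-size images, ignoring a border margin."""
    rows, cols = a.shape
    dr, dc = int(rows * fraction), int(cols * fraction)
    diff = a[dr:rows - dr, dc:cols - dc] - b[dr:rows - dr, dc:cols - dc]
    return float(np.sum(diff ** 2))

def downscale2(image):
    """Halve the resolution by averaging 2 x 2 blocks (one simple choice of downscaling)."""
    rows, cols = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    return image[:rows, :cols].reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))

def search(template, moving, center, radius):
    """Minimum-SSD (x, y) offset within `radius` of a starting guess `center`."""
    cx, cy = center
    best, best_score = center, float("inf")
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = ssd_cropped(template, np.roll(moving, (dy, dx), axis=(0, 1)))
            if score < best_score:
                best_score, best = score, (dx, dy)
    return best

def align_pyramid(template, moving, min_size=64, coarse_radius=8, refine_radius=2):
    """Align at the coarsest level first, then double the guess and refine at each finer level."""
    if min(template.shape) <= min_size:
        return search(template, moving, (0, 0), coarse_radius)
    cx, cy = align_pyramid(downscale2(template), downscale2(moving),
                           min_size, coarse_radius, refine_radius)
    return search(template, moving, (2 * cx, 2 * cy), refine_radius)

# Synthetic check: a 256 x 256 image displaced by more than the 8-pixel base window.
rng = np.random.default_rng(0)
template = rng.random((256, 256))
moving = np.roll(template, (-20, 12), axis=(0, 1))
print(align_pyramid(template, moving))
```

In this toy run the coarsest level is 64 x 64, where the displacement shrinks to a few pixels and fits inside the base search window; each finer level then only checks a 5 x 5 neighborhood around the doubled guess instead of the full range.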

You can see below the progression of "guesses" that gets sharper (more aligned) with each iteration.


Figure 8A: Self Portrait, guess 0 (unaligned).
Figure 8B: Self Portrait, guess 1.
Figure 8C: Self Portrait, guess 2.
Figure 8D: Self Portrait, guess 3.
Figure 8E: Self Portrait, guess 4.
Figure 8F: Self Portrait, guess 5 (final).

2.2: TIF Results

Course Examples




Figure 9: Church, colorized. Green offset: (-8, 33). Blue offset: (-4, 59). Runtime: 63.06s.
Figure 10: Emir, colorized. Green offset: (17, 57). Blue offset: (55, 103). Runtime: 59.75s.
Figure 11: Harvesters, colorized. Green offset: (03, 65). Blue offset: (13, 124). Runtime: 67.66s.
Figure 12: Icon, colorized. Green offset: (5, 48). Blue offset: (23, 90). Runtime: 60.16s.
Figure 13: Lady, colorized. Green offset: (4, 61). Blue offset: (11, 115). Runtime: 63.94s.
Figure 14: Melons, colorized. Green offset: (4, 96). Blue offset: (13, 179). Runtime: 61.68s.
Figure 15: Onion Church, colorized. Green offset: (10, 57). Blue offset: (36, 108). Runtime: 68.19s.
Figure 16: Self Portrait, colorized. Green offset: (8, 98). Blue offset: (36, 176). Runtime: 61.95s.
Figure 17: Three Generations, colorized. Green offset: (-3, 58). Blue offset: (11, 112). Runtime: 64.47s.
Figure 18: Train, colorized. Green offset: (27, 43). Blue offset: (32, 87). Runtime: 60.57s.
Figure 19: Workshop, colorized. Green offset: (-11, 52). Blue offset: (-12, 103). Runtime: 62.23s.

Note in Figure 10, Emir, that the green channel is still slightly misaligned, though it is not apparent at a quick glance. This is because the brightness differs between the channels, so it is hard to match them with our naive SSD. That being said, the approach does pretty well overall.

Extra Examples


Figure 20A: Boatyard, colored. Green offset: (17, 102). Blue offset: (61, 126). Runtime: 61.11s.
Figure 20B: Boatyard, colored reference from LoC.
Figure 21A: Wooden Church, colored. Green offset: (20, 26). Blue offset: (41, 36). Runtime: 62.25s.
Figure 21B: Wooden Church, colored reference from LoC.