CS 194-26: Intro to Computer Vision and Computational Photography, Fall 2021

Project 1: Images of the Russian Empire --
Colorizing the Prokudin-Gorskii Photo Collection

Priyanka Kargupta, CS194-26-aas



Overview

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. To do this, I extracted the three color channel images, stacked them on top of each other, and aligned them so that they form a single RGB color image. Two algorithms were used depending on the resolution of the image. For the smaller, lower-resolution .jpg images, an exhaustive search was used: it searches over a window of possible displacements, scores each one using the L2 norm (equivalently, the sum of squared differences), and takes the displacement with the best score. For the higher-resolution .tif images, exhaustive search alone would be very inefficient, since higher resolutions require larger displacements; instead, I used a pyramid search, which recursively runs the exhaustive search on scaled-down versions of the original image. In addition to these basic algorithms, I also implemented bells & whistles such as automatic cropping and a change of base channel to further improve the quality of the alignment and of the final image.

Glass plate 3-channel image of Monastery

Section I: Exhaustive Search

For exhaustive search, I implemented a naive search over the window [-15, 15] in both the x and y dimensions, using np.roll to shift the image by each candidate displacement. I used SSD as my metric, as it seemed to work better than NCC on these images. For each candidate displacement, I computed the SSD between the channel being aligned (e.g. R or B) and the base channel, and chose the displacement that minimized the SSD. Here are the results for the lower-resolution .jpg images (these include the change of base channel from blue to green and auto-cropping):
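A minimal sketch of this search, assuming each channel is already a float NumPy array of the same shape (the function name and default window size are illustrative):

import numpy as np

def ssd(a, b):
    """Sum of squared differences (the alignment metric)."""
    return np.sum((a - b) ** 2)

def exhaustive_align(channel, base, window=15):
    """Return the (dy, dx) shift in [-window, window]^2 that minimizes the
    SSD between np.roll(channel, shift) and the base channel."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, base)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift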


.JPG Displacement Vectors:

Image           Red Displacement Vector   Blue Displacement Vector
monastery.jpg   [6 1]                     [3 -2]
cathedral.jpg   [7 0]                     [-5 -2]
tobolsk.jpg     [4 0]                     [-3 -2]

Section II: Pyramid Search

For high-resolution images, exhaustive search alone would be very inefficient, since they require large translations. I therefore implemented a pyramid search, which constructs an image pyramid whose higher layers contain progressively scaled-down versions of the original image. I began my search from the coarsest (top) layer: the number of layers n is computed as log2 of the green channel's width divided by 400, and the image is scaled down by a factor of 2^n, so scaling is only performed on dimensions larger than 400 pixels. I then performed an exhaustive search on the scaled-down image, retrieved the displacement vector for that layer, and scaled it up by the same 2^n factor so that it could be applied to the original image and passed on to the next-larger layer (factor 2^(n-1)). I continued this process until reaching the bottom layer, the original image, where I performed one final exhaustive search. The final displacement vector for each of the red and blue channels is simply the sum of the scaled displacement vectors from all of the layers for that channel. Here are the results for the high-resolution .tif images (these include the change of base channel from blue to green and auto-cropping):
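A recursive sketch of the same coarse-to-fine idea; my implementation computes the scale factor n up front from log2(width / 400), while this sketch halves the resolution at each level until the image is small enough, which visits essentially the same layers. It reuses exhaustive_align from Section I and assumes skimage.transform.rescale for the downscaling:

import numpy as np
from skimage.transform import rescale

def pyramid_align(channel, base, min_size=400, window=15):
    """Align `channel` to `base` coarse-to-fine: estimate the shift on a
    2x-downscaled copy, double it, then refine at this resolution."""
    if min(base.shape) <= min_size:
        # Coarsest layer: plain exhaustive search.
        return exhaustive_align(channel, base, window)
    # Recurse on half-resolution copies of both channels.
    coarse_dy, coarse_dx = pyramid_align(
        rescale(channel, 0.5, anti_aliasing=True),
        rescale(base, 0.5, anti_aliasing=True),
        min_size, window)
    dy, dx = 2 * coarse_dy, 2 * coarse_dx  # scale the estimate back up
    # Apply the coarse estimate, then refine it with a small local search.
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    fine_dy, fine_dx = exhaustive_align(shifted, base, window=2)
    return dy + fine_dy, dx + fine_dx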


.TIFF Displacement Vectors:

Image                  Red Displacement Vector   Blue Displacement Vector
church.tif             [33 -7]                   [-25 0]
emir.tif               [57 17]                   [-49 -23]
harvesters.tif         [65 -3]                   [-60 -16]
icon.tif               [48 5]                    [-40 -17]
lady.tif               [62 3]                    [-54 -8]
melons.tif             [96 3]                    [-82 -9]
onion_church.tif       [57 10]                   [-52 -25]
self_portrait.tif      [98 8]                    [-78 -28]
three_generations.tif  [58 -2]                   [-54 -12]
train.tif              [43 26]                   [-42 0]
workshop.tif           [53 -11]                  [-53 1]

Bells & Whistles: Auto Cropping

All of the original images have white and black borders, which often skew the SSD: the borders have different sizes and varying widths of white and black across the three channels, so aligning on them (especially given their extreme pixel values) is likely to distort the displacement vectors. I therefore implemented an auto-cropping algorithm. It computes the row-wise and column-wise averages of the pixel values, and a row or column is marked 1 in a mask if its average falls within a specified threshold range, whose bounds account for both white and black border averages. From the two masks (one for rows, one for columns), I found the first and last indices that were true, giving the corners of a bounding box for each of the three channels. I then took the intersection of the three bounding boxes and used it as the crop for all three channels, so that their dimensions remain consistent. This significantly improved the quality of my alignments as well as the overall efficiency of my program. The following shows one of the higher-resolution images before and after auto-cropping:
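A sketch of this border detection and cropping, assuming pixel values are scaled to [0, 1]; the threshold values are illustrative:

import numpy as np

def content_bounds(channel, low=0.05, high=0.95):
    """Return (top, bottom, left, right) for one channel's content: a row or
    column is kept (mask = 1) when its mean value is neither near-black nor
    near-white, i.e. it falls inside the [low, high] threshold range."""
    row_means = channel.mean(axis=1)
    col_means = channel.mean(axis=0)
    rows = np.flatnonzero((row_means > low) & (row_means < high))
    cols = np.flatnonzero((col_means > low) & (col_means < high))
    return rows[0], rows[-1], cols[0], cols[-1]

def crop_to_common_box(r, g, b):
    """Crop all three channels to the intersection of their bounding boxes,
    keeping their dimensions consistent."""
    boxes = [content_bounds(c) for c in (r, g, b)]
    top    = max(box[0] for box in boxes)
    bottom = min(box[1] for box in boxes)
    left   = max(box[2] for box in boxes)
    right  = min(box[3] for box in boxes)
    return tuple(c[top:bottom + 1, left:right + 1] for c in (r, g, b))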



Change of Base Channel

I noticed that, compared to the default choice of blue as the base channel for red and green to align to, using green as the base channel worked quite a bit better, especially on emir.tif. The following shows one of the higher-resolution images with blue and green as the base channel, respectively (with auto-cropping applied in the latter as well):
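With green as the base, the final image is just the red and blue channels aligned to green and stacked; a short sketch reusing pyramid_align from above:

import numpy as np

def colorize(r, g, b):
    """Align R and B to the green base channel and stack into an RGB image."""
    r_shift = pyramid_align(r, g)
    b_shift = pyramid_align(b, g)
    return np.dstack([np.roll(r, r_shift, axis=(0, 1)),
                      g,
                      np.roll(b, b_shift, axis=(0, 1))])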


Additional Images:

Additional .TIFF Displacement Vectors:

Image        Red Displacement Vector   Blue Displacement Vector
bridge.tif   [55 -18]                  [-68 11]
yurt.tif     [73 15]                   [-58 -31]
waiting.tif  [39 15]                   [-38 -21]