CS 194-26: Intro to Computer Vision and Computational Photography

Project 1: Images Of the Russian Empire

Galen Kimball


Overview

This project aimed to recreate color images of scenes photographed by Sergei Mikhailovich Prokudin-Gorskii, reconstructed from three separate (but simultaneous) photographs taken with red, green, and blue filters. Ideally, this could be done by simply layering the photographs over one another, but the nature of the machine used to take the photos introduced errors and offsets, since each photo was taken in a slightly different spatial location. This project aligns the red and green color channels to the blue channel, after which the three channels are layered together to form the final full-color image.

Approach

For small images, a brute-force method suffices: offset each image by some number of pixels within a range (I used [-23, 23], inclusive), and for each potential offset, check how well aligned the images are. I used the Sum of Squared Differences (SSD) between color channels to measure how well aligned they were. In Python, np.roll was invaluable for shifting the red and green channels to match up with the blue.
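As a rough sketch of this search (assuming each channel is a same-sized float NumPy array; the helper names ssd and align_brute_force are mine, not from the original code):

import numpy as np

def ssd(a, b):
    # Sum of Squared Differences between two equally sized arrays.
    return np.sum((a - b) ** 2)

def align_brute_force(channel, reference, max_shift=23):
    # Try every (row, column) shift in [-max_shift, max_shift] and keep
    # the shift that minimizes the SSD against the reference channel.
    best_offset, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_offset, best_score = (dy, dx), score
    return best_offset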

Each channel's image, however, has borders and other undesirable artifacts around the edges. To prevent these artifacts from interfering with the SSD calculation, a preset boundary of 10% per side was cropped from each image channel before the SSD was calculated.
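For illustration, the crop might look like the following (crop_interior is a hypothetical helper name; the SSD in the sketch above would then be computed on crop_interior(shifted) and crop_interior(reference) rather than the full channels):

def crop_interior(channel, frac=0.10):
    # Discard a border of `frac` of the height/width on each side,
    # keeping only the central portion of the channel for scoring.
    h, w = channel.shape
    dh, dw = int(h * frac), int(w * frac)
    return channel[dh:h - dh, dw:w - dw]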

However, for large images, this brute-force algorithm will not work. A red-channel image of 3000 x 3000 pixels, even if known to be offset from the reference blue channel by only 5%, would require a search of offsets between [-150, 150] in each dimension - roughly 9*10^4 calculations of the SSD of two 9-million-pixel images. This would obviously take more time than we'd like, so we employ an image pyramiding technique.

To do this, I first repeatedly downsampled each image by half until it was less than 512 pixels wide. From there, I searched through a [-23, 23] pixel offset range in each of the row and column dimensions using the same brute-force method described above. This gave a good starting point for how much each dimension should be shifted. For instance, if I downsampled three times (for a total 8x shrinking factor), an offset of (-3, 4) would correspond to an offset in the original image of (-24, 32). This starting point vastly speeds up the next steps, since we now have a much smaller range of pixel offsets to search. We can then move on to the next level, at a 4x shrinking factor, but only have to search a [-1, 1] range in each dimension. A resulting offset of (-1, 0) would then bring the new total offset to (-24 + -4, 32 + 0) = (-28, 32). This [-1, 1] offset search is repeated for the 2x and 1x (original) downsampled images as well, which gives the final result. For safety, I modified this algorithm to search over a [-2, 2] range instead.
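A minimal sketch of this pyramid, reusing the helpers above and assuming skimage.transform.rescale as one possible way to downsample by half (the name align_pyramid is mine):

from skimage.transform import rescale

def align_pyramid(channel, reference, min_width=512):
    # Coarsest level: the image is small enough for a full brute-force search.
    if channel.shape[1] < min_width:
        return align_brute_force(channel, reference, max_shift=23)
    # Otherwise, recurse on half-resolution copies to get a coarse estimate,
    # then scale that estimate back up to the current level.
    coarse_dy, coarse_dx = align_pyramid(rescale(channel, 0.5),
                                         rescale(reference, 0.5), min_width)
    dy, dx = 2 * coarse_dy, 2 * coarse_dx
    # Refine within a small [-2, 2] window around the scaled-up estimate.
    best_offset, best_score = (dy, dx), np.inf
    for ry in range(-2, 3):
        for rx in range(-2, 3):
            shifted = np.roll(channel, (dy + ry, dx + rx), axis=(0, 1))
            score = ssd(crop_interior(shifted), crop_interior(reference))
            if score < best_score:
                best_offset, best_score = (dy + ry, dx + rx), score
    return best_offset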

Issues

I ran into some issues with my image pyramiding algorithm - it seemed to give bad results when I downsampled images heavily (e.g. to under 256 pixels wide). I initially thought my algorithm was incorrect, but while playing around with parameters I realized it worked well when images weren't downsampled as much. I realized why when looking at the individual downsampled color channels (below):

Red channel of church.tiff, downsampled to less than 256 pixels
Green channel of church.tiff, downsampled to less than 256 pixels
Blue channel of church.tiff, downsampled to less than 256 pixels

Notice that the red and green channels have easily discernible features -- but it's difficult to see much detail in the downsampled blue channel. Since the values in the blue channel are so close to zero when downsampled, and formerly sharp features have been blurred, the SSD between a red/green channel and the blue channel becomes susceptible to noise, the amount of red/green inherent in the scene, and other factors that were not as impactful when the blue channel was larger and had more sharp features. This means that at the deepest levels of the image pyramid, we might minimize the SSD at a point several pixels away from the true optimum -- which, after five levels of downsampling, for example, would get amplified by a factor of 32. The image below shows how far off the red and blue channels ended up.

Bad output for church.tiff alignment

Fixing this was simple -- I just reduced how aggressively images were downsampled, so that the coarsest level was under 512 pixels wide but above 256.

A cleaner solution, one that would let me reduce images to under 256 pixels wide, would have been to artificially increase the contrast of each channel (perhaps with something as simple as stretching the lowest pixel value to zero and the highest to 255), and to calculate offsets based on the SSD of these contrast-stretched channels rather than the unprocessed ones.
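A sketch of that idea, assuming float-valued channels (the name stretch_contrast is hypothetical):

def stretch_contrast(channel):
    # Linearly rescale pixel values so the minimum maps to 0 and the
    # maximum to 1, restoring some dynamic range to a dim channel.
    lo, hi = channel.min(), channel.max()
    return (channel - lo) / (hi - lo)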

Results

Aligned Provided Images

Below are all provided images, aligned to the blue channel.

Legend: Name - (Red Row Offset, Red Column Offset), (Green Row Offset, Green Column Offset)


Cathedral - (3, 12), (2, 5)
Church - (-4, 58), (4, 25)
Emir - (57, 103), (24, 49)
Harvesters - (13, 124), (16, 59)
Icon - (23, 90), (17, 41)
Lady - (11, 116), (8, 56)
Melons - (13, 178), (11, 82)
Monastery - (2, 3), (2, -3)
Onion Church - (36, 108), (27, 51)
Self Portrait - (36, 176), (29, 79)
Three Generations - (11, 112), (14, 53)
Tobolsk - (3, 6), (3, 3)
Train - (32, 87), (6, 42)
Workshop - (-12, 105), (0, 53)

Images Acquired Online From Prokudin-Gorskii Collection

Below are four images obtained from the Prokudin-Gorskii Collection online.

Doors - (40, 146), (20, 64)
Railroad - (5, 141), (9, 66)
Roses - (30, 156), (21, 75)
Trees - (7, 97), (-32, 36)