CS 194 Project 1
Architecture

Methodology

Overview

The purpose of this project was to programmatically process images from the Prokudin-Gorskii collection. These photographs were taken before color photography was widely available. Each scene was captured three times, with the incoming light passed through a red, a green, and a blue filter, producing three greyscale images that each record one color component of the scene. Unfortunately, the photographers involved were unable to finish combining the images. Recently, however, the Library of Congress has made these images available online. This project takes each set of greyscale images and combines them into a single color photo.

Process

Each input was given as a single image containing three greyscale exposures stacked vertically (see below):

We first divided each image vertically into three equal parts, then aligned the resulting channels to produce the best overlaid image. To do this, we converted each channel to a floating-point representation so that we could compute on it while aligning the channels spatially. We then cyclically rotated, or "rolled", the channels in the horizontal and vertical directions, looking for the shift at which the middle portions of the R, G, and B channels were best correlated.

To keep a constant brightness offset from biasing the correlation, we normalized each image first. This ensured that the score reflects only how the images vary about their means, and not any bias present in the image. For an image $I$, the normalized image $\hat{I}$ is given by the formulas below.

$$ I_{\text{mean}} = \frac{1}{\text{height} \cdot \text{width}} \sum_{i = 0}^{\text{width} - 1} \sum_{j = 0}^{\text{height} - 1} I[i, j] $$

$$ \hat{I}[x, y] = \frac{I[x, y] - I_{\text{mean}}}{\| I - I_{\text{mean}} \|} $$

We searched over a 40 by 40 pixel window of shifts. For each candidate shift, we rolled one channel by that shift and computed its cross-correlation with the reference channel. Before scoring, we cropped about a sixth of the image from each side so that only the central region, which matters most, was measured. The cross-correlation was simply the dot product of the normalized images after cropping. Since human eyes are most sensitive to green, we aligned the blue and red channels to the green channel to minimize any data loss.
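The single-scale search described above can be sketched in NumPy as follows. This is a minimal illustration and not the project's actual code: the function names (`split_plate`, `normalize`, `crop_center`, `align`) and the `radius` parameter are hypothetical, the blue, green, red stacking order from top to bottom is an assumption about the input plates, and a radius of 20 only approximates the 40 by 40 window.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate into three equal-height channels.

    Assumption: the channels are stacked blue, green, red from top to bottom.
    """
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def normalize(img):
    """Subtract the mean and divide by the L2 norm of the centered image."""
    centered = img - img.mean()
    norm = np.linalg.norm(centered)
    return centered / norm if norm > 0 else centered

def crop_center(img):
    """Drop about a sixth of the image from each side."""
    h, w = img.shape
    return img[h // 6:h - h // 6, w // 6:w - w // 6]

def align(channel, reference, radius=20):
    """Return the (dy, dx) roll of `channel` that best matches `reference`."""
    ref = crop_center(normalize(reference))
    ch = normalize(channel)
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            candidate = crop_center(np.roll(ch, (dy, dx), axis=(0, 1)))
            # Cross-correlation: dot product of the normalized, cropped images.
            score = float(np.sum(candidate * ref))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With a plate loaded as a floating-point array, one would call `b, g, r = split_plate(plate)` and then `align(b, g)` and `align(r, g)` to obtain the blue and red shifts relative to green.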

Image Pyramid Optimizations

Unfortunately, images can be quite large, so the above algorithm is neither efficient nor effective on a high-resolution image. It is ineffective because a 40 by 40 window is small relative to the overall resolution; a 4K image, for instance, would not work well with a window of this size. It is inefficient because a single correlation check on a 4K image already requires roughly 8 million multiplications. We therefore used an optimization called an *image pyramid*, which works very well and is relatively simple. We first scale the image down to around 400 by 400 pixels and apply the above algorithm with a 40 by 40 search window. Once this is done, we take the resulting shift and scale it up by the same factor by which the image was scaled down. We then repeat the search at a higher resolution, this time over only a 5 by 5 window around the scaled-up shift. The smaller window suffices because the coarsest level of the pyramid has already done most of the work. As a result, each finer level needs only a small search space, which makes the computation significantly faster. Processing a single image takes only around 20-30 seconds.
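A coarse-to-fine search of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: the names `ncc_search`, `halve`, and `pyramid_align` are hypothetical, the image is repeatedly halved by 2x2 block averaging (the original only says it is scaled down to about 400 by 400), and the radii of 20 and 2 stand in for the 40 by 40 and 5 by 5 windows.

```python
import numpy as np

def ncc_search(channel, reference, center, radius):
    """Exhaustively search shifts within `radius` of `center` (dy, dx)."""
    def normalize(img):
        centered = img - img.mean()
        norm = np.linalg.norm(centered)
        return centered / norm if norm > 0 else centered

    def crop(img):
        h, w = img.shape
        return img[h // 6:h - h // 6, w // 6:w - w // 6]

    ref = crop(normalize(reference))
    ch = normalize(channel)
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = float(np.sum(crop(np.roll(ch, (dy, dx), axis=(0, 1))) * ref))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def halve(img):
    """Downscale by a factor of 2 by averaging 2x2 blocks (an assumption)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(channel, reference, coarse_size=400,
                  coarse_radius=20, fine_radius=2):
    """Align at the coarsest level, then refine the shift at each finer level."""
    if max(channel.shape) <= coarse_size:
        # Coarsest level: full search window around zero shift.
        return ncc_search(channel, reference, (0, 0), coarse_radius)
    dy, dx = pyramid_align(halve(channel), halve(reference),
                           coarse_size, coarse_radius, fine_radius)
    # Scale the coarse shift back up, then search a small window around it.
    return ncc_search(channel, reference, (2 * dy, 2 * dx), fine_radius)
```

Because each finer level checks only 25 shifts instead of roughly 1,600, nearly all of the search cost is paid at the smallest image size.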

Automatic Cropping (Bells and Whistles)

Once the images are aligned, we often get ugly artifacts near the borders as a result of our cyclic rotation.
We tried two algorithms for cropping the image. The first was a simple but perhaps oversized 10% crop on every side. This eliminated almost all artifacts, but it often removed parts of the original image as well. This resulted in the following:
The second algorithm was an automatic crop. We took a band of at most 20% of the image on each horizontal side and computed the maximum L2 difference between horizontally consecutive pixels over all channels. For a given side, we then computed the average, over all rows of the image, of the maximum difference for each row. Once we had that number, we cropped the image by that much to get rid of most of the artifacts. We then did the same thing for the top and bottom sides of the image. As a result, we were able to generate the following image:
Notice that the second image keeps more of the picture's width, since it is not cropped by a fixed percentage.
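One possible reading of the automatic crop is sketched below. The description above leaves some details open, so the interpretation here is an assumption: for each row, the sketch finds the position of the largest L2 jump (taken over channels) between neighboring pixels inside the outer 20% band, averages that position over all rows, and crops that many pixels. The names `border_width` and `auto_crop` are hypothetical.

```python
import numpy as np

def border_width(img, max_frac=0.2):
    """Estimate the left border width of an (H, W, C) image.

    Assumption: the border ends where the strongest jump between
    horizontally consecutive pixels occurs, averaged over all rows.
    """
    band_w = max(2, int(img.shape[1] * max_frac))
    band = img[:, :band_w]
    # L2 norm over channels of the difference between neighboring columns.
    jumps = np.linalg.norm(np.diff(band, axis=1), axis=2)
    # Per row: index of the first pixel after the strongest jump.
    positions = jumps.argmax(axis=1) + 1
    return int(round(float(positions.mean())))

def auto_crop(img, max_frac=0.2):
    """Crop each side by its estimated border width."""
    flipped = img.transpose(1, 0, 2)  # swap rows and columns
    left = border_width(img, max_frac)
    right = border_width(img[:, ::-1], max_frac)
    top = border_width(flipped, max_frac)
    bottom = border_width(flipped[:, ::-1], max_frac)
    h, w = img.shape[:2]
    return img[top:h - bottom, left:w - right]
```

Averaging over rows keeps a few rows without a clear border from dominating the estimate, while the 20% cap bounds how much of the picture can be removed.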

Image Gallery

We now showcase all of the images for this project and their results. We show the aligned versions both without cropping and with auto-crop. In the auto-crop case some artifacts remain, but not many, and the amount can be varied through the percentage taken off. We also include the final displacement for both the red and blue channels.

Name and Displacement (y, x) Result with No Cropping Result with Automatic Cropping
Name: Castle
Blue Shift: (-34, -2)
Red Shift: (65, 3)
Name: Cathedral
Blue Shift: (-5, -2)
Red Shift: (7, 1)
Name: Emir
Blue Shift: (-48, -24)
Red Shift: (58, 17)
Name: Harvesters
Blue Shift: (-59, -17)
Red Shift: (64, -3)
Name: Icon
Blue Shift: (49, 5)
Red Shift: (-41, -18)
Name: Lady
Blue Shift: (-53, -8)
Red Shift: (61, 4)
Name: Melons
Blue Shift: (-84, -9)
Red Shift: (96, 3)
Name: Monastery
Blue Shift: (3, -2)
Red Shift: (6, 1)
Name: Onion Church
Blue Shift: (-50, -27)
Red Shift: (58, 10)
Name: Self Portrait
Blue Shift: (-78, -29)
Red Shift: (98, 8)
Name: Three Generations
Blue Shift: (-51, -14)
Red Shift: (59, -3)
Name: Tobolsk
Blue Shift: (-3, -3)
Red Shift: (4, 1)
Name: Train
Blue Shift: (-43,)
Red Shift: (43, 26)
Name: Workshop
Blue Shift: (-53, 1)
Red Shift: (53, -11)
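As an illustration of how displacements like those listed above could be applied, the sketch below rolls the red and blue channels and stacks them with green into an RGB image. It assumes that each listed (y, x) pair is the cyclic roll applied to that channel to match green; the original does not state the sign convention, and the function name `compose` is hypothetical.

```python
import numpy as np

def compose(red, green, blue, red_shift, blue_shift):
    """Roll red and blue by their (y, x) shifts and stack into an RGB image."""
    red_aligned = np.roll(red, red_shift, axis=(0, 1))
    blue_aligned = np.roll(blue, blue_shift, axis=(0, 1))
    # Channel order along the last axis is R, G, B.
    return np.dstack([red_aligned, green, blue_aligned])
```

For the Cathedral entry, for example, the call would be `compose(r, g, b, (7, 1), (-5, -2))` under this convention.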

External Image Gallery

For the second part of this image gallery, we downloaded original photos from the online collection found here.

Name and Displacement (y, x) Result with Automatic Cropping
Name: Family
Blue Shift: (-57, -33)
Red Shift: (73, 16)
Name: Horse
Blue Shift: (-21, -27)
Red Shift: (89, 26)
Name: Big Bush
Blue Shift: (-31, -30)
Red Shift: (24, 16)