Introduction

Sergey Mikhaylovich Prokudin-Gorsky was a Russian photographer who pioneered color photography. Before color printing or color film processes existed, he relied on the three-color principle, which he envisioned would make it possible to create color images in the future. This technique was originally suggested by James Clerk Maxwell, but the necessary photographic technology did not exist until later. The three-color principle entails taking three exposures of the same scene, each through a red, green, or blue filter. The three negatives can later be combined to form a color image, even though each exposure was shot in black and white.

Armed with this technique and equipment, Prokudin-Gorsky travelled through the Russian Empire from 1909 to 1915, with the approval of the Tsar, to photograph Russian life. After his death, hundreds of these negatives were acquired by the Library of Congress and made publicly available. In this project, I examine the original negatives taken by Prokudin-Gorsky and carry out his vision by recreating color images from them. The two main steps in this process are image segmentation and alignment of the color channels.

First, I employ a naive alignment approach that is computationally expensive but works on small images. Next, I improve the algorithm with an image pyramid so that it can handle higher-resolution scans. I then discuss a few further optimizations that improve the quality of the results. Finally, I present a gallery of the recreated images, including the original inputs, the unaligned channels, and the final results.

Naive Algorithm

To start, I worked with small input files and applied the following process for image segmentation and alignment. Each input image contains all three channels stacked vertically, so segmentation simply splits the image into thirds. Each third corresponds to a different channel: from top to bottom, the order is blue, green, red. Once the channels are separated, we can begin to combine them. Shown below are an original input image (with the three stacked channels) and the three channels combined without alignment. The channels are visibly offset from one another, and the overall output image is poor.

Three channels separated

Unaligned channels
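The segmentation step described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used for the project: the function names `split_channels` and `stack_rgb` are hypothetical, and it assumes the plate has already been loaded as a 2D grayscale array.

```python
import numpy as np

def split_channels(stacked):
    """Split a vertically stacked plate into (blue, green, red) thirds."""
    # Assumption: any leftover rows (height not divisible by 3) are dropped.
    height = stacked.shape[0] // 3
    blue = stacked[:height]
    green = stacked[height:2 * height]
    red = stacked[2 * height:3 * height]
    return blue, green, red

def stack_rgb(blue, green, red):
    """Combine the three channels into an RGB image with no alignment."""
    # The last axis is ordered red, green, blue, as most image viewers expect.
    return np.dstack([red, green, blue])
```

Calling `stack_rgb(*split_channels(plate))` on a loaded plate should reproduce the kind of unaligned composite shown above.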

In the naive solution, we exhaustively search over all candidate translations of the green and red channels in order to align each with the blue channel. For the small images, I found that searching shifts of up to 15 pixels in either direction along both dimensions produced fairly good results. The naive algorithm considers each such shift and uses a score function to determine which shift is the "best". I implemented both a sum of squared differences and a normalized cross-correlation. Both yielded similar results, so I continued with the sum of squared differences, which was slightly faster to compute. After alignment, the images improve to the following.

Three channels separated

Unaligned channels

Aligned channels
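As a rough sketch of the exhaustive search, the snippet below scores every shift in a square window and keeps the best one. The names `ssd`, `ncc`, and `align_exhaustive` are hypothetical, and the use of `np.roll` (which wraps pixels around the border) is an assumption made to keep the example short; the original implementation may handle borders differently.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means a better match."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return np.sum(diff ** 2)

def ncc(a, b):
    """Normalized cross-correlation; higher means a better match."""
    # Undefined for a constant image, since its norm after centering is 0.
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

def align_exhaustive(channel, reference, max_shift=15):
    """Return the (dy, dx) shift of `channel` that minimizes SSD."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps around the border (an assumption of this sketch).
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With `max_shift=15` this evaluates 31 x 31 = 961 candidate shifts per channel. Swapping in `ncc` would require maximizing the score instead of minimizing it.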

I added two other small changes to the naive algorithm: normalization and edge removal. Through testing, I found that the naive algorithm explained above did not always produce good results. I think this was in part due to differences in values across channels and in part due to a lack of salient details along the edges. By normalizing each channel (subtracting the mean and dividing by the standard deviation) and discarding the outer 10% of the image on each side, I was able to obtain much crisper alignment.
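The two preprocessing steps can be sketched as follows. The helper names `normalize` and `crop_border` are hypothetical, and the sketch assumes the cropped, normalized copies are used only for scoring, while the chosen shift is applied to the full channel.

```python
import numpy as np

def normalize(channel):
    """Subtract the mean and divide by the standard deviation."""
    channel = channel.astype(np.float64)
    # Assumes the channel is not constant (standard deviation > 0).
    return (channel - channel.mean()) / channel.std()

def crop_border(channel, fraction=0.10):
    """Discard `fraction` of the image on each of the four sides."""
    h, w = channel.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return channel[dy:h - dy, dx:w - dx]
```

Cropping first and then normalizing keeps the border pixels from influencing the mean and standard deviation, which seems in keeping with the reason given for discarding them.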

Image Pyramid

The above approach works well on fairly small images, around 400 pixels by 400 pixels or smaller. However, many of the scanned images are on the order of 3000 pixels by 3000 pixels. Previously, we searched a range of shifts whose height and width were roughly 7.5% of the image (\(15 px - (-15 px) = 30 px \implies 30 px / 400 px = 7.5\%\)). This already considers \(31^2 = 961\) possible shifts. To cover a similar proportion of a 3000 pixel by 3000 pixel image, we would have to consider over 50000 shifts (\(3000 px \cdot 7.5\% = 225 px \implies 226^2 = 51076\)).

This search space is far too large for exhaustive search, so instead we use an image pyramid. The process is effectively the same, but it prunes many of the "bad" shifts early, which makes it much faster. I start by examining the image at \(\frac{1}{16}\) scale and searching over shifts of up to 16 pixels in any direction. After the best shift is found, we move on to an image of twice the resolution, but with half the window size. We center this new window on the best shift from the previous scale, doubled to account for the change in resolution. We continue this process until we search for the best shift in the full-resolution image. The improved algorithm benefits from the fact that large searches are still relatively fast on small images and that their results roughly carry over to higher resolutions. Full results can be seen in the gallery below.
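The coarse-to-fine search can be sketched as below. This is an illustrative version under several assumptions: the names `downsample`, `search`, and `align_pyramid` are hypothetical, downsampling is done by block averaging, the score is a plain sum of squared differences, and `np.roll` wraps around the border, so the coarsest image should be noticeably larger than the search window.

```python
import numpy as np

def downsample(image, factor):
    """Shrink by an integer factor by averaging factor x factor blocks."""
    h = image.shape[0] // factor * factor
    w = image.shape[1] // factor * factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def search(channel, reference, center, radius):
    """Best (dy, dx) by SSD within `radius` pixels of `center`."""
    best_shift, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(channel, reference, coarsest=16, radius=16):
    """Estimate the (dy, dx) shift coarse-to-fine, ending at full resolution."""
    channel = channel.astype(np.float64)
    reference = reference.astype(np.float64)
    shift, factor = (0, 0), coarsest
    while factor >= 1:
        small_channel = downsample(channel, factor)
        small_reference = downsample(reference, factor)
        shift = search(small_channel, small_reference, shift, radius)
        if factor > 1:
            # The next level has twice the resolution: double the shift
            # and halve the window around it.
            shift = (2 * shift[0], 2 * shift[1])
            radius = max(radius // 2, 1)
        factor //= 2
    return shift
```

With the defaults, the windows have radii 16, 8, 4, 2, and 1 across the five levels, for a total of 33² + 17² + 9² + 5² + 3² = 1493 scored shifts, most of them on heavily downsampled images.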

Bells and Whistles

Edge Detection Alignment

For a few of the images, the technique above did not produce a perfect alignment. To get better results, I used a Sobel filter to preprocess the image. This filter acts as an edge detector, so we align the edges themselves rather than the raw pixel intensities. For Emir, this was what finally allowed me to get a decent result. Below is Emir aligned with and without edge detection (the Sobel filter).
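For reference, a Sobel edge map can be computed with NumPy alone, as in the sketch below. The function name `sobel_magnitude` is hypothetical, and taking the gradient magnitude (rather than a single direction) and replicating the border pixels are assumptions; the write-up does not say which variant was used.

```python
import numpy as np

def sobel_magnitude(image):
    """Gradient magnitude from the 3x3 Sobel kernels."""
    # Replicate border pixels so the output has the same shape as the input.
    img = np.pad(image.astype(np.float64), 1, mode="edge")
    # Horizontal gradient: right column minus left column, rows weighted 1-2-1.
    gx = (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[1:-1, :-2] - img[2:, :-2])
    # Vertical gradient: bottom row minus top row, columns weighted 1-2-1.
    gy = (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[:-2, 1:-1] - img[:-2, 2:])
    return np.hypot(gx, gy)
```

Each channel would be passed through this filter before scoring, and the resulting shift would then be applied to the original, unfiltered channel.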