Background

Between 1905 and 1915, Russian photographer Sergey Prokudin-Gorskii traveled the Russian Empire taking thousands of pictures across a wide variety of subjects, from religious monuments to farms. This was before color photography was widely used, but Sergey was ahead of his time. He took three photos of every subject, one each through a red, a green and a blue filter, which he knew would eventually allow people to recreate the photos in color. However, because the three exposures don't line up exactly, we have to use some computational techniques to reconstruct the images.

The original red, green and blue plates were acquired by the Library of Congress in 1948, and they have the same problem described above. If you just stack the channels on top of each other without aligning them, this is the result:

Below I will talk about some of the ways to make the images align better.

Aligning the Images

The naive approach is to exhaustively test every alignment of the channels and use whichever works best. To do this I cropped the images to the middle and looped through all possible channel displacements within a range of 15 pixels. The trick is defining what 'best' means in this context. Two common loss functions are the L1 and L2 losses, defined below.

$$ L1(img_1, img_2) = \sum_i\sum_j |img1_{i,j} - img2_{i,j}| $$
$$ L2(img_1, img_2) = \sum_i\sum_j (img1_{i,j} - img2_{i,j})^2 $$
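The two losses and the exhaustive search can be sketched in a few lines of NumPy. This is a minimal sketch, not the exact code used for the results here: the function names, the `frac` crop fraction, the circular shifting via `np.roll`, and the assumption that each channel is a 2D array are all illustrative choices.

```python
import numpy as np

def l1_loss(a, b):
    """Sum of absolute differences (lower is better)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b))

def l2_loss(a, b):
    """Sum of squared differences (lower is better)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum((a - b) ** 2)

def crop_center(img, frac=0.5):
    """Keep the central `frac` of a 2D image in each dimension (assumed value)."""
    h, w = img.shape
    dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
    return img[dh:h - dh, dw:w - dw]

def align_exhaustive(moving, reference, loss=l2_loss, radius=15):
    """Try every (dy, dx) shift in [-radius, radius]; return the lowest-loss one."""
    ref = crop_center(reference)
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around the border; scoring only the
            # center crop keeps the wrapped edges out of the comparison.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = loss(crop_center(shifted), ref)
            if score < best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift
```

The returned `(dy, dx)` is the shift to apply to `moving` (for example with `np.roll`) so that it lines up with `reference`.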

There was also another metric that I experimented with for this project called normalized cross-correlation (NCC). Unlike L1 and L2, it is a similarity score, so higher is better. I think this metric actually gives the best images of the three, but it's much slower than the other two so I ended up using it less.

$$NCC(img_1, img_2) = \frac{\sum_i\sum_{j} { img1_{i,j} \times img2_{i,j}}}{(\sum_i\sum_{j}img1_{i,j}^2 \sum_i\sum_{j}img2_{i,j}^2)^{0.5}} $$
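As a rough sketch, the NCC formula above translates almost directly into NumPy. The function names and the zero-denominator guard are assumptions added for illustration. Note that this follows the formula as written, without subtracting each image's mean first, which some other definitions of NCC do.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation as in the formula above (higher is better)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    # Guard against an all-black image (assumed behavior: score it as 0).
    return np.sum(a * b) / denom if denom > 0 else 0.0

def ncc_loss(a, b):
    """Negated NCC, so it can be minimized like the L1 and L2 losses."""
    return -ncc(a, b)
```

Because NCC divides by the magnitude of both images, it does not change if one channel is uniformly brighter than the other, which L1 and L2 would penalize.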

Using this method, I aligned all the small images.

It looks pretty good but there's definitely room for improvement (white balance and cropping). The main problem is that this method takes roughly forever on larger images. That's where the pyramid technique comes in.

The Pyramid Technique

The problem of running computationally expensive procedures on large images is often solved with the pyramid technique: the image is shrunk (usually through subsampling or blurring), the operations are performed on the smaller image, and the results are propagated back up. For example, if you shrink an image by a factor of two five times and then move the small image three pixels to the right, those three pixels are multiplied by two for every downsizing. Three pixels on the small image end up being 96 pixels (3 × 32) on the final image, found much, much faster. As you move to images closer to full size, the shifting becomes more fine-grained. TLDR: Do the same thing on a small image and multiply.
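One way to write the idea above is as a short recursion: align a half-size copy first, double that shift, then search only a small window around it at the current size. This is a hedged sketch rather than the implementation behind the results here. The names, the 2x2 averaging used for downsampling, the L2 loss, and the `min_size`, `coarse_radius` and `fine_radius` values are all assumptions.

```python
import numpy as np

def l2_loss(a, b):
    """Sum of squared differences (lower is better)."""
    return np.sum((a - b) ** 2)

def best_shift(moving, reference, center=(0, 0), radius=2):
    """Exhaustively search (dy, dx) shifts within `radius` of `center`."""
    h, w = reference.shape
    # Score only the interior so wrapped-around borders are ignored.
    ys, xs = slice(h // 4, h - h // 4), slice(w // 4, w - w // 4)
    best, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = l2_loss(shifted[ys, xs], reference[ys, xs])
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (blur plus subsample)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def align_pyramid(moving, reference, min_size=32, coarse_radius=15, fine_radius=2):
    """Coarse-to-fine alignment; returns the (dy, dx) shift for `moving`."""
    moving = np.asarray(moving, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Base case: the image is small enough for a full exhaustive search.
    if min(reference.shape) <= min_size:
        return best_shift(moving, reference, radius=coarse_radius)
    # Solve the half-size problem first, then scale its answer by two.
    coarse = align_pyramid(downsample(moving), downsample(reference),
                           min_size, coarse_radius, fine_radius)
    center = (2 * coarse[0], 2 * coarse[1])
    # Refine with a small search window around the scaled-up estimate.
    return best_shift(moving, reference, center=center, radius=fine_radius)
```

The expensive wide search only ever runs on the smallest image. Every larger level checks just a handful of shifts around the doubled estimate, which is where the speedup comes from.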

I eventually managed to implement this 'simple' algorithm after roughly 30 hours and it mostly worked. Here is a massive image of a massive man all lined up.

But again we've got the same problem: white balance, cropping, and other photography things I don't understand. I will attempt to fix at least some of that in the next section.

Extras

The first thing I did was make an auto white balance function using the grey world assumption. This just means that the average color of a photo should be grey. This is a bad assumption. Unfortunately, it is kind of the best one available. All I really did was apply the following function.

$$image = image / mean(image) \cdot \frac{1}{2}$$

And then normalized to make sure everything was in range. You can see the before and after below.

As you can see... much more grey. You can play with the hyperparameters to get a less depressing image but it was Russia in 1910 so I think this is the most accurate vibe.
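For reference, the grey world balance described above might look like the sketch below. Two things in it are assumptions: the mean is taken per color channel (the usual reading of grey world), and "normalizing into range" is done by clipping to [0, 1]. The function name and the `target` parameter are illustrative, with `target=0.5` standing in for the 1/2 factor in the formula.

```python
import numpy as np

def grey_world(image, target=0.5):
    """Scale each channel of an H x W x 3 image so its mean becomes `target`."""
    image = np.asarray(image, dtype=float)
    # One mean per color channel, kept broadcastable against the image.
    channel_means = image.mean(axis=(0, 1), keepdims=True)
    balanced = image / np.maximum(channel_means, 1e-12) * target
    # Assumed normalization step: clip anything that left the [0, 1] range.
    return np.clip(balanced, 0.0, 1.0)
```

Changing `target` is one of the knobs mentioned above: a lower value gives a darker result, and a higher value gives a brighter one at the cost of more clipped highlights.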

One more small improvement was auto contrasting. I did this by taking the aligned image, splitting the three color channels, and then normalizing so that the smallest value for each channel was 0 and largest was 1. Pretty simple but the results were decent.
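The per-channel stretch described above can be sketched as follows. The function name and the handling of a completely flat channel are assumptions, and the image is assumed to be an H x W x 3 array.

```python
import numpy as np

def auto_contrast(image):
    """Stretch each channel so its smallest value is 0 and its largest is 1."""
    image = np.asarray(image, dtype=float)
    lo = image.min(axis=(0, 1), keepdims=True)
    hi = image.max(axis=(0, 1), keepdims=True)
    # Avoid dividing by zero when a channel is constant (assumed: map it to 0).
    span = np.where(hi > lo, hi - lo, 1.0)
    return (image - lo) / span
```

Because each channel is stretched independently, this can shift the color balance slightly as well as the contrast.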

Conclusion

All in all, most pictures could be aligned using one technique or another with a reasonable degree of success. Future improvements include automatic cropping to get rid of the weird multicolored edges and better white balance. The rest of the images are below. Enjoy.

cathedral.jpg
B offset: (0, -7)
G offset: (1, -7)
Runtime: 16.21 seconds