Jimmy Xu
Sergei Mikhailovich Prokudin-Gorskii (1863-1944) [Сергей Михайлович Прокудин-Горский] was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won the Tsar's special permission to travel across the vast Russian Empire and take color photographs of everything he saw, including the only color portrait of Leo Tolstoy. And he really photographed everything: people, buildings, landscapes, railroads, bridges... thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter.
Never mind that there was no way to print color photographs until much later -- he envisioned special projectors to be installed in "multimedia" classrooms all across Russia where the children would be able to learn about their vast country. Alas, his plans never materialized: he left Russia in 1918, right after the revolution, never to return again. Luckily, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress.
In this project, I colorize some pictures from the collection and describe my approach below.
Keywords: SSD, NCC, image pyramid search, and automatic white balance.
The goal of this project is to convert a glass plate image to a color image. Each original picture contains the three color channels of the same scene, stacked vertically.
The naive approach is to divide a picture into three images of the same height and stack them in the color channel. However, this approach won't work because the three images don't align with each other.
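The naive approach can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the blue, green, red top-to-bottom order of the plate is an assumption that should be checked against the actual scans.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked scan into three equal-height channels."""
    height = plate.shape[0] // 3  # rows left over after dividing by 3 are dropped
    return plate[:height], plate[height:2 * height], plate[2 * height:3 * height]

def naive_color(plate):
    """Stack the three parts as an RGB image without any alignment."""
    # Assumption: the plate holds blue, green, red from top to bottom.
    blue, green, red = split_plate(plate)
    return np.dstack([red, green, blue])

# Tiny synthetic "plate": 9 rows, so each channel is 3 rows tall.
plate = np.arange(36, dtype=float).reshape(9, 4)
rgb = naive_color(plate)
print(rgb.shape)  # (3, 4, 3)
```

On a real scan the result shows colored fringes, because nothing here compensates for the shift between the three exposures.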
The first challenge is then to find an appropriate displacement for each channel so that they will align to each other. An additional challenge is to do this efficiently so that we can process pictures of very large size.
Here's a preview of an aligned image:
My approach to align the pictures is as follows:
1. Select a base channel.
2. If the resolution of the channel is too large, downscale it by a factor of 2. Keep downscaling recursively until the downscaling limit is reached or the height or width falls below a threshold.
3. For each of the other two channels, compute the displacement that minimizes the metric between that channel and the base channel.
4. Scale the offset found at the lower resolution back up and use it as the starting point for rolling at the next finer resolution. Repeat step 3.
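The steps above can be sketched as a short recursive routine. This is a minimal sketch under stated assumptions, not the project's actual code: the names `downscale2`, `search`, and `pyramid_align`, the parameters `min_size`, `max_levels`, and `radius`, and the 2x2 block averaging used for downscaling are all illustrative choices.

```python
import numpy as np

def downscale2(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # trim odd edges
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def search(channel, base, start, radius):
    """Try every shift within `radius` of `start`; return the one with the lowest SSD."""
    best_score, best_shift = np.inf, start
    for dy in range(start[0] - radius, start[0] + radius + 1):
        for dx in range(start[1] - radius, start[1] + radius + 1):
            rolled = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((rolled - base) ** 2)  # SSD over the whole image, for brevity
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def pyramid_align(channel, base, min_size=64, max_levels=5, radius=3):
    """Return the (row, col) shift that aligns `channel` to `base`."""
    if max_levels == 0 or min(base.shape) <= min_size:
        return search(channel, base, (0, 0), radius)  # coarsest level: start from zero
    coarse = pyramid_align(downscale2(channel), downscale2(base),
                           min_size, max_levels - 1, radius)
    start = (2 * coarse[0], 2 * coarse[1])  # scale the coarse offset back up
    return search(channel, base, start, radius)

# Synthetic check: shift a random image and recover the shift.
rng = np.random.default_rng(0)
base = rng.random((64, 64))
channel = np.roll(base, (-8, 4), axis=(0, 1))  # simulate a misaligned channel
print(pyramid_align(channel, base, min_size=16))  # (8, -4)
```

The synthetic example is deliberately easy (a circular shift that stays exact at every level). Real scans are noisy and have damaged borders, so the score would normally be computed on a cropped center rather than the whole image.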
The metric is used to calculate the similarity or dissimilarity between two pictures. I tried the sum of squared differences (SSD) and normalized cross-correlation (NCC). After applying both metrics to some pictures, it turns out SSD works better.
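For reference, the two metrics can be written as follows. This is a sketch: the function names are illustrative, and the mean subtraction in `ncc` is one common variant that may differ from the version used in this project.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: a dissimilarity, so lower is better."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Normalized cross-correlation: a similarity in [-1, 1], so higher is better."""
    a0 = a - a.mean()  # assumption: zero-mean variant
    b0 = b - b.mean()
    denom = np.linalg.norm(a0) * np.linalg.norm(b0)
    return float(np.sum(a0 * b0) / denom) if denom > 0 else 0.0
```

Because SSD is minimized and NCC is maximized, a search written to minimize a score would use `-ncc(a, b)` in place of `ssd(a, b)`.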
It turns out the choice of base channel can have a significant impact on the final image. For example, the picture below uses blue as the base channel, which doesn't yield a good result. By simply changing the base channel to green or red, the result becomes much better. Through experimentation, I decided to use green as the base channel.
Instead of comparing the entire channel, I use the cropped center to calculate the metric, because the borders of the channels are usually damaged or missing. This also makes the computation faster. The picture below shows the result of aligning without the center crop. Note the border of the riverbank, which is more misaligned.
If the input image is very large, it is prohibitively expensive to exhaustively search for the best displacement pixel by pixel. I implement pyramid search (see above for implementation details), which searches for the displacement from the coarsest-scaled image to the finest-scaled image and greatly speeds up the process.
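A rough count of candidate shifts shows where the speedup comes from. The numbers below are hypothetical (a ±100 pixel window, 5 pyramid levels, a ±4 pixel search per level), chosen only because the largest offsets reported for the .tif images are close to 100 pixels.

```python
def exhaustive_evals(max_shift):
    """Candidate shifts in one full-resolution search over +/- max_shift pixels."""
    return (2 * max_shift + 1) ** 2

def pyramid_evals(levels, radius):
    """Candidate shifts when each of `levels` levels searches +/- radius pixels."""
    return levels * (2 * radius + 1) ** 2

def pyramid_reach(levels, radius):
    """Largest full-resolution shift the pyramid can recover."""
    return radius * (2 ** levels - 1)

print(exhaustive_evals(100))  # 40401
print(pyramid_evals(5, 4))    # 405
print(pyramid_reach(5, 4))    # 124
```

On top of the smaller number of candidates, most pyramid evaluations run on downscaled images, so each one also touches far fewer pixels than a full-resolution comparison.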
For grading purposes, here's the list of aligned images
Since the border of each aligned image is very regular, I center-crop the picture by a fixed percentage.
For automatic white balance, I used the method mentioned in the Python tutorial.
This usually leads to a more accurate, sometimes less appealing, representation of color.
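The write-up does not say which white balance method the tutorial describes. Purely as an illustration of what an automatic white balance step can look like, here is a sketch of the gray-world method, which is a swapped-in technique and not necessarily the one used here. It assumes a float RGB image with values in [0, 1].

```python
import numpy as np

def gray_world(rgb):
    """Scale each channel so its mean matches the overall mean intensity."""
    means = rgb.reshape(-1, 3).mean(axis=0)         # per-channel means
    gain = means.mean() / np.maximum(means, 1e-12)  # avoid division by zero
    return np.clip(rgb * gain, 0.0, 1.0)
```

Gray-world assumes the scene averages out to neutral gray, which removes a global color cast but can wash out images that are legitimately dominated by one color.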
I downloaded some additional glass plate images of Prokudin-Gorskii from the Library of Congress website. I then colorized and post-processed them.
For grading purposes, here's the list of offsets for each of the images
Picture | Blue to Green | Red to Green |
---|---|---|
cathedral.jpg | (-5,-2) | (7,1) |
church.tif | (-25,-4) | (33,-8) |
emir.tif | (-49,-24) | (57,17) |
harvesters.tif | (-59,-17) | (65,-3) |
icon.tif | (-41,-17) | (48,5) |
lady.tif | (-55,-9) | (62,4) |
melons.tif | (-82,-11) | (96,3) |
monastery.jpg | (3,-2) | (6,1) |
onion_church.tif | (-51,-27) | (57,10) |
self_portrait.tif | (-78,-29) | (98,8) |
three_generations.tif | (-52,-14) | (59,-3) |
tobolsk.jpg | (-3,-3) | (4,1) |
train.tif | (-43,-6) | (43,26) |
workshop.tif | (-53,1) | (52,-11) |
For grading purposes, here's the list of extra features I did for this project