In 1907, the chemist and photographer Sergei Mikhailovich
Prokudin-Gorskii, travelled around the Russian Empire, taking photos of
various landmarks and people he visited using a unique camera apparatus
(Figure 1). The camera was actually 3 cameras stacked on top of each
other, with a color filter. From top to bottom, blue, green, and red
filters. Therefore, each exposure, done on a glass plate, was actually a
stack of 3 exposures, one with a blue filter, the other a green filter,
and the final, a red filter. In this manner, stacking the 3 individual
color channel exposures with proper alignment would output a resultant
colorized image of the scene photographed (Figure 2).
Unfortunately,
Prokudin-Gorskii did not have the technology and means to be able to
accurately do this. But now, a century later, technology has more than
aptly caught up, and Prokudin-Gorskii’s glass-plate negatives have been
preserved and digitized by the Library of Congress. This project aims at
developing automated and fast algorithms to reconstruct the colorized
images from the digitized copies of Prokudin-Gorskii original
glass-plate negatives, in the most accurate and photorealistic manner
possible.
Figure 1: Camera used by Sergei Mikhailovich Prokudin-Gorskii to create the glass-plate negatives. Source: Libary of Congress |
Figure 2: Example glass-plate negative, and reconstruction of colorized RGB image with blue, green, red filters applied. Source: https://sechtl-vosecek.ucw.cz/en/expozice5.html |
The most naïve approach was to use exhaustive approach. Here we try
and shift two of the color channels (ch2 and ch3) to match the reference
channel (ch1). For each channel to be aligned, we shift it in the x and
y directions by any integer value between [-15, 15], and use a
similarity metric (explained later) to determine which of the shifted
channels is most similar to the reference channel based on the metric. We
then output the corresponding x and y value as the offset for that
specific color channel. By running on both channels, aligning the color
channels to the reference channel, and stacking the images on top of
each other, we get the corresponding colorized image for the glass-plate
negative using the exhaustive search method.
For the similarity
metric, both the Sum of Squared Differences (SSD) and Normalized
Cross-Correlation (NCC), were used. SSD metric, or l2 norm, is a
minimization of the Frobenius norm of the shifted image minus the
reference image. Whereas NCC is the maximization of the dot product of
the normalized vector of the shifted image with the normalized vector of
the reference image. Although both metrics yielded identical offset
values from smaller images, NCC ended up being the metric used for the
larger images as it’s offsets yielded slightly less artifacts and
reduced blurring (Figure 3 & 4).
Figure 3: Church image aligned
using SSD as similarity metric. |
Figure 4: Church image aligned
using NCC as similarity metric |
Note that by aligning, we simply mean circularly shifting the pixels of the color channel (a greyscale image matrix representing the brightness values of the filter color for each pixel of the image). In this manner, the border pixels would greatly affect the value of the similarity metric once the image is shifted. As a result, before the similarly metric is calculated for the shifted images, the images are cropped to only include the interior 70% of the image. 15% by width and height of the image is cropped out on either side.
Reference Color ChannelIt was quickly noted that the color channel used for reference to align the other two channels made a significant difference on the photorealism and level of artifacts in the alignment of the images. By default, the blue channel was used, but upon testing, it was noted that the green channel produced the best results. This was shown no more clearly than with the Emir photo (Figures 5 & 6). As mentioned in lecture, this may be attributed to how our eyes are most sensitive to green wavelengths than red or blue (Figure 7), or that green wavelengths are in between the red and blue wavelengths of visible light, therefore, green is spectrally close with both red and blue. As a result, aligning images to the green reference image produces the least artifacts and more photo-realism. |
Figure 7: The retina cone sensitivity
curves for blue, green, and red wavelengths. |
Figure 5: Image of Emir aligned using blue color channel as reference. |
Figure 6: Image of Emir aligned
using green color channel as reference. |
All images below used green as reference channel and NCC for similarity metric
Blue Channel Offset: (-2, -5)
|
Blue Channel Offset: (-2, 3)
|
Blue Channel Offset: (-3, -3)
|
Exhaustive search worked well for the smaller .jpg digitized images,
but on larger .tif images, which have resolutions on the order of
103, this naïve search method is incredibly slow. To reduce
runtime of the search method, the image pyramid technique was
implemented. This technique constructs a pyramid of images where at each
increasing level, the image is downsampled by a factor (default used was
0.5), until the image has a height less than 100 pixels. Note that the
sk.transform.rescale function is employed for downsampling to account
for anti-aliasing.
At this highest level, the image is small enough that
exhaustive search can be used with decent runtime. Then as you
progress down the image pyramid, we can run the exhaustive search
algorithm on just a localized portion of that higher resolution image
by using the offsets learned from the previous downsampled level. Note
that first we multiply these offsets by the inverse of the
downsampling factor (default: 1/0.5 = 2) to account for
downsampling.
In this manner, by first searching the downsampled images we
can localize the search on higher resolution images, in order to
reduce the total number of searches required to still obtain the ideal
offset values for the original high-resolution image, when compared to
the exhaustive search method.
Consequently, the window size to search at each level was
decreased to [-5, 5] for both x and y. Despite the reduced window size,
because of the image pyramid, the offsets at higher levels are magnified
at lower levels by 2n (where n is the level on the image
pyramid), so the total range of offsets searched by the image pyramid
search algorithm still exceeds the range in the previous naïve
exhaustive search method.
All images below used green as reference channel, NCC for similarity metric, 70% crop, and a downsampling factor of 0.5 (for large images).
Blue Channel Offset: (-4, -25)
|
Blue Channel Offset: (-24, -48)
|
Blue Channel Offset: (-17, -58)
|
Blue Channel Offset: (-17, -39)
|
Blue Channel Offset: (-8, -52)
|
Blue Channel Offset: (-9, -80)
|
Blue Channel Offset: (-27, -50)
|
Blue Channel Offset: (-28, -77)
|
Blue Channel Offset: (-13, -49)
|
Blue Channel Offset: (-5, -41)
|
Blue Channel Offset: (1, -52)
|
Blue Channel Offset: (-3, -3)
|
Blue Channel Offset: (-2, -5)
|
Blue Channel Offset: (-2, 3)
|
All images below used green as reference channel, NCC for similarity metric, 70% crop, and a downsampling factor of 0.5.
Church Entrance Blue Channel Offset: (-20, -64)
|
Cathedral Red Blue Channel Offset: (-21, 100)
|
Cathedral White Blue Channel Offset: (-27, -56)
|
Floodgate Supervisor - colorshifts in water Blue Channel Offset: (22, -48)
|
Man in Red Clothes - bad white balance Blue Channel Offset: (-19, -67)
|
Cotton Mill Blue Channel Offset: (-28, -61)
|
Excavator Blue Channel Offset: (-17, -34)
|
Firemen Squad Blue Channel Offset: (-34, -3)
|
Pink Flowers Blue Channel Offset: (6, -48)
|
Mosque Blue Channel Offset: (3, -55)
|
Ocean Sunset - looks like a polaroid Blue Channel Offset: (-12, -52)
|
Windmill Blue Channel Offset: (-15, -63)
|