Colorizing The Sergei Prokudin-Gorskii Photo Collection






Overview

Color is the difference giving rise to vision: arguably, the most sophisticated perceptual system that we, humans, are gifted with. Yet the vividness of our era is of recent origin; in the past, powerful color experiences were to be found only in nature. Mediated experiences, such as those codified in pictures, lacked the succinctness—the small yet significant, intensive difference—that animates reality. One must be a true visionary to see this, and capture the aesthetic of an era in a way that future generations can experience it profoundly. Sergei Prokudin-Gorskii was such a visionary. Armed with a camera and three color filters, he began, in 1907, a quest to photograph all of the Russian Empire. For about ten years he travelled across the continent, capturing myriads of scenes as threefold pictures. Thanks to him, we now have one of the most comprehensive ethnographic collections of the early 20th century. The joy I felt colorizing his photographs was as thrilling as discovering them for the first time.








Table of Contents

1. Metrics
2. Pyramids and Runtime
3. Alignment Tricks
4. Autocontrast
5. Gallery








1. Metrics


To align the three color channels, I implemented both the sum of squared differences (SSD) and the normalized cross-correlation (NCC) metrics. As a general rule, NCC performed better from a quality perspective, at a negligible cost in time. I searched over displacements of up to 15 pixels in each direction. It must also be mentioned that using np.roll() instead of for-loops had a tremendous impact on performance, providing over a 20× speedup on channel alignment – from more than 20 seconds to reconstruct an image down to less than one. Cheers to the friends who implemented those CPU vector instructions!
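As a concrete sketch of the search (function names are mine, not the exact code; shifts are assumed circular, since that is what np.roll does):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized channels."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def align(channel, base, offset=15):
    """Exhaustively test every shift in [-offset, offset]^2 and keep the best."""
    best_score, best_shift = -np.inf, (0, 0)
    for dx in range(-offset, offset + 1):
        for dy in range(-offset, offset + 1):
            # np.roll shifts the whole array at once: no Python-level pixel loop
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), base)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

Here align(red, blue) returns the (horizontal, vertical) displacement that best registers the red channel onto the blue one.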

The figures below and in the next sections are annotated with the shifts of their color channels, using the notation color_channel(horizontal shift, vertical shift).




1.1. Cathedral

R(1,-8)
G(1,-1)


1.2. Monastery

R(-1,-10)
G(0,6)


1.3. Tobolsk

R(-3,-7)
G(-2,-3)








2. Pyramids and Runtime


To speed up the runtime for large inputs, I also implemented image pyramids. Namely, given user-specified parameters L (levels) and α (pyramid factor), I iteratively (yes, iteratively, not recursively... !) construct a pyramid by rescaling a given channel by α until L levels are reached. The data structure I employed for the pyramid is a deque from the standard collections module, due to its simplicity in comparison to a list – after all, the pyramid only needs to support prepending.
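A minimal sketch of the construction (the function and parameter names are mine, and I am assuming skimage.transform.rescale for the resizing, with antialiasing disabled):

```python
from collections import deque
import numpy as np
from skimage.transform import rescale

def build_pyramid(channel, levels, alpha):
    """Return a coarse-to-fine pyramid: smallest image first, original last."""
    pyramid = deque([channel])
    for _ in range(levels - 1):
        channel = rescale(channel, alpha, anti_aliasing=False)
        pyramid.appendleft(channel)  # O(1) prepend is all the pyramid needs
    return pyramid
```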

Furthermore, it is worth noting that I rescale the channels at each level without antialiasing them because, after experimenting with both options, I could observe no practical difference. Antialiasing costs Ω(L) time that can simply be saved. I originally expected antialiasing to improve the quality of the output image, but I now hypothesize that any small error introduced by aliasing during rescaling is easily corrected upon advancing to the next pyramid level, because it falls within the search window.

Overall, pyramids significantly reduced runtime, with all images requiring less than 8 seconds to align using the default parameters. For timing tests, I used Python's time library. Of course, even though all images were aligned quickly, not all were aligned correctly. It was about time to implement a few Bells & Whistles...
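Putting the pieces together, a self-contained coarse-to-fine sketch (this one uses a fixed factor-2 pyramid built by plain decimation, rather than the α-rescaling described above, to keep the example dependency-free):

```python
from collections import deque
import numpy as np

def ssd(a, b):
    """Sum of squared differences: lower is better."""
    return np.sum((a - b) ** 2)

def pyramid_align(channel, base, levels=3, window=15):
    """Estimate the shift on the coarsest level, then refine it
    with a small search window at each finer level."""
    pyr_c, pyr_b = deque([channel]), deque([base])
    for _ in range(levels - 1):
        pyr_c.appendleft(pyr_c[0][::2, ::2])  # decimate by 2, coarsest first
        pyr_b.appendleft(pyr_b[0][::2, ::2])

    dy = dx = 0
    for i, (c, b) in enumerate(zip(pyr_c, pyr_b)):
        dy, dx = 2 * dy, 2 * dx           # rescale the shift found so far
        w = window if i == 0 else 2       # full search only at the coarsest level
        scores = [(ssd(np.roll(c, (dy + u, dx + v), axis=(0, 1)), b), dy + u, dx + v)
                  for u in range(-w, w + 1) for v in range(-w, w + 1)]
        _, dy, dx = min(scores)
    return dx, dy
```

At each finer level the previous shift simply doubles, so only a handful of candidate offsets need to be tested there; that is where the speedup comes from.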




2.1. Castle

R(-2,-110)
G(-3,-43)


2.2. Emir

R(-18,-114)
G(-2,2)


2.3. Harvesters

R(-2,-141)
G(-1,-141)


2.4. Icon

R(-21,-107)
G(-14,-43)


2.5. Lady

R(4,-141)
G(2,-58)


2.6. Melons

R(-2,-198)
G(-1,-91)


2.7. Onion Church

R(-36,-129)
G(-21,-64)


2.8. Self Portrait

R(1,-143)
G(1,-55)


2.9. Three Generations

R(-2,-129)
G(-3,-64)


2.10. Train

R(-1,-114)
G(2,-131)


2.11. Village

R(3,-134)
G(2,-167)


2.12. Workshop

R(4,-76)
G(-1,-64)








3. Alignment Tricks


Although pyramids guarantee that even the largest of images will be relatively well aligned in a reasonable amount of time, such alignment is far from perfect. In particular, the Monastery, the Emir, the Harvesters, the Village, and the Workshop images are strikingly misaligned. To achieve better results, I therefore had to devise certain strategies, of which four proved effective.

The first strategy was to simply increase the pyramid factor α, at the cost of time. A higher α means that the images stored in the pyramid will be larger, and therefore more pixels will be available to test for alignment. The closer α gets to 1, the smaller the pyramid speedup. In practice, I found that 0.85 is a good compromise between quality and speed, with even the largest of images requiring less than 8 seconds to process when using 16-level pyramids.

The second strategy was to crop the images before performing any alignment tests, to remove noise present around borders. This drastically improved the output image, with its effects being most noticeable in smaller images, especially the Monastery.
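The crop itself is straightforward; a sketch (assuming a 10% fraction per border, applied only when scoring the alignment, not to the final output):

```python
def crop_border(channel, frac=0.10):
    """Drop a fraction of every border so noisy edges don't skew the metric."""
    h, w = channel.shape
    dh, dw = int(h * frac), int(w * frac)
    return channel[dh:h - dh, dw:w - dw]
```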



Cathedral

10% Crop


Cathedral

Original Size


Monastery

10% Crop


Monastery

Original Size


Tobolsk

10% Crop


Tobolsk

Original Size

The third, and somewhat controversial, strategy was to align images based on the green rather than the blue channel. In theory, using any channel as a base should yield similar results, unless the filter used to construct it is defective. Perhaps Prokudin-Gorskii's green filter was in better condition than his blue filter. Or perhaps, because the photography took place under sunlight, the majority of which is green light, images passed through the green filter were recorded with more light. It appears more reasonable to align two low-exposure images with a high-exposure image, rather than a low-exposure and a high-exposure image with a low-exposure image. However, the above are merely hypotheses: I could not find any reliable sources on the topic. Yet for all practical purposes, aligning based on the green channel yielded better results; notice the stark difference between the green-based and blue-based versions of the Harvesters, and the cropped Emir.




Harvesters

Green Base, Original Size


Harvesters

Blue Base, Original Size


Emir

Green Base, 10% Crop


Emir

Blue Base, 10% Crop

The fourth, and final, strategy was to perform edge detection on the channels before attempting to align them. Since all three channels are low-contrast images (the range of their shades is limited) and contain noise, pixels representing clearly distinct parts of the desired image may have relatively similar shades, which skews the metric. The solution is to ensure that any difference in color also encodes a difference in characteristics, and that such a difference is sharp. This can be achieved through edge detectors: filters that enhance the edges (the parts of an image giving rise to its characteristics) and dim everything else. After experimenting with a number of them, including Sobel, SUSAN, and a few Hessian-based ones, the ones that stood out were Canny and Sato. Although Sato is primarily aimed at biomedical uses (it detects "tubeness"), it works surprisingly well for the given set of images. The following figures compare Sato with Canny. The latter is marginally more effective.
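Both detectors are available in skimage; a minimal preprocessing sketch (the wrapper name is mine):

```python
import numpy as np
from skimage.feature import canny
from skimage.filters import sato

def edges(channel, method="canny"):
    """Turn a channel into an edge map; alignment is then run on the result."""
    if method == "canny":
        return canny(channel).astype(float)  # binary edge map
    return sato(channel)                     # ridge ('tubeness') response
```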




3.1.a. Castle

Sato

B(2,28)
R(-1,-67)


3.1.b. Castle

Canny

B(2,31)
R(1,-67)


3.2.a. Emir

Sato

B(20,50)
R(-7,-58)


3.2.b. Emir

Canny

B(22,50)
R(-7,-58)


3.3.a. Harvesters

Sato

B(8,64)
R(2,-67)


3.3.b. Harvesters

Canny

B(7,64)
R(2,-67)


3.4.a. Icon

Sato

B(8,41)
R(-3,-45)


3.4.b. Icon

Canny

B(7,41)
R(-3,-45)


3.5.a. Lady

Sato

B(4,55)
R(-2,-64)


3.5.b. Lady

Canny

B(4,58)
R(-2,-64)


3.6.a. Melons

Sato

B(5,86)
R(-2,-102)


3.6.b. Melons

Canny

B(6,86)
R(-2,-101)


3.7.a. Onion Church

Sato

B(20,53)
R(-5,-55)


3.7.b. Onion Church

Canny

B(18,53)
R(-5,-55)


3.8.a. Self Portrait

Sato

B(26,82)
R(-4,-104)


3.8.b. Self Portrait

Canny

B(26,82)
R(-4,-104)


3.9.a. Three Generations

Sato

B(6,55)
R(0,-61)


3.9.b. Three Generations

Canny

B(7,58)
R(3,-61)


3.10.a. Train

Sato

B(3,44)
R(-24,-44)


3.10.b. Train

Canny

B(1,41)
R(-24,-44)


3.11.a. Village

Sato

B(5,67)
R(-5,-75)


3.11.b. Village

Canny

B(5,67)
R(-5,-75)


3.12.a. Workshop

Sato

B(1,53)
R(5,-53)


3.12.b. Workshop

Canny

B(0,53)
R(5,-53)








4. Autocontrast


As a final touch, the images may be post-processed to appear more photorealistic. By default, they are low contrast (dark spots are not very dark, and bright spots are not very bright) and overexposed (light sources emit too much light). The most popular technique to address this problem is histogram equalization (that is, spreading the tones of each color channel over all discrete values between 0 and 1). However, after playing around with some of my own functions and skimage methods, I discovered that histogram equalization made the images look even more overexposed and unnatural; it "overcompensated" for the problem.

The solution I eventually found is called sigmoid contrast adjustment. Namely, by transforming the image using a sigmoid function, rather than, say, the cumulative distribution, it becomes possible to auto-contrast the image and still preserve somewhat natural-looking light. Implementation-wise, I used adjust_sigmoid() from skimage.exposure, with a cutoff of 0.5 and a gain of 6 (where any gain above 5 increases contrast, and anything below reduces it). The figures below demonstrate sigmoid adjustment in action. Notice how the images on the left, where the sigmoid has been applied, look natural yet more vivid than the untouched images on the right.
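For reference, the call is a one-liner (the gradient array here is just a stand-in for an aligned photograph):

```python
import numpy as np
from skimage.exposure import adjust_sigmoid

image = np.linspace(0, 1, 256).reshape(16, 16)   # stand-in for a real image
# out = 1 / (1 + exp(gain * (cutoff - in))): midtones stay roughly in place
# while shadows are pushed down and highlights up, boosting contrast smoothly.
out = adjust_sigmoid(image, cutoff=0.5, gain=6)
```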




4.1.a. Castle

Sigmoid


4.1.b. Castle

Untouched


4.2.a. Emir

Sigmoid


4.2.b. Emir

Untouched


4.3.a. Harvesters

Sigmoid


4.3.b. Harvesters

Untouched


4.4.a. Icon

Sigmoid


4.4.b. Icon

Untouched


4.5.a. Lady

Sigmoid


4.5.b. Lady

Untouched


4.6.a. Melons

Sigmoid


4.6.b. Melons

Untouched


4.7.a. Onion Church

Sigmoid


4.7.b. Onion Church

Untouched


4.8.a. Self Portrait

Sigmoid


4.8.b. Self Portrait

Untouched


4.9.a. Three Generations

Sigmoid


4.9.b. Three Generations

Untouched


4.10.a. Train

Sigmoid


4.10.b. Train

Untouched


4.11.a. Village

Sigmoid


4.11.b. Village

Untouched


4.12.a. Workshop

Sigmoid


4.12.b. Workshop

Untouched








5. Gallery




Biological Specimens


Double Poppies


Etruscan Vase


Biological Specimens

B(-37,44)
R(41,-68)


Double Poppies

B(4,3)
R(-3,-53)


Etruscan Vase

B(22,18)
R(-20,-94)


Floodgates


Gospel


Mausoleum


Floodgates

B(7,50)
R(-6,-61)


Gospel

B(-6,37)
R(6,-45)


Mausoleum

B(1,26)
R(0,-48)









CS 194-26: Computer Vision and Computational Photography (Fall 2020)

Apollo