This project explores different methods for aligning three color channels in order to colorize historical photos. Each photo was taken as three separate exposures on a glass plate, through red, green and blue filters. Thanks to the foresight of the Russian photographer Sergei Mikhailovich Prokudin-Gorskii, I was able to digitally colorize some of his masterpieces and get a view of life in the Russian Empire. The input is a single black-and-white image that contains the three channels in the order B, G, R. After processing, the output is a reasonably well-aligned color photo. I explored different metrics for accurate alignment, and I also wanted the alignment to be efficient for high-resolution images.
Before using any metric to align the three channels, I first read the combined image, divided it into its three channels and simply stacked them. The result is not ideal: the faces are not properly aligned, so the image looks very blurry. The input images come in roughly two sizes. The small images are in JPEG format, with each channel about 400px by 400px. The large images are in TIFF format, with each channel about 4000px by 4000px! See the pictures Monastery (small), Harvesters (large) and Emir (large):
P.S. Though it is not accurate, I enjoy the aesthetic of this naive alignment; it adds another flavor to documentary photos.
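The naive split-and-stack step described above can be sketched roughly as follows. This is a minimal sketch that assumes the three exposures are stacked vertically in the scanned plate (B on top, then G, then R) and that the plate is already loaded as a 2-D NumPy array; the function name `split_and_stack` is made up for illustration.

```python
import numpy as np

def split_and_stack(plate: np.ndarray) -> np.ndarray:
    """Cut a vertically stacked B, G, R plate into thirds and stack them as RGB.

    Assumption: the plate is a 2-D array with B on top, G in the middle, R at the bottom.
    """
    height = plate.shape[0] // 3           # rows per channel; leftover rows are dropped
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return np.dstack([r, g, b])            # no alignment yet, just a direct overlay
```

Because nothing is shifted here, any offset between the three exposures shows up directly as color fringing in the result.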
Since the pixel value in each channel encodes brightness, fixing one channel and shifting the other two until their brightness patterns match should produce a more accurate alignment. We therefore need a score that quantifies how well the brightness matches. I mainly implemented two metrics for this evaluation, SSD and NCC.
1. Sum of Squared Differences (SSD) distance. SSD is the squared L2 norm of the difference between the two images, computed as sum(sum((image1 - image2)^2)). The goal is to make the SSD as low as possible, which means image 1 and image 2 are very close to each other.
2. Normalized Cross-Correlation (NCC). NCC computes the dot product between two normalized vectors, namely image1/||image1|| and image2/||image2||. With NCC we maximize the score, because a larger product implies a closer match.
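The two metrics above can be written in a few lines of NumPy. This is a sketch that follows the definitions as stated in the list (NCC here does not subtract the mean); the function names `ssd` and `ncc` are illustrative, not taken from the project code.

```python
import numpy as np

def ssd(image1: np.ndarray, image2: np.ndarray) -> float:
    """Sum of squared differences; lower means a closer match."""
    diff = image1.astype(np.float64) - image2.astype(np.float64)
    return float(np.sum(diff ** 2))

def ncc(image1: np.ndarray, image2: np.ndarray) -> float:
    """Dot product of the two images after scaling each to unit length; higher is better."""
    a = image1.astype(np.float64).ravel()
    b = image2.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:                        # an all-zero image has no direction to compare
        return 0.0
    return float(np.dot(a, b) / denom)
```

Converting to float first avoids overflow and wrap-around when the inputs are unsigned integer images.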
This may not hold in general, but in this project, with all other parameters kept the same, I found that NCC and SSD often give very similar displacements. I personally cannot tell the difference between the output images. Examples below:
The choice of fixed channel affects the final output. In most cases it is good enough to use the blue channel as the base and move the red and green channels to align with it. However, the blue channel fails to deliver the best alignment for photos like Monastery, where fixing the green channel works better. See below.
Comparison between SSD and NCC (two examples; left: SSD, right: NCC)
Comparison between different base color channels (two examples with left: blue fixed, right: green fixed; one example with left: red fixed, middle: blue fixed, right: green fixed)
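The idea of fixing one base channel and exhaustively shifting the other two can be sketched as below. This is a minimal sketch, not the project's actual code: the names `best_shift` and `align_to_base`, the default search radius of 15 pixels, and the use of SSD as the score are all assumptions. Shifts are returned as (rows, columns), which may differ from the ordering used in the displacement lists.

```python
import numpy as np

def best_shift(moving: np.ndarray, base: np.ndarray, radius: int = 15) -> tuple:
    """Try every shift in [-radius, radius]^2 and keep the one with the lowest SSD."""
    base = base.astype(np.float64)
    moving = moving.astype(np.float64)
    best, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))   # circular shift
            score = np.sum((shifted - base) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def align_to_base(channels: dict, base_name: str, radius: int = 15) -> dict:
    """Return the best shift for every channel except the fixed base channel."""
    base = channels[base_name]
    return {name: best_shift(channel, base, radius)
            for name, channel in channels.items() if name != base_name}
```

Switching the base channel is then just a matter of passing a different `base_name`, for example `align_to_base(channels, "g")` instead of `align_to_base(channels, "b")`. Note that `np.roll` wraps pixels around the edges, which is a simplification.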
While the exhaustive search takes only seconds on the small images, it is painfully slow on the large ones. An image pyramid helps here: I used one to search for the alignment recursively, from coarse to fine, which keeps the search space small as the image gets larger.
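A coarse-to-fine search of this kind can be sketched as follows. This is an illustrative sketch under several assumptions that the text does not state: each pyramid level halves the resolution by 2x2 averaging, the score is SSD, and the parameter names and defaults (`min_size`, `coarse_radius`, `refine_radius`) are made up for the example.

```python
import numpy as np

def downsample(img: np.ndarray) -> np.ndarray:
    """Halve the resolution by averaging 2x2 blocks (one simple choice among many)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def search(moving, base, center, radius):
    """Exhaustive SSD search in a (2*radius+1)^2 window around `center`."""
    cy, cx = center
    best, best_score = center, np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            diff = np.roll(moving, (dy, dx), axis=(0, 1)).astype(np.float64) - base
            score = np.sum(diff ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def pyramid_align(moving, base, min_size=64, coarse_radius=8, refine_radius=2):
    """Recursively align at half resolution, then refine around the doubled estimate."""
    if min(base.shape) <= min_size:
        return search(moving, base, (0, 0), coarse_radius)     # coarsest level
    dy, dx = pyramid_align(downsample(moving), downsample(base),
                           min_size, coarse_radius, refine_radius)
    return search(moving, base, (2 * dy, 2 * dx), refine_radius)
```

Only the coarsest level pays for a wide window; every finer level evaluates just a handful of shifts, so the cost on a 4000px channel is dominated by a few full-resolution comparisons rather than hundreds.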
Displacement:
Cathedral: R: (-1, 7), G: (-1, 1)
Monastery: R: (1, 6), B: (0, 6)
Nativity: R: (1, 7), G: (1, 3)
Settlers: R: (-1, 14), G: (0, 7)
Harvesters: R: (9, 112), G: (9, 56)
Lady: R: (1, 112), G: (1, 48)
Turkmen: R: (17, 112), G: (9, 56)
Emir: G: (-16, -56), B: (-48, -96)
Train: R: (25, 88), G: (1, 40)
Icon: R: (17, 88), G: (9, 40)
Self Portrait: R: (1, 96), B: (-24, -87)
Three Generations: R: (1, 112), G: (9, 56)
Village: R: (1, 72), B: (-16, -64)
I picked several more photos from the archive to colorize. They depict some blades, a cute vase and another beautiful church. If you are interested in this archive, please check out the Prokudin-Gorskii Collection from the Library of Congress.
Displacement:
Blades: R: (49, 96), G: (25, 40)
Vase: R: (9, 80), G: (1, 8)
Church: R: (1, 64), G: (1, 16)
My first attempts using NCC and the image pyramid did not produce well-aligned color images for some of the large photos. After inspecting those photos, I decided to use a threshold to cut off some pixels from the borders. I manually cropped a 200 px border from the top, bottom, left and right before running the alignment algorithm, and the visual results were much better. I believe the noise in the borders significantly disturbs the metric evaluation, which is why the alignment was poor at first. As a next step, I should implement automatic cropping so that the approach generalizes without manual work.
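The manual border crop described above amounts to a single slicing step before scoring. The sketch below assumes each channel is a NumPy array; the helper name `crop_border` is hypothetical, and the default of 200 pixels simply mirrors the value mentioned in the text.

```python
import numpy as np

def crop_border(channel: np.ndarray, border: int = 200) -> np.ndarray:
    """Drop `border` pixels from the top, bottom, left and right of a channel."""
    height, width = channel.shape[:2]
    if 2 * border >= min(height, width):
        raise ValueError("border is too large for this image")
    return channel[border:height - border, border:width - border]
```

A fixed 200 px crop only makes sense for the large scans; on a 400px channel it would remove the entire image, which is why the sketch raises an error in that case.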
Another difficulty came with the photos Self Portrait and Village: even after trying different base channels and metrics, the output still looked very blurry. I checked the code again and noticed that the search range at the bottom levels of the image pyramid was hard-coded to [-3, 3]. I turned it into a variable and tested a smaller window of [-1, 1] for the bottom levels of the pyramid. The results were much better.
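One way to reason about the window size is to compute how far the search can reach in total. The sketch below assumes a pyramid that halves the resolution at every level, so each finer level doubles the previous estimate and then refines it within the window; the function name and the example numbers (5 levels, a coarsest-level radius of 15) are illustrative and not taken from the project.

```python
def max_reach(levels: int, coarse_radius: int, refine_radius: int) -> int:
    """Largest full-resolution displacement reachable by a factor-2 pyramid search."""
    reach = coarse_radius                  # exhaustive window at the coarsest level
    for _ in range(levels - 1):            # each finer level doubles, then refines
        reach = 2 * reach + refine_radius
    return reach

print(max_reach(5, 15, 1))   # 255
print(max_reach(5, 15, 3))   # 285
```

Under these assumptions a [-1, 1] window still covers every integer displacement up to the reach, since doubling the coarse estimate and adding -1, 0 or 1 leaves no gaps; the wider window mainly adds more candidate shifts per level.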