CS194-26 Project 1 - Edward Fang

Colorizing the Prokudin-Gorskii Photo Collection

Background

The Sergei Mikhailovich Prokudin-Gorskii Collection features color photographic surveys of the vast Russian Empire made between ca. 1905 and 1915. Frequent subjects among the 2,607 distinct images include people, religious architecture, historic sites, industry and agriculture, public works construction, scenes along water and railway transportation routes, and views of villages and cities. An active photographer and scientist, Prokudin-Gorskii (1863-1944) undertook most of his ambitious color documentary project from 1909 to 1915. The Library of Congress purchased the collection from the photographer's sons in 1948.

Taken from: http://www.loc.gov/pictures/collection/prok/

Project Objective

The objective of this project is to take the digitized Prokudin-Gorskii plates and develop a robust alignment algorithm capable of automatically aligning the B, G, R plates to reproduce a correctly colored image.

Single Scale Approach

My single scale approach to B, G, R alignment performed an exhaustive search over every offset in a fixed window of -15 to +15 pixels in both the x and y directions. Each candidate offset was applied to the B and R channels, and the shifted channel was compared against the G channel. The comparison metric was SSD (Sum of Squared Differences), and the objective was to find the offset that minimized it. Once that offset was found, the algorithm shifted the B and R channels accordingly and stacked them with the G channel to produce the final color image. This process took less than a minute for the smaller jpg files and aligned the channels correctly. It does not scale to the larger tif files, where an exhaustive search is computationally infeasible. This motivated the multi scale algorithm.
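The exhaustive search described above can be sketched roughly as follows. This is a minimal illustration rather than the code used for the results on this page: the function names `ssd` and `align_exhaustive` are hypothetical, and shifting with `np.roll` (which wraps pixels around the image edge) is an assumption about how the shift is applied.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))


def align_exhaustive(channel, reference, radius=15):
    """Return the (dy, dx) shift of `channel` that minimizes SSD against `reference`.

    Every offset in [-radius, radius] is tried along both axes.
    Assumption: shifts are circular (np.roll), so pixels wrap around the edge.
    """
    best_offset, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_offset = score, (dy, dx)
    return best_offset
```

With offsets found for B and R, the color image could then be assembled with something like `np.dstack([np.roll(r, r_offset, axis=(0, 1)), g, np.roll(b, b_offset, axis=(0, 1))])`. The cost grows with the square of the search radius times the number of pixels, which is why this form of the search becomes impractical on the large tif scans.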

Multi Scale Approach

My multi scale algorithm used image pyramids to speed up the search. The number of levels needed was calculated using a defined log function. At each level, the search window was adjusted based on the size of the image. Generally, the lowest level of the pyramid searched over approximately 50% of the low resolution image. As the algorithm traversed up the image pyramid, the search area decreased by about a factor of 8. My comparison metric was NCC (Normalized Cross Correlation). I found that this was good enough to estimate the alignment, but not good enough for pixel perfect matching. To overcome this, I added two tuning layers, one coarse and one fine. The coarse layer searched over a moderate interval with a stride, while the fine layer searched a very small area to fine tune the alignment. Lastly, my algorithm only considered pixels within the middle 40% of the image. This resulted in a massive speed up with minimal trade-offs in accuracy. The combination of image pyramids, fine tuning and border cropping allows my algorithm to align images perfectly, or nearly so, in around 30 seconds.
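A coarse-to-fine search of this kind might look like the sketch below. It is a simplified illustration under several assumptions, not the implementation behind the results on this page: the names `ncc`, `center_crop`, `downsample`, `search` and `align_pyramid` are hypothetical, pyramid levels are built by plain 2x2 averaging, recursion stops at an assumed `min_size` instead of a level count from a log function, the search radii are fixed, and the strided coarse tuning layer is left out. Scoring is restricted to the middle 40% of each image, as described above.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays (higher is better)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0


def center_crop(img, keep=0.4):
    """Keep the central `keep` fraction of the image along each axis."""
    h, w = img.shape
    dh, dw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    return img[dh:h - dh, dw:w - dw]


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumed pyramid step)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0


def search(channel, reference, center, radius):
    """Best (dy, dx) within `radius` of `center`, scored by NCC on the central crop."""
    best, best_score = center, -np.inf
    ref = center_crop(reference)
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))  # circular shift (assumption)
            score = ncc(center_crop(shifted), ref)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def align_pyramid(channel, reference, min_size=64, coarse_radius=8, fine_radius=2):
    """Estimate the shift on a small image first, then refine it at each finer level."""
    if min(channel.shape) <= min_size:
        return search(channel, reference, (0, 0), coarse_radius)
    dy, dx = align_pyramid(downsample(channel), downsample(reference),
                           min_size, coarse_radius, fine_radius)
    # An offset found at half resolution corresponds to twice that offset here.
    return search(channel, reference, (2 * dy, 2 * dx), fine_radius)
```

The design point is that the wide search only ever runs on the smallest image, so each finer level needs just a few candidates around the doubled estimate from the level below.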

Note: My algorithm holds the G channel constant and aligns B and R to it. My reasoning is that G sits in the center of the stack, so the displacements of B and R relative to G should be relatively small. This allows for a smaller search window, resulting in a faster algorithm.

Bells and Whistles

I implemented gradient features, automatic border cropping and automatic contrasting.

I decided to use gradient features because pixel intensities vary across channels, which occasionally threw off NCC. While my base algorithm was able to produce correctly colored images, I found that using gradient features resulted in slightly sharper alignments. The difference was a few pixels at most, but still noticeable to a keen eye. I calculated the gradients by convolving the image with the Scharr operator.

My automatic border cropping algorithm also used convolutions with the Scharr operator. After calculating the gradients of the image, it traversed the image from each side toward the opposite side, looking for jumps in intensity. Edges in the gradient image usually appeared as low pixel intensities followed by a sharp increase and then a sharp decrease. Jumps were searched for only up to a certain percentage of the image, and the "best" jumps were taken as the indices to begin cropping from. This process was repeated for all four sides to remove the borders. Overall, this worked decently but failed to remove all the artifacts. There was a tradeoff between removing too much of the image and leaving too many border artifacts; I chose to keep more of the artifacts.

Automatic contrasting had only a slight impact on the resulting image. Images were slightly sharper and brighter, but it took zooming in and comparing pixels to tell the difference. I implemented auto contrasting by applying a linear transform to every image. The transform a * p + b was determined by solving a * p5 + b = 0 and a * p95 + b = 1, which maps the 5th percentile intensity p5 to 0 and the 95th percentile intensity p95 to 1.
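Two of these steps are compact enough to sketch in code. The block below is an illustrative approximation, not the code that produced the images on this page: `filter3x3`, `gradient_magnitude` and `auto_contrast` are hypothetical names, the use of gradient magnitude (rather than separate x and y responses) is an assumption, and so is clipping the contrast-stretched values to [0, 1]. Solving the two contrast equations gives a = 1 / (p95 - p5) and b = -p5 / (p95 - p5).

```python
import numpy as np

# Standard 3x3 Scharr kernels for horizontal and vertical derivatives.
SCHARR_X = np.array([[-3.0, 0.0, 3.0],
                     [-10.0, 0.0, 10.0],
                     [-3.0, 0.0, 3.0]])
SCHARR_Y = SCHARR_X.T


def filter3x3(img, kernel):
    """Cross-correlate a 2D image with a 3x3 kernel, replicating edge pixels.

    For the antisymmetric Scharr kernels this differs from true convolution
    only in sign, which does not affect the gradient magnitude.
    """
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def gradient_magnitude(img):
    """Per-pixel gradient magnitude from the two Scharr responses."""
    gx = filter3x3(img, SCHARR_X)
    gy = filter3x3(img, SCHARR_Y)
    return np.hypot(gx, gy)


def auto_contrast(img, low=5, high=95):
    """Linear stretch mapping the `low` percentile to 0 and the `high` percentile to 1."""
    p_lo, p_hi = np.percentile(img, [low, high])
    if p_hi == p_lo:
        # Flat image: no stretch is defined, so return it unchanged.
        return img.astype(np.float64)
    a = 1.0 / (p_hi - p_lo)
    b = -p_lo * a
    # Assumption: values outside the percentile range are clipped to [0, 1].
    return np.clip(a * img + b, 0.0, 1.0)
```

Under these assumptions, gradient features would be used by passing `gradient_magnitude(channel)` to the alignment metric in place of the raw channel, which makes the comparison depend on edges instead of absolute brightness.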

Offsets using Multi Scale Algorithm

cathedral r [7 1] b [-5 -2]
emir r [58 18] b [-49 -24]
harvesters r [64 -3] b [-59 -17]
icon r [49 5] b [-41 -18]
lady r [60 5] b [-52 -8]
monastery r [6 1] b [3 -2]
nativity r [5 -1] b [-3 -1]
self_portrait r [93 5] b [-78 -29]
settlers r [8 -1] b [-7 0]
three_generations r [60 -4] b [-48 -15]
train r [43 26] b [-42 -6]
turkmen r [61 7] b [-57 -22]
village r [73 10] b [-64 -13]

Given Images without Bells and Whistles

Multi Scale Algorithm on Given Images

Given Images with Bells and Whistles

Multi Scale Algorithm with Auto Contrasting, Border Cropping, Gradient Features on Given Images

Note: The produced images are slightly sharper (when viewed at full size) and contain significantly fewer border artifacts.

Choice Images with Bells and Whistles

Multi Scale Algorithm with Auto Contrasting, Border Cropping, Gradient Features on Choice Images

Note: Even though one of the glass plates for the last image is severely damaged, my algorithm was still able to produce a good alignment of the channels.