The goal of this project is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. In order to do this, I extracted the three color channel images, placed them on top of each other, and aligned them so that they form a single RGB color image.
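For concreteness, the channel-extraction and stacking steps might look like the following sketch. The function names `split_channels` and `stack_rgb` are my own illustrative choices, assuming the digitized plate stacks the blue, green, and red exposures top to bottom:

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked glass-plate scan into its B, G, R thirds.

    The digitized Prokudin-Gorskii plates stack the blue, green, and red
    exposures top to bottom, so each channel is one third of the height.
    """
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]  # b, g, r

def stack_rgb(r, g, b):
    """Stack already-aligned channels into a single H x W x 3 RGB image."""
    return np.dstack([r, g, b])
```

After alignment, the three single-channel arrays are simply stacked along a third axis to form the final color image.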
The main challenge in this project is to automatically align the three glass plates to produce a sharp color image. To align the other two plates against the first, the algorithm tries a range of displacements and picks the best one according to a scoring heuristic. The heuristics I implemented are the L2 norm, normalized cross-correlation (NCC), and the L2 norm of the difference between the binary edge estimates of the two images. I added the edge-based metric because some plates of the same scene differ in brightness, a situation the two naive metrics handle poorly; detected edges, by contrast, are largely invariant to brightness changes and serve as a better metric, at the cost of a much longer runtime. To avoid exhaustive search and to speed up alignment on large inputs, I use an image-pyramid search: it finds the best displacement on smaller, scaled-down versions of the original image and then refines that estimate at the original resolution. Finally, to make sure the alignment metric is not fooled by the peripheral region of each glass plate, I crop each plate to its central part before feeding it into the search.
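A minimal sketch of the coarse-to-fine search described above, using NCC as the scoring metric and `np.roll` to apply displacements. The function names, the 80% central crop, and the search window of ±4 pixels per level are illustrative assumptions, not necessarily the exact values used in the project:

```python
import numpy as np

def crop_center(img, frac=0.8):
    """Keep only the central region so plate borders do not bias the metric."""
    h, w = img.shape
    dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
    return img[dh:h - dh, dw:w - dw]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0

def best_shift(ref, moving, window):
    """Exhaustively score integer shifts in [-window, window]^2; return the best (dy, dx)."""
    best, best_score = (0, 0), -np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ncc(crop_center(ref), crop_center(shifted))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best

def pyramid_align(ref, moving, levels=4, window=4):
    """Coarse-to-fine search: estimate the shift on a half-size image,
    double it, then refine with a small local search at this level."""
    if levels == 0 or min(ref.shape) < 2 * window:
        return best_shift(ref, moving, window)
    dy, dx = pyramid_align(ref[::2, ::2], moving[::2, ::2], levels - 1, window)
    dy, dx = 2 * dy, 2 * dx
    rolled = np.roll(moving, (dy, dx), axis=(0, 1))
    ddy, ddx = best_shift(ref, rolled, window)
    return dy + ddy, dx + ddx
```

Because each level only refines the doubled coarse estimate within a small window, the total number of metric evaluations stays roughly constant per level instead of growing with the square of the image size.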
ag refers to the displacement of the G plate needed to align it with the B plate
ar refers to the displacement of the R plate needed to align it with the B plate
emir.tif: ag: [49, 23], ar: [107, 40]
train.tif: ag: [42, 5], ar: [87, 32]
harvesters.tif: ag: [59, 16], ar: [123, 13]
icon.tif: ag: [41, 17], ar: [89, 23]
lady.tif: ag: [51, 9], ar: [112, 12]
monastery.gif: ag: [-3, 2], ar: [3, 2]
onion_church.tif: ag: [51, 27], ar: [108, 36]
self_portrait.tif: ag: [78, 29], ar: [176, 37]
tobolsk.gif: ag: [3, 3], ar: [6, 3]
workshop.tif: ag: [53, 0], ar: [105, -12]
three_generations.tif: ag: [53, 14], ar: [111, 11]
castle.tif: ag: [34, 3], ar: [98, 4]
cathedral.jpg: ag: [5, 2], ar: [12, 3]
melons.tif: ag: [81, 10], ar: [178, 13]
The two naive metrics (L2 norm and NCC) failed to align emir.tif because the three glass plates of this image have different brightness. To align it, I used the metric based on the difference between the binary edge maps of the two images, as explained in more detail in the Challenges and Approach section.
Since naive metrics like the L2 norm and NCC are very sensitive to changes in brightness and other variables, they are quite limited. They also bear little resemblance to how humans align images: aligning by hand, we would match prominent features that are present in both images. To mimic this idea, I implemented another metric based on edge detection, which measures how similar the edge maps of the two images are. The smaller the difference between the edge maps, the better the alignment.
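A sketch of such an edge-based metric. I use a simple gradient-magnitude edge estimate thresholded at a percentile as a stand-in for whatever edge detector the report actually used; the function names and the 90th-percentile threshold are illustrative assumptions:

```python
import numpy as np

def edge_map(img, percentile=90):
    """Binary edge estimate: gradient magnitude thresholded at a percentile.

    Keeping only the strongest ~10% of gradients makes the map largely
    insensitive to uniform brightness shifts and scalings.
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gy, gx)
    return (mag > np.percentile(mag, percentile)).astype(float)

def edge_distance(a, b):
    """L2 norm of the difference of the two binary edge maps (lower is better)."""
    return float(np.linalg.norm(edge_map(a) - edge_map(b)))
```

Because a uniform brightness offset leaves the gradients unchanged, two plates of the same scene with different exposure produce essentially the same edge map, which is exactly what the naive metrics fail to capture.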