International student participating in the Brazilian Scientific Mobility Program (BSMP) at University of California, Berkeley, for the 2015-2016 academic year. Enrolled as a Computer Science Extension student. Born and raised in Rio de Janeiro, Brazil. Undergrad student at Pontifical Catholic University of Rio de Janeiro, studying Computer Engineering and Mathematics. Traditional and Digital art lover and currently interested in Computer Graphics and Artificial Intelligence - but who knows what new field of study I might fall in love with! Trying to learn the most I can during this one year here in Berkeley.
" Sergei Mikhailovich Prokudin-Gorskii (1863-1944) [Сергей Михайлович Прокудин-Горский, to his Russian friends] was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won Tzar's special permission to travel across the vast Russian Empire and take color photographs of everything he saw including the only color portrait of Leo Tolstoy. And he really photographed everything: people, buildings, landscapes, railroads, bridges... thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. Never mind that there was no way to print color photographs until much later -- he envisioned special projectors to be installed in "multimedia" classrooms all across Russia where the children would be able to learn about their vast country. Alas, his plans never materialized: he left Russia in 1918, right after the revolution, never to return again. Luckily, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available on-line." (You can find those images here)
" The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. In order to do this, you will need to extract the three color channel images, place them on top of each other, and align them so that they form a single RGB color image. A cool explanation on how the Library of Congress created the color images on their site is available here. "
The first step of this processing is to split the scan into three images and handle the channels separately. For that, we assume that each exposure occupies exactly one third of the total height of the input file. However, simply stacking the three images on top of each other does not produce a good result: the images are not properly aligned.
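As a sketch, the split can be written as follows (function and variable names are my own; on the scanned plates the channels appear blue, green, red from top to bottom):

```python
import numpy as np

def split_channels(plate):
    """Split a stacked glass-plate scan into its three channel images.

    `plate` is a 2-D numpy array holding the B, G, R exposures stacked
    vertically (top to bottom). Any rows left over after dividing the
    height by three are discarded.
    """
    h = plate.shape[0] // 3          # height of one channel image
    b = plate[0:h, :]
    g = plate[h:2 * h, :]
    r = plate[2 * h:3 * h, :]
    return r, g, b
```

Stacking the three returned arrays along a third axis (e.g. with `np.dstack((r, g, b))`) already yields a color image, just a badly misaligned one.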
In order to align the images, let's define a metric that tells us, given an image I and a template It, how well aligned they are. The metric used here is the Sum of Squared Differences (SSD): the sum, over all pixels, of the squared differences between the intensities of the two images.
The lower the SSD of two images, the better aligned they are. To find the proper alignment, we look for the offset that yields the lowest SSD among all offsets considered. Once the R and B channels are aligned with G, we can simply stack the three channels together to form an aligned color version of the input.
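A minimal SSD implementation, assuming two grayscale numpy arrays of equal shape:

```python
import numpy as np

def ssd(a, b):
    """Sum of Squared Differences between two same-sized images.

    Casting to float avoids overflow and wrap-around when the inputs
    are integer (e.g. uint8) arrays.
    """
    return np.sum((a.astype(float) - b.astype(float)) ** 2)
```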
One simple solution for this problem is to search over a window of possible offsets, compute the SSD for each of them, and keep the offset with the lowest value.
Using a window of [-15, 15] pixels for both the x and y offsets, it was possible to properly align some of the inputs. However, for the larger images the displacements were larger as well, so a bigger search window was needed to find the correct alignment. Because of that, the computational cost grew to the point where the program could no longer process everything in reasonable time.
One small but important consideration for this algorithm: before computing the metric, we should also crop the images (by 20% in each dimension) so that the noisy borders do not skew the SSD.
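Putting the window search and the interior crop together, the single-scale alignment might look like the sketch below. The 20%-per-side crop and the use of `np.roll` (a circular shift, whose wrapped-around pixels the crop keeps out of the metric) are implementation choices of this sketch, not necessarily the exact ones used in the project:

```python
import numpy as np

def crop_interior(img, frac=0.2):
    """Keep only the central region, dropping `frac` of each dimension
    on every side, so the noisy borders do not bias the metric."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def align_single_scale(channel, reference, window=15):
    """Exhaustively search offsets in [-window, window]^2 and return
    the (dy, dx) shift of `channel` that minimizes the SSD against
    `reference` on the cropped interior."""
    ref = crop_interior(reference).astype(float)
    best, best_offset = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = crop_interior(np.roll(channel, (dy, dx), axis=(0, 1)))
            score = np.sum((shifted.astype(float) - ref) ** 2)
            if score < best:
                best, best_offset = score, (dy, dx)
    return best_offset
```

The double loop makes the cost grow quadratically with the window size, which is exactly why this brute-force version breaks down on the high-resolution scans.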
For larger inputs, a better approach was needed. The larger the displacement between two images, the easier it should be to detect the misalignment in their lower-resolution versions. Because of that, before doing any fine alignment on the full-resolution versions, we first align blurred, downscaled versions of the images, which is computationally much cheaper. For the alignment itself at each resolution, we can reuse the same algorithm from the single-scale image alignment.
For that computation, let's build an image pyramid. This data structure consists of a sequence of copies of the same image, each copy being a lower-resolution version of the previous one. I used the skimage.transform.rescale function, which reduces the resolution of an image via interpolation when given a scaling factor smaller than one. With a scaling factor of 0.5, each level has half the width and height (a quarter of the pixels) of the previous one.
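A sketch of the pyramid construction; here a plain 2x2 block average stands in for skimage.transform.rescale(img, 0.5) so the example stays dependency-free, and the `min_size` stopping criterion is my own choice:

```python
import numpy as np

def downscale_by_half(img):
    """Halve each dimension by averaging 2x2 blocks -- a simple
    stand-in for skimage.transform.rescale(img, 0.5)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    a = img[:h, :w].astype(float)
    return (a[0::2, 0::2] + a[1::2, 0::2]
            + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0

def build_pyramid(img, min_size=32):
    """Level 0 is the full-resolution image; each further level has
    half the width and height of the previous one. Stop before any
    dimension would drop below `min_size` pixels."""
    levels = [img]
    while min(levels[-1].shape) // 2 >= min_size:
        levels.append(downscale_by_half(levels[-1]))
    return levels
```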
For the alignment, we start at the highest levels (lowest resolution). Suppose we find that the displacement between two level-N images (N > 0) is (dx, dy). Because of the 0.5 factor between consecutive levels, every one-pixel offset at an upper level corresponds to a two-pixel offset one level below. So at level N-1 we can start the search for the lowest SSD at the initial displacement (2dx, 2dy); the alignment at level N-1 is then only a fine-tuning of the estimate given by level N. The sequence of consecutive fine-tunings results in a good alignment for the given images.
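The coarse-to-fine procedure can be written recursively. The sketch below is self-contained (helper names and the `min_size`/`fine_window` parameters are my own choices, and a 2x2 block average again stands in for skimage's rescale): the coarsest level gets a full window search, and each finer level doubles the estimate and refines it within a small window:

```python
import numpy as np

def _ssd_offset(channel, reference, center, window):
    """Search offsets within `window` of `center`, minimizing SSD on
    the central region (20% of each dimension trimmed per side)."""
    h, w = reference.shape
    dh, dw = int(h * 0.2), int(w * 0.2)
    ref = reference[dh:h - dh, dw:w - dw].astype(float)
    cy, cx = center
    best, best_offset = np.inf, center
    for dy in range(cy - window, cy + window + 1):
        for dx in range(cx - window, cx + window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum(
                (shifted[dh:h - dh, dw:w - dw].astype(float) - ref) ** 2)
            if score < best:
                best, best_offset = score, (dy, dx)
    return best_offset

def _halve(img):
    """2x2 block average, standing in for rescale(img, 0.5)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    a = img[:h, :w].astype(float)
    return (a[0::2, 0::2] + a[1::2, 0::2]
            + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0

def align_pyramid(channel, reference, min_size=32,
                  coarse_window=15, fine_window=2):
    """Return the (dy, dx) shift of `channel` that best aligns it with
    `reference`, estimated coarse-to-fine."""
    if min(channel.shape) // 2 < min_size:
        # Coarsest level: full exhaustive search around (0, 0).
        return _ssd_offset(channel, reference, (0, 0), coarse_window)
    # Estimate at half resolution, then double and fine-tune here.
    dy, dx = align_pyramid(_halve(channel), _halve(reference),
                           min_size, coarse_window, fine_window)
    return _ssd_offset(channel, reference, (2 * dy, 2 * dx), fine_window)
```

Since each level only refines within a tiny window, the total cost is dominated by the coarsest search, which runs on a very small image.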
More sophisticated pyramids, built with different mechanisms, could also be used. However, the interpolation-based pyramid offered by skimage.transform.rescale was sufficient for the purposes of this project. Article about Gaussian and Laplacian pyramids: http://web.mit.edu/persci/people/adelson/pub_pdfs/pyramid83.pdf
The results were very satisfying. Every single image aligned well, considering our assumption that translations alone are enough to register the channels. I added a fixed crop (of 10%) to the final results to partially remove their unattractive borders. The following series of images shows both the unaligned and aligned versions of the inputs.
The low-resolution images were processed quite fast (a few seconds), while the high-resolution images took a bit longer (one to two minutes), as expected.
| Image Name | Red Offset | Green Offset | Blue Offset |
| --- | --- | --- | --- |

| Image Name | Red Offset | Green Offset | Blue Offset |
| --- | --- | --- | --- |
Take a closer look at the picture below. Two of the girls have "chromatic ghosts" hovering over them: while most of the image is well aligned, some specific parts still show this effect. This suggests that translating the channels by a single global offset is not enough to fully recover the original scenes.
Other techniques would be necessary to fix this problem. One idea is to apply slightly different offsets to those parts of the image and then use some kind of merging algorithm to compose the full image. Another idea is to simply fill in the misaligned information using surrounding data.