In this assignment, I take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. The three color channel images were extracted, aligned, and placed on top of each other to form a single RGB color image.
The end result: visually stunning color images.
Naturally, we need some kind of metric to measure how well two color channels align with each other. The two metrics I explored in this project are:
I wrote additional helper functions to perform chores such as cropping the images (to remove uninteresting borders) and applying the Sobel operator (to see if edge detection is useful).
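To illustrate the Sobel helper, here is a minimal sketch that computes a gradient-magnitude image with plain NumPy slicing. The name `sobel_magnitude` and the edge-replicating padding are assumptions made for this example; they are not taken from the project code.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude of a 2-D image using the 3x3 Sobel kernels."""
    # Replicate the border so the output has the same shape as the input.
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    # Horizontal gradient: weighted right column minus weighted left column.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: weighted bottom row minus weighted top row.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

Aligning on such edge maps instead of raw intensities is one way to test whether edge detection helps, since edges tend to agree across channels even when brightness does not.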
Initially, I searched exhaustively over a window of possible displacements, using a window size of 15. For each displacement, I computed the alignment score (using one of the aforementioned metrics) and kept track of the best alignment vector encountered so far. This naive implementation worked well for small JPEG images (under 200 KB), but it was too slow to finish in a reasonable time on the large TIFF images (around 70 MB).
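The exhaustive search can be sketched roughly as follows. This is an illustration under stated assumptions, not the project code: the sum of squared differences (`ssd`) stands in for the alignment metric, shifts wrap around via `np.roll`, and the names `ssd` and `exhaustive_align` are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means better alignment (assumed metric)."""
    return float(np.sum((a - b) ** 2))

def exhaustive_align(channel, reference, window=15, score=ssd):
    """Try every (dy, dx) in [-window, window]^2 and return the best offset."""
    best_offset, best_score = (0, 0), float("inf")
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # Circular shift: pixels pushed off one side wrap to the other.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            s = score(shifted, reference)
            if s < best_score:
                best_offset, best_score = (dy, dx), s
    return best_offset
```

The cost grows with (2 * window + 1)^2 full-image comparisons, which is why this approach becomes impractical on the large TIFF scans.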
A more efficient way to align the color channels is through the use of an image pyramid. An image pyramid is a hierarchical way of representing an image at different scales, where the top levels are smaller images that have been blurred and subsampled from the original.
"In a Gaussian pyramid, subsequent images are weighted down using a Gaussian average (Gaussian blur) and scaled down. Each pixel containing a local average corresponds to a neighborhood pixel on a lower level of the pyramid." --Wikipedia
As required by the project, I implemented an image pyramid without using existing high-level implementations. Specifically, I used recursion to reach the top level of the pyramid, where I would perform an exhaustive search on the image (which is a much smaller version of the original) and pass the alignment estimate one level down the pyramid.
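A minimal sketch of this recursive coarse-to-fine scheme is shown below. Several details are assumptions made for illustration: a 2x2 box average replaces the Gaussian blur, the score is a sum of squared differences, shifts wrap around via `np.roll`, and the names and default radii (`downsample`, `search`, `pyramid_align`, `min_size`, `coarse_radius`, `refine_radius`) are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging each 2x2 block (box filter, not Gaussian)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def search(channel, reference, center, radius):
    """Exhaustive search over offsets within `radius` of `center`."""
    best, best_score = center, float("inf")
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            s = float(np.sum((shifted - reference) ** 2))
            if s < best_score:
                best, best_score = (dy, dx), s
    return best

def pyramid_align(channel, reference, min_size=32, coarse_radius=8, refine_radius=2):
    """Recurse to the smallest level, then refine the estimate on the way back down."""
    if min(channel.shape) <= min_size:
        # Top of the pyramid: the image is small, so a wide search is cheap.
        return search(channel, reference, (0, 0), coarse_radius)
    coarse = pyramid_align(downsample(channel), downsample(reference),
                           min_size, coarse_radius, refine_radius)
    # One pixel at the coarser level corresponds to two pixels at this level.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(channel, reference, center, refine_radius)
```

Because each finer level only searches a few pixels around the doubled estimate, the expensive full-resolution comparisons are limited to a small neighborhood.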
This led to great runtime improvements: for a window size of 5, the naive approach takes more than 2 minutes (and often much longer), whereas the image pyramid finishes within 15 seconds.
Aligning some color channels was initially difficult, even when I set the window size to a high value, such as 15 pixels. Through experimentation, I soon realized that the major culprit was the borders surrounding the images. Since the images were scanned, many contained blemishes and handwriting, which effectively introduced noise during alignment. My intuition was therefore that removing the borders more accurately would improve color channel alignment.
To this end, I implemented a default cropping function, as well as an automated version that searches for the largest connected region using Otsu's method. I used the default cropping function to roughly remove the borders and then employed auto-cropping to fine-tune the result.
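A rough border removal of this kind might look like the sketch below. The fixed-fraction strategy, the name `crop_border`, and the 10% default are assumptions for illustration; the text does not say how the default cropping function chooses its margins.

```python
import numpy as np

def crop_border(img, fraction=0.1):
    """Trim the same fraction of the height and width from every side."""
    h, w = img.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return img[dy:h - dy, dx:w - dx]
```

Applying the same crop to all three channels keeps their shapes identical, so they can still be compared pixel by pixel afterwards.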
Although the pipeline now works efficiently and achieves good results on most images, it struggles with three inputs: emir, melon, and sculpture. I investigated the images more closely and found (r, b) was aligned differently from (g, b). To solve this, I made one simple fix: to align the color channels to the green channel instead. This way we are computing the alignment vectors for (b, g) and (r, g).
This turned out to be successful. Not only did it solve the edge cases, but it also worked well on all other images. I think the reason is that if we start off with an incorrect distribution, it will be very hard for the alignment to succeed.
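The change of reference channel can be sketched as follows. The function name `stack_on_green` is hypothetical, `align` is assumed to be any callable that returns a `(dy, dx)` offset moving its first argument onto its second, and shifts are applied with wrap-around via `np.roll`.

```python
import numpy as np

def stack_on_green(b, g, r, align):
    """Align blue and red to the green channel and stack them as an RGB image."""
    b_off = align(b, g)  # alignment vector for (b, g)
    r_off = align(r, g)  # alignment vector for (r, g)
    b_aligned = np.roll(b, b_off, axis=(0, 1))
    r_aligned = np.roll(r, r_off, axis=(0, 1))
    # Green stays fixed; only the other two channels move.
    return np.dstack([r_aligned, g, b_aligned]), b_off, r_off
```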
Note: Offsets are listed as (y_offset, x_offset).
Cathedral.jpg: Offsets for (g, b): [-5, -2]; Offsets for (r, b): [7, 1]
Church.tif: Offsets for (g, b): [-25, -4]; Offsets for (r, b): [33, -8]
Emir.tif: Offsets for (g, b): [-49, -24]; Offsets for (r, b): [57, 17]
Harvesters.tif: Offsets for (g, b): [-60, -16]; Offsets for (r, b): [65, -3]
Icon.tif: Offsets for (g, b): [-40, -17]; Offsets for (r, b): [48, 5]
Lady.tif: Offsets for (g, b): [-53, -8]; Offsets for (r, b): [63, 3]
Melons.tif: Offsets for (g, b): [-82, -9]; Offsets for (r, b): [96, 3]
Monastery.jpg: Offsets for (g, b): [3, -2]; Offsets for (r, b): [6, 1]
Onion_Church.tif: Offsets for (g, b): [-51, -26]; Offsets for (r, b): [57, 10]
Sculpture.tif: Offsets for (g, b): [-33, 11]; Offsets for (r, b): [107, -16]
Self_Portrait.tif: Offsets for (g, b): [-79, -29]; Offsets for (r, b): [98, 8]
Three_Generations.tif: Offsets for (g, b): [-54, -11]; Offsets for (r, b): [58, -1]
Tobolsk.jpg: Offsets for (g, b): [-3, -3]; Offsets for (r, b): [4, 1]
Train.tif: Offsets for (g, b): [-43, -5]; Offsets for (r, b): [43, 27]
School.tif: School in the village of Pidma named after His Imperial Majesty, Sovereign, Heir Apparent, Crown Prince, Grand Duke Aleksei Nikolaevich. [Russian Empire] Offsets for (g, b): [-26, -9]; Offsets for (r, b): [36, 0]
Monument.tif: City of Lodeinoe Pole. Monument to Emperor Peter the Great. [Russian Empire] Offsets for (g, b): [-22, -21]; Offsets for (r, b): [36, 8]
Study.tif: Ostrechiny. Study. [Russian Empire] Offsets for (g, b): [-13, 5]; Offsets for (r, b): [120, -6]
Group.tif: Group of children. [Russian Empire] Offsets for (g, b): [-66, -35]; Offsets for (r, b): [77, 17]
Sawmill.tif: View of the sawmill. Kovzha. [Russian Empire] Offsets for (g, b): [-15, -22]; Offsets for (r, b): [42, 15]
Machine.tif: Stone-excavating machine of the multi-scoop type "Svirskaia pervaia." [Russian Empire] Offsets for (g, b): [-33, -2]; Offsets for (r, b): [34, -21]
I implemented an automatic cropping function that takes in the three color channels and returns cropped versions of them. The cropping is based on the thresholding of the red channel using Otsu's method to segment the image into significant regions, and then using the bounding boxes of the two largest regions to determine the cropping boundaries.
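The thresholding step can be sketched with NumPy alone, as below. This is a simplified illustration rather than the procedure described above: it assumes float channels in [0, 1] with a dark border, and it crops to the bounding box of all above-threshold pixels instead of finding the two largest connected regions. The names `otsu_threshold` and `auto_crop` are hypothetical.

```python
import numpy as np

def otsu_threshold(img, bins=256):
    """Threshold that maximizes the between-class variance (Otsu's method)."""
    hist, edges = np.histogram(img, bins=bins, range=(0.0, 1.0))
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)              # pixel count in bins 0..k
    w1 = w0[-1] - w0                  # pixel count in bins k+1..end
    s0 = np.cumsum(hist * centers)    # intensity sum in bins 0..k
    mu0 = s0 / np.maximum(w0, 1.0)    # mean of the darker class
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1.0)  # mean of the brighter class
    between = w0 * w1 * (mu0 - mu1) ** 2
    k = int(np.argmax(between))
    return float(edges[k + 1])        # upper edge of the last "dark" bin

def auto_crop(b, g, r):
    """Crop all three channels to the bounding box of bright pixels in red."""
    mask = r > otsu_threshold(r)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0 or cols.size == 0:
        return b, g, r                # nothing above threshold: leave unchanged
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    return tuple(c[top:bottom, left:right] for c in (b, g, r))
```

A connected-component step would make this more robust to isolated bright specks in the border, which a plain bounding box cannot ignore.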
This mechanism is combined with the default cropping function to achieve the best results. In the following example, notice how automatic cropping leads to a better alignment vector for (r, b).
Before: Offsets for (g, b): [-5, -2]; Offsets for (r, b): [0, 1]
After: Offsets for (g, b): [-5, -2]; Offsets for (r, b): [7, 1]
[Figure: before and after automatic cropping]
I also implemented automatic contrast adjustment, which maps the darkest pixel to zero and the brightest pixel (on its brightest color channel) to the maximum intensity. This serves as a gentle way to rescale image intensities and improve image quality. Overall, images appear more natural and pleasant.
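A linear rescaling of this kind can be sketched as follows. The sketch assumes a float RGB array and takes 1.0 as the upper end of the output range, which is my assumption; the name `auto_contrast` is hypothetical.

```python
import numpy as np

def auto_contrast(rgb):
    """Linearly rescale so the global minimum maps to 0 and the global maximum to 1."""
    rgb = np.asarray(rgb, dtype=float)
    # The extremes are taken over all pixels and all channels at once,
    # so the brightest pixel is judged on its brightest channel.
    lo, hi = rgb.min(), rgb.max()
    if hi == lo:
        return np.zeros_like(rgb)     # flat image: nothing to stretch
    return (rgb - lo) / (hi - lo)
```

Using one scale factor for all three channels preserves the color balance, whereas stretching each channel separately would shift hues.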
[Figures: two before/after pairs showing automatic contrasting]