The goal of this project is to reconstruct a colored image given "three exposures of every scene onto a glass plate using a red, a green, and a blue filter." To do this, we simply find the best alignment of green to blue and red to blue, finally stacking them together to create "a single RGB color image".
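The final stacking step can be sketched as below. This is a minimal illustration rather than the project's actual code: the helper name `stack_rgb`, the (row, column) order of the shifts, and the use of `np.roll` (which wraps pixels around the image edge) are all assumptions made for the example.

```python
import numpy as np

def stack_rgb(r, g, b, r_shift, g_shift):
    """Shift the red and green channels onto blue and stack them as RGB.

    r_shift and g_shift are hypothetical (row, column) offsets; np.roll
    wraps pixels around the edge, which is an assumption of this sketch.
    """
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    # Channels go last, in R, G, B order.
    return np.dstack([r_aligned, g_aligned, b])
```

Blue is the fixed reference here, so only red and green are moved before stacking.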
To find the best alignment, I initially used the SSD (Sum of Squared Differences) metric to calculate the similarity between two images. However, SSD does not work well in some cases, such as melons and especially emir, because the three images do not have the same brightness values. I therefore switched to SSIM (structural similarity index) to compare images, computing it with the built-in function structural_similarity from skimage.metrics. The naive approach for coarse (low-resolution) images, such as cathedral and monastery, was to predefine a search range ([-15, 15]). I shifted the to-be-aligned image (green or red) both horizontally and vertically over this range and, for each shifted version, calculated the SSIM value between it and the blue image. Finally, the x and y offsets that give the maximum SSIM are returned.
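The exhaustive search described above might look roughly like the following. It is a sketch under stated assumptions: `best_shift` and `neg_ssd` are hypothetical names, offsets are returned as (row, column), shifting is done with `np.roll` (which wraps around the edges), and negative SSD is used as the default score only so that the example runs with NumPy alone.

```python
import numpy as np

def neg_ssd(a, b):
    """Negative sum of squared differences, so that higher means more similar."""
    return -float(np.sum((a - b) ** 2))

def best_shift(moving, reference, radius=15, score=neg_ssd):
    """Try every (dy, dx) in [-radius, radius] x [-radius, radius].

    Returns the (row, column) shift of `moving` that maximizes `score`
    against `reference`. The score function is pluggable (an assumption
    of this sketch), so SSD can be swapped for another metric.
    """
    best, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            s = score(shifted, reference)
            if s > best_score:
                best, best_score = (dy, dx), s
    return best
```

To score with SSIM instead, one could pass something like `score=lambda a, b: structural_similarity(a, b, data_range=1.0)` after importing `structural_similarity` from `skimage.metrics`; the `data_range=1.0` value assumes float images in [0, 1].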
To optimize this algorithm on finer inputs, I implemented image pyramids. I first defined the maximum number of pixels for the coarsest image that I would compare directly. I then pass the best x and y offsets found at this level to the next level (2x the pixels on each side). At each level, I search for the best x and y offsets within 6 pixels of twice the offsets from the previous level (6 was found through trial and error). This recursion repeats until we reach the original image.
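A rough recursive sketch of the pyramid follows. The names (`pyramid_align`, `search`, `downscale`), the 2x2 block-average downscaling, the `max_side` threshold, and the negative-SSD score are assumptions chosen to keep the example self-contained; only the search radii of 15 and 6 and the doubling of offsets between levels come from the description above.

```python
import numpy as np

def score(a, b):
    # Negative SSD as a stand-in metric (assumption); higher is better.
    return -float(np.sum((a - b) ** 2))

def search(moving, reference, center, radius):
    """Best (row, column) shift within `radius` pixels of `center`."""
    cy, cx = center
    best, best_score = center, -np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            s = score(np.roll(moving, (dy, dx), axis=(0, 1)), reference)
            if s > best_score:
                best, best_score = (dy, dx), s
    return best

def downscale(img):
    """Halve each side by averaging 2x2 blocks (drops an odd last row/column)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(moving, reference, max_side=64, coarse_radius=15, refine_radius=6):
    """Align at the coarsest level first, then refine around 2x that offset."""
    if max(reference.shape) <= max_side:
        # Coarsest level: exhaustive search around zero shift.
        return search(moving, reference, (0, 0), coarse_radius)
    dy, dx = pyramid_align(downscale(moving), downscale(reference),
                           max_side, coarse_radius, refine_radius)
    # One coarse pixel covers two pixels at this level, so double the offset.
    return search(moving, reference, (2 * dy, 2 * dx), refine_radius)
```

Because each level only searches a 13 x 13 neighborhood around the doubled offset, the cost on the full-resolution image stays small even when the true shift is over a hundred pixels.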
Moreover, I only compare the center 900 pixels if the image size is greater than 10000 pixels, and for smaller images, I only compare the center 3/4 of the image. This keeps the borders, which may mislead the alignment, out of the comparison, and keeps the comparison fast on bigger images.
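Restricting the comparison to a central region could be done with small helpers like these. Both function names are hypothetical, and reading "the center 900 pixels" as a 900 x 900 window is an assumption of this sketch, not something the text confirms.

```python
import numpy as np

def center_fraction(img, frac=0.75):
    """Return the central `frac` of the image along each axis."""
    h, w = img.shape[:2]
    ch, cw = int(h * frac), int(w * frac)
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

def center_window(img, side=900):
    """Return a central window of at most `side` x `side` pixels."""
    h, w = img.shape[:2]
    ch, cw = min(side, h), min(side, w)
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]
```

Both helpers return NumPy views rather than copies, so cropping before each comparison adds very little overhead.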
After all these steps, processing a .tif image takes about 10 to 20 seconds.
The images produced by the steps above have colored borders caused by the alignment shifts. To crop each image according to its actual border instead of a predefined one, I first tried edge detection. I experimented with skimage.filters.sobel as well as skimage.feature.canny to detect edges, and then iterated through every pixel of the edge map to find long edges (> 1/3 of the width or height). However, this did not work very well: for skimage.filters.sobel, the pixels in the edge map are floats in the range [0, 1] indicating how white a pixel is, and I could not come up with a satisfying threshold for classifying a pixel as on or off an edge. The second approach I attempted was to calculate the borders directly by doing arithmetic with the optimal offsets obtained during alignment. This does not work either, as the colors interfere with each other even beyond the shifted, non-overlapping parts.
After consulting with one of the TAs, I decided to directly compare each row and each column with its adjacent row or column. For every pair of adjacent rows/columns, I compute the difference between the sums of their pixels. Within 15% of each image border, I take the row/column with the maximum difference and define its distance from that border as top, right, bottom, and left, respectively. Since this row/column sometimes underestimates the width of the to-be-cropped border, I also multiply top, right, bottom, and left by a constant k, and use the result as the width of the border to crop.
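One way to sketch this row/column comparison is shown below. The function names, the summing of color channels into a single profile, and the default `k=1.0` are assumptions for illustration; the 15% margin and the multiplication by k follow the description above.

```python
import numpy as np

def border_widths(img, margin=0.15, k=1.0):
    """Estimate (top, right, bottom, left) border widths in pixels."""
    # Collapse color channels so each pixel contributes one value (assumption).
    gray = img if img.ndim == 2 else img.sum(axis=2)

    def strongest_jump(profile):
        # `profile` holds per-row or per-column pixel sums, ordered from
        # the border inward; only the first `margin` fraction is examined.
        n = max(2, int(len(profile) * margin))
        jumps = np.abs(np.diff(profile[:n + 1]))
        # Number of rows/columns that lie before the largest jump.
        return int(np.argmax(jumps)) + 1

    rows, cols = gray.sum(axis=1), gray.sum(axis=0)
    top, bottom = strongest_jump(rows), strongest_jump(rows[::-1])
    left, right = strongest_jump(cols), strongest_jump(cols[::-1])
    # Scale by k to compensate for underestimated borders.
    return tuple(int(round(k * v)) for v in (top, right, bottom, left))

def crop(img, widths):
    """Remove the estimated borders from the image."""
    top, right, bottom, left = widths
    h, w = img.shape[:2]
    return img[top:h - bottom, left:w - right]
```

A value of k slightly above 1 trims a little extra, which matches the observation that the strongest jump can sit inside the true border.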
Picture | G offset | R offset |
---|---|---|
Cathedral | [5, 2] | [12, 3] |
Emir | [49, 22] | [106, 40] |
Harvesters | [59, 17] | [123, 14] |
Icon | [41, 17] | [90, 22] |
Lady | [53, 8] | [114, 11] |
Melons | [89, 14] | [184, 5] |
Monastery | [-3, 1] | [3, 2] |
Onion church | [50, 24] | [108, 36] |
Self portrait | [78, 27] | [175, 35] |
Three generations | [53, 22] | [108, 14] |
Tobolsk | [3, 3] | [7, 3] |
Train | [43, 4] | [86, 30] |
Village | [65, 12] | [138, 22] |
Workshop | [53, -1] | [105, -13] |